Data Users resources

Advanced Methods to Audit Online Web Services

In the doctoral thesis by Pelayo Vallina Rodríguez, various criteria for the quality of web services are examined using Curlie data: the timeliness of the data, the plausibility of how the taxonomy is applied, the multilingual classification of domains, and the frequency of identical site content, among others. Universidad Carlos III de Madrid / IMDEA Networks Institute, 2022, 163 pages. [PDF]

Crawling the German Health Web: Exploratory Study and Graph Analysis

Richard Zowalla, Thomas Wetter, and Daniel Pfeifer from Heilbronn University and Heidelberg University used the dataset of the World/Deutsch/Gesundheit (health) branch to examine the structure, origin, plausibility, and familiarity of health websites in Austria, Switzerland, and Germany. The results were incorporated into further scientific research. Journal of Medical Internet Research, 2020.

Homepage2Vec: Language-Agnostic Website Embedding and Classification

Explains how the Curlie dataset is used in software that embeds and classifies arbitrary websites independently of their language. The concept, model architecture, features, and training phase of the neural network, as well as its evaluation, are described with diagrams. Sylvain Lugeon, Tiziano Piccardi, and Robert West from École polytechnique fédérale de Lausanne (EPFL), 2022, 7 pages.

Identifying Sensitive URLs at Web-Scale

Development of a text classifier that identifies URLs pointing to sensitive data according to criteria such as religion, health, and sexual orientation. Curlie data was used to train the classifier, with the aim of supporting the implementation of data protection laws. Srdjan Matic and Georgios Smaragdakis from Technische Universität Berlin, Costas Iordanou from Cyprus University of Technology, and Nikolaos Laoutaris from IMDEA Networks Institute Madrid; 2020, 15 pages.

On the Prevalence of Leichte Sprache on the German Web

Study of the prevalence of Easy Language (Leichte Sprache) on the German web, combining web analytics and qualitative methods based on Curlie data. Technical and political recommendations for a barrier-free web are provided. Hadi Asghari, Freya Hewett, and Theresa Züger from the Alexander von Humboldt Institute for Internet and Society, Berlin, Germany; 2023, 6 pages.

Open WebSearch - Open Search Foundation e.V.

The data in the Curlie directory are part of the Open Web Index, which is maintained and developed at the University of Passau, Germany. The site provides information about the index and about research activities in information science and other fields.

TenTen Corpus Family - Lexical Computing CZ s.r.o.

The Curlie taxonomy acts as a lexical standard for classifying words and word meanings from web texts in text collections covering more than 35 languages. Linguists use the collections to validate linguistic rules, to study the frequency of terms and word meanings, and to trace language development, among other purposes. The target size of each collection is 10 billion words (10^10, hence "tenTen").

Web Directories: A Searching Tool

Anubhaw Kumar Suman and Dr. Madhu Patel from Mahatma Gandhi Central University, India, describe the history, types, structure, and function of web directories, including Curlie. Their advantages and disadvantages are examined, and aspects specific to target groups (academics, children, families, etc.) and their relevance are explained. International Conference on Knowledge Management in Higher Education Institutions at Manipal University Jaipur, India. April 2022, 18 pages.

Web2Wiki: Characterizing Wikipedia Linking Across the Web

The study examines Wikipedia's role in the "information ecosystem" by analyzing how Wikipedia is linked across the Web, and it provides foundational estimates of its presence and influence. Curlie's taxonomy and statistical data were used to evaluate the collected data. The frequency of citations of and references to Wikipedia, as well as the use of images from the encyclopedia, was measured. By Veselovsky (Princeton University), Piccardi (Stanford University), et al.; 2025, 13 pages. [PDF]