E-LIS, Eprints in Library and Information Science

Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus

Jensen, Niels and Mandl, Thomas (2006) Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus. Delivered at the 17th ACM Conference on Hypertext and Hypermedia (HT '06), Odense, Denmark. Presentation.


Abstract

Experiments with a multilingual web collection are presented. The EuroGOV corpus is the first multilingual web corpus for retrieval evaluation. We show how indexes based on words and n-grams are developed for different document parts. Separate indexes were built from the full document content, partial content, and the title. The best results were achieved with a title-only index based on words.
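
The indexing variants named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the n-gram length or the tokenization rules, so character n-grams of length 4, whitespace tokenization, and the two sample documents are all assumptions made here for illustration.

```python
from collections import defaultdict


def word_tokens(text):
    """Lowercase the text and split it on whitespace (assumed tokenization)."""
    return text.lower().split()


def char_ngrams(text, n=4):
    """Return overlapping character n-grams of each word.

    n=4 is an assumed value; words no longer than n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def build_index(docs, field, tokenize):
    """Map each term to the set of document ids whose `field` contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc[field]):
            index[term].add(doc_id)
    return index


# Hypothetical sample documents, not taken from the EuroGOV corpus.
docs = {
    "d1": {"title": "Ministry of Finance",
           "content": "Budget reports and tax information"},
    "d2": {"title": "Finanzministerium",
           "content": "Berichte zum Haushalt und Steuern"},
}

# One index per combination of document part and term type.
title_word_index = build_index(docs, "title", word_tokens)
title_ngram_index = build_index(docs, "title", char_ngrams)
content_word_index = build_index(docs, "content", word_tokens)
```

In this toy example the word index keeps the English and German titles apart, while the n-gram index lets both titles share terms such as "fina" and "mini".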

Keywords: web information retrieval, multilingual information systems
Subjects: L. Information technology and library technology. > LM. Automatic text retrieval.
L. Information technology and library technology. > LS. Search engines.
L. Information technology and library technology. > LC. Internet, including WWW.
ID Code: 7080
Deposited By: Mandl, Thomas
Deposited On: 28 August 2006
Alternative Locations: http://www.uni-hildesheim.de/~mandl/Publikationen/ht06PosterPresentedJensenMandl.PDF

Jensen, Niels; Mandl, Thomas (2006): Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus. In: Proceedings of the 17th ACM Conference on Hypertext and Hypermedia (HT '06), Odense, Denmark, August 22-25. ACM Press, pp. 169-170. http://doi.acm.org/10.1145/1149941.1149974
