Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus
(2006) Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus. Delivered at 17th ACM Conference on Hypertext and Hypermedia (HT '06), Odense, Denmark. Presentation.
Abstract
Experiments with a multilingual web collection are presented. The EuroGOV corpus is the first multilingual web corpus for retrieval evaluation. We show how indexes based on words and on n-grams were built for different document parts: separate indexes covered the full document content, partial content, and the title. The best results were achieved with a title-only index based on words.
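
To make the indexing variants concrete, the following is a minimal sketch of how word-based and character n-gram indexes over different document parts could be built. It is not the system used in the experiments. The function names, the n-gram length of 4, and the reading of "partial content" as the first 50 words of the body are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def word_terms(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=4):
    """Overlapping character n-grams over the normalized text.

    n=4 is an assumed value, not one reported in the abstract.
    """
    norm = " ".join(word_terms(text))
    if len(norm) < n:
        return [norm] if norm else []
    return [norm[i:i + n] for i in range(len(norm) - n + 1)]


def document_part(doc, part, prefix_words=50):
    """Select the text to index: title, partial content, or full content.

    "partial" is assumed here to mean the first `prefix_words` words
    of the body; the abstract does not define it.
    """
    if part == "title":
        return doc["title"]
    if part == "partial":
        return " ".join(doc["body"].split()[:prefix_words])
    if part == "full":
        return doc["title"] + " " + doc["body"]
    raise ValueError(f"unknown document part: {part}")


def build_index(docs, part, tokenizer):
    """Build an inverted index mapping each term to a set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenizer(document_part(doc, part)):
            index[term].add(doc_id)
    return index


# Hypothetical two-document collection in two languages.
docs = {
    "d1": {"title": "Ministry of Finance",
           "body": "Budget reports and tax information"},
    "d2": {"title": "Finanzministerium",
           "body": "Haushalt und Steuern"},
}

title_words = build_index(docs, "title", word_terms)
title_ngrams = build_index(docs, "title", char_ngrams)
full_words = build_index(docs, "full", word_terms)
```

In this toy collection the word index keeps the two titles apart, while the n-gram index links them through shared character sequences such as "mini" and "fina". This illustrates the general trade-off between the two term types across languages; it says nothing about which performs better on EuroGOV.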
| Keywords: | web information retrieval, multilingual information systems |
|---|---|
| Subjects: | L. Information technology and library technology. > LM. Automatic text retrieval; L. Information technology and library technology. > LS. Search engines; L. Information technology and library technology. > LC. Internet, including WWW. |
| ID Code: | 7080 |
| Deposited By: | Mandl, Thomas |
| Deposited On: | 28 August 2006 |
| Alternative Locations: | http://www.uni-hildesheim.de/~mandl/Publikationen/ht06PosterPresentedJensenMandl.PDF |

