Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus

Jensen, Niels and Mandl, Thomas. Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus. In 17th ACM Conference on Hypertext and Hypermedia (HT '06), Odense, Denmark, 22-25 August 2006. (Unpublished) [Presentation]

English abstract

Experiments with a multilingual web collection are presented. The EuroGOV corpus is the first multilingual web corpus for retrieval evaluation. We show how word-based and n-gram-based indexes are built for different document parts: the full document content, partial content, and the title. The best results were achieved with a title-only index based on words.
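
The indexing variants named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the n-gram length (4), the definition of "partial content" as the first 200 characters of the body, the function names, and the two sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def word_terms(text):
    """Lowercased word tokens (Unicode-aware, so accented letters survive)."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=4):
    """Overlapping character n-grams within each word (n=4 is an assumption)."""
    grams = []
    for word in word_terms(text):
        if len(word) <= n:
            grams.append(word)  # short words are kept whole
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def document_part(doc, part, partial_chars=200):
    """Select the text to index: title only, a leading slice, or everything.

    Treating "partial" as the first `partial_chars` characters of the body
    is a hypothetical choice, not taken from the source.
    """
    if part == "title":
        return doc["title"]
    if part == "partial":
        return doc["body"][:partial_chars]
    if part == "full":
        return doc["title"] + " " + doc["body"]
    raise ValueError(f"unknown part: {part}")


def build_index(docs, part, tokenizer):
    """Inverted index mapping each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenizer(document_part(doc, part)):
            index[term].add(doc_id)
    return index


# Invented sample documents, one German and one English.
docs = {
    "d1": {"title": "Ministerium für Bildung", "body": "Informationen zur Schulpolitik"},
    "d2": {"title": "Ministry of Education", "body": "Information on school policy"},
}

title_word_index = build_index(docs, "title", word_terms)
full_ngram_index = build_index(docs, "full", char_ngrams)
```

In this sketch the n-gram index matches the shared stem of "Informationen" and "Information" across languages, while the title-only word index stays small and exact. Which variant retrieves better is an empirical question that the abstract answers for EuroGOV in favour of the title-only word index.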

Item type: Presentation
Keywords: web information retrieval, multilingual information systems
Subjects: L. Information technology and library technology > LM. Automatic text retrieval.
L. Information technology and library technology > LS. Search engines.
L. Information technology and library technology > LC. Internet, including WWW.
Depositing user: Thomas Mandl
Date deposited: 28 Aug 2006
Last modified: 02 Oct 2014 12:04
URI: http://hdl.handle.net/10760/8033

References

Jensen, Niels; Mandl, Thomas (2006): Different Indexing Strategies for Multilingual Web Retrieval: Experiments with the EuroGOV Corpus. In: Proceedings of the 17th ACM Conference on Hypertext and Hypermedia (HT '06), Odense, Denmark, August 22-25. ACM Press, pp. 169-170. http://doi.acm.org/10.1145/1149941.1149974