REINA at WebCLEF 2006: Mixing Fields to Improve Retrieval

G.-Figuerola, Carlos and Zazo, Ángel F. and Alonso-Berrocal, José-Luis and Rodríguez-Vázquez-de-Aldana, Emilio. REINA at WebCLEF 2006: Mixing Fields to Improve Retrieval., 2006 In: Working Notes, CLEF 2006 Workshop, 20-22 September, Alicante, Spain. Results of the CLEF 2006 Cross-Language System Evaluation Campaign. UNSPECIFIED. [Book chapter]


English abstract

This paper describes the participation of the REINA Research Group of the University of Salamanca in WebCLEF 2006. The task in which we participated this year is the Monolingual Mixed Task in Spanish. To select the Spanish web pages of the EuroGov collection, the whole collection was processed with a language guesser to find pages in Spanish. All pages in the .es domain were also pre-selected. Our focus this year is to test pre-retrieval ways of mixing fields or elements of information in web pages, as well as to test the retrieval capacity of these fields. In retrieval systems based on the vector space model, terms from several sources can be mixed in a single index by operating on the term frequency in the document, provided a tf * idf weighting scheme is used. The BODY field is the most powerful from the point of view of retrieval, but the ANCHORS of backlinks add a considerable improvement. META fields, by contrast, contribute little to the improvement in retrieval.
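
The idea of mixing fields at indexing time by operating on term frequency can be illustrated with a minimal sketch. This is not the authors' implementation: the field multipliers in FIELD_WEIGHTS, the function names mixed_tf and build_index, the whitespace tokenizer, the plain log(N/df) form of idf, and the two toy documents are all assumptions chosen for illustration. Each occurrence of a term adds its field's multiplier to a single merged term frequency, and tf * idf weights are then computed once over that merged index.

```python
import math
from collections import Counter

# Hypothetical per-field multipliers (assumed values, not taken from the paper).
FIELD_WEIGHTS = {"body": 1.0, "anchors": 2.0, "meta": 0.5}


def mixed_tf(doc_fields, weights=FIELD_WEIGHTS):
    """Merge the term counts of all fields into one weighted tf per term."""
    tf = Counter()
    for field, text in doc_fields.items():
        w = weights.get(field, 0.0)  # unknown fields contribute nothing
        for term in text.lower().split():  # naive whitespace tokenizer
            tf[term] += w
    return tf


def build_index(docs, weights=FIELD_WEIGHTS):
    """Return {doc_id: {term: tf * idf}} computed over a single mixed index."""
    tfs = {doc_id: mixed_tf(fields, weights) for doc_id, fields in docs.items()}
    # Document frequency: number of documents in which the term has positive tf.
    df = Counter()
    for tf in tfs.values():
        df.update(t for t, v in tf.items() if v > 0)
    n = len(docs)
    return {
        doc_id: {t: v * math.log(n / df[t]) for t, v in tf.items() if v > 0}
        for doc_id, tf in tfs.items()
    }


# Toy documents (invented for illustration).
docs = {
    "d1": {"body": "ministerio de cultura", "anchors": "cultura", "meta": ""},
    "d2": {"body": "ministerio de defensa", "anchors": "defensa", "meta": "defensa"},
}
index = build_index(docs)
```

In this sketch a term that appears in the anchor text of backlinks counts twice as much toward tf as the same term in the body, so changing a multiplier changes how strongly that field influences ranking without maintaining separate indexes.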

Item type: Book chapter
Keywords: web pages retrieval, information retrieval, web search, combining fields
Subjects: L. Information technology and library technology > LM. Automatic text retrieval.
I. Information treatment for information services > II. Filtering.
Depositing user: R. Gómez-Díaz
Date deposited: 11 Dec 2009
Last modified: 02 Oct 2014 12:16
URI: http://hdl.handle.net/10760/13966
