G.-Figuerola, Carlos and Zazo, Ángel F. and Alonso-Berrocal, José-Luis and Rodríguez-Vázquez-de-Aldana, Emilio. REINA at WebCLEF 2006: Mixing Fields to Improve Retrieval, 2006. In: WORKING NOTES CLEF 2006 Workshop, 20-22 September, Alicante, Spain. Results of the CLEF 2006 Cross-Language System Evaluation Campaign. UNSPECIFIED. [Book chapter]
This paper describes the participation of the REINA Research Group of the University of Salamanca in WebCLEF 2006. The task we participated in this year is the Monolingual Mixed Task in Spanish. To select the Spanish web pages of the EuroGov collection, the whole collection was processed with a language guesser to find pages in Spanish. All pages in the .es domain were also pre-selected. Our focus this year is to test pre-retrieval ways of mixing fields, or elements of information, in web pages, as well as to test the retrieval capacity of these fields. In retrieval systems based on the vector space model with a tf * idf weighting scheme, terms from several sources can be mixed in a single index by operating on the term frequency in the document. The BODY field is the most powerful from the point of view of retrieval, but the ANCHORS of backlinks add a considerable improvement. The META fields, however, contribute little to the improvement in retrieval.
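The idea of mixing fields by operating on term frequency can be illustrated with a minimal sketch. This is not the authors' implementation: the field weights, the whitespace tokenizer, the idf variant, and the function names (`mixed_tf`, `build_index`, `score`) are all assumptions made for illustration, since the abstract gives no such details.

```python
import math
from collections import Counter

# Hypothetical field weights; the abstract does not state any values.
FIELD_WEIGHTS = {"body": 1.0, "anchors": 2.0, "meta": 0.5}


def mixed_tf(doc_fields, weights=FIELD_WEIGHTS):
    """Merge per-field term counts into one weighted term frequency."""
    tf = Counter()
    for field, text in doc_fields.items():
        w = weights.get(field, 0.0)  # unknown fields contribute nothing
        for term in text.lower().split():  # naive whitespace tokenizer
            tf[term] += w
    return tf


def build_index(docs, weights=FIELD_WEIGHTS):
    """Build a single mixed index: weighted tf per document, idf per term."""
    tfs = {doc_id: mixed_tf(fields, weights) for doc_id, fields in docs.items()}
    df = Counter()
    for tf in tfs.values():
        # Count a document for a term only if its mixed tf is positive.
        df.update(t for t, v in tf.items() if v > 0)
    n = len(docs)
    # One common idf variant (assumed here): log(N / df) + 1.
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
    return tfs, idf


def score(query, tfs, idf):
    """Score each document as the sum of tf * idf over the query terms."""
    terms = query.lower().split()
    return {
        doc_id: sum(tf[t] * idf.get(t, 0.0) for t in terms)
        for doc_id, tf in tfs.items()
    }


docs = {
    "a": {"body": "ministerio de cultura", "anchors": "", "meta": ""},
    "b": {"body": "portal general", "anchors": "ministerio cultura", "meta": ""},
}
tfs, idf = build_index(docs)
scores = score("ministerio cultura", tfs, idf)
```

With these assumed weights, a term that appears only in the anchor text of backlinks counts twice as much as the same term in the body, so document `b` outranks document `a` for the query above. Changing `FIELD_WEIGHTS` changes the mix without requiring a separate index per field.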
|Item type:||Book chapter|
|Keywords:||web pages retrieval, information retrieval, web search, combining fields|
|Subjects:||L. Information technology and library technology. > LM. Automatic text retrieval.; I. Information treatment for information services > II. Filtering.|
|Depositing user:||R. Gómez-Díaz|
|Date deposited:||11 Dec 2009|
|Last modified:||02 Oct 2014 12:16|