Stemming in Spanish: A First approach to its impact on information retrieval

G.-Figuerola, Carlos, Gómez-Díaz, Raquel, Zazo, Ángel F. and Alonso-Berrocal, José-Luis. Stemming in Spanish: A First approach to its impact on information retrieval. 2001. In: UNSPECIFIED, (ed.) Results of the CLEF 2001 Cross-Language System Evaluation Campaign. Working Notes for the CLEF 2001 Workshop. 3 September, Darmstadt, Germany. UNSPECIFIED, pp. 197-202. [Book chapter]


English abstract

Most models and techniques employed in Information Retrieval use, at one point or another, frequency counts of the terms appearing in both documents and queries. Many words that derive from the same stem have closely related semantic content. Locating stems common to several words and grouping those words by replacing them with the corresponding stem can improve the performance of these systems. Stemming procedures, however, differ from language to language. We describe a stemmer for Spanish and the tests carried out by applying it to Information Retrieval.
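
As a rough illustration of the idea in the abstract (it is not the stemmer described in the paper), the following Python sketch shows how replacing words with a shared stem merges their frequency counts. The suffix list, the minimum stem length, and the names `naive_stem` and `term_frequencies` are all assumptions made for this example; a usable Spanish stemmer would need a far richer rule set.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list chosen for illustration only.
# It does not come from the paper and ignores most of Spanish morphology
# (verb conjugation, diminutives, irregular forms, and so on).
SUFFIXES = ("aciones", "ación", "mente", "es", "as", "os", "a", "o", "s")

# Assumed lower bound on stem length, to avoid stripping very short words.
MIN_STEM_LEN = 3


def naive_stem(word: str) -> str:
    """Strip the longest matching suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def term_frequencies(text: str, stem: bool = False) -> Counter:
    """Count whitespace-separated terms, optionally conflating them by stem."""
    tokens = text.lower().split()
    if stem:
        tokens = [naive_stem(token) for token in tokens]
    return Counter(tokens)


if __name__ == "__main__":
    sample = "libro libros biblioteca bibliotecas"
    print(term_frequencies(sample))             # four distinct terms
    print(term_frequencies(sample, stem=True))  # two stems, each counted twice
```

Without stemming, the sample yields four terms with one occurrence each; with stemming, "libro" and "libros" are counted together, as are "biblioteca" and "bibliotecas", which is the grouping effect the abstract refers to.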

Item type: Book chapter
Keywords: Information Retrieval; Stemming
Subjects: L. Information technology and library technology > LL. Automated language processing.
I. Information treatment for information services > II. Filtering.
L. Information technology and library technology > LM. Automatic text retrieval.
Depositing user: R. Gómez-Díaz
Date deposited: 07 Dec 2009
Last modified: 02 Oct 2014 12:16
URI: http://hdl.handle.net/10760/13956

