Enriching thesauri with hierarchical relationships by pattern matching in dictionaries

Araujo, Lourdes and Pérez-Agüera, José R. Enriching thesauri with hierarchical relationships by pattern matching in dictionaries. FinTAL - 5th International Conference on Natural Language Processing, 2006, pp. 268-279. [Journal article (Paginated)]



English abstract

This paper proposes a pattern matching method applied to dictionaries to identify hierarchical relationships between terms. In this work we focus on this type of relationship because we use it in the automatic generation of thesauri, which are used to improve information retrieval tasks. However, the method can also be applied to identify other semantic relationships. We distinguish two kinds of patterns: structural patterns, composed of a sequence of part-of-speech tags, and key patterns, typical of dictionary entries, composed of some key terms along with some part-of-speech tags. Patterns of this kind are automatically extracted from the dictionary entries by means of stochastic techniques. The thesaurus, which has been partially constructed previously, is then extended with the new relationships obtained by applying the patterns to a dictionary. We have based the system evaluation on the results obtained with and without the thesaurus in an information retrieval task proposed by the Cross-Language Evaluation Forum (CLEF). The results of these experiments reveal a clear improvement in performance.
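
As a rough illustration of the two kinds of patterns described in the abstract, the following minimal sketch matches hand-written patterns against a part-of-speech-tagged dictionary definition and returns a candidate broader term. Everything in it is an assumption made for illustration: the tag names (DET, NOUN, ADP), the two example patterns, the English definitions, and the function names `match_prefix` and `extract_broader_term` do not come from the paper. In the paper the patterns are extracted automatically by stochastic techniques, whereas here they are written by hand.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]    # (word, part-of-speech tag); tag set is assumed
Element = Tuple[str, str]  # ("tag", T), ("word", W) or ("slot", T)


def match_prefix(pattern: List[Element], tokens: List[Token]) -> Optional[str]:
    """Match a pattern against the start of a tagged definition.

    "tag" elements must equal the token's POS tag, "word" elements must
    equal the token's word (a key term), and "slot" elements must equal
    the POS tag and capture the word as the candidate broader term.
    """
    if len(tokens) < len(pattern):
        return None
    captured = None
    for (kind, value), (word, tag) in zip(pattern, tokens):
        if kind == "word":
            if word.lower() != value:
                return None
        elif tag != value:
            return None
        if kind == "slot":
            captured = word.lower()
    return captured


# Hypothetical structural pattern: only POS tags ("a <NOUN> ...").
STRUCTURAL = [("tag", "DET"), ("slot", "NOUN")]
# Hypothetical key pattern: key terms plus POS tags ("a kind of <NOUN> ...").
KEY = [("tag", "DET"), ("word", "kind"), ("word", "of"), ("slot", "NOUN")]


def extract_broader_term(definition: List[Token]) -> Optional[str]:
    """Try the more specific key pattern first, then the structural one."""
    for pattern in (KEY, STRUCTURAL):
        result = match_prefix(pattern, definition)
        if result is not None:
            return result
    return None


# Invented, pre-tagged definitions for the entries "spaniel" and "bicycle".
spaniel = [("a", "DET"), ("kind", "NOUN"), ("of", "ADP"), ("dog", "NOUN"),
           ("with", "ADP"), ("long", "ADJ"), ("ears", "NOUN")]
bicycle = [("a", "DET"), ("vehicle", "NOUN"), ("with", "ADP"),
           ("two", "NUM"), ("wheels", "NOUN")]

print("spaniel ->", extract_broader_term(spaniel))
print("bicycle ->", extract_broader_term(bicycle))
```

The key pattern is tried before the structural one because a purely tag-based match on "a kind of dog" would otherwise capture "kind" rather than "dog". Each (entry, broader term) pair found this way is a candidate hierarchical relationship that could be added to an existing thesaurus.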

Item type: Journal article (Paginated)
Keywords: automatic thesaurus extraction, information retrieval, query expansion, pattern matching, dictionary
Subjects: L. Information technology and library technology > LL. Automated language processing.
L. Information technology and library technology > LM. Automatic text retrieval.
Depositing user: José Ramón Pérez Agüera
Date deposited: 09 Nov 2006
Last modified: 02 Oct 2014 12:05
URI: http://hdl.handle.net/10760/8351



