Standardization of Terms Applying Finite-State Transducers (FST)

Gálvez, Carmen. Standardization of Terms Applying Finite-State Transducers (FST). 2009. In: Handbook of Research on Digital Libraries: Design, Development and Impact. School of Communication and Information, Nanyang Technological University (Singapore) & Idea Group Inc., pp. 102-112. [Book chapter]

English abstract

This chapter presents the different methods for standardizing terms under the two basic approaches, non-linguistic and linguistic techniques, and justifies the application of processes based on Finite-State Transducers (FST). Standardization of terms is the procedure of matching and grouping together variants of the same term that are semantically equivalent. A term variant is a text occurrence that is conceptually related to an original term and can be used to search for information in text databases. Uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. This chapter describes the computational and linguistic basis of the finite-state approach, with emphasis on the influence of formal language theory on the standardization of uniterms and multiterms. Lemmatization and syntactic pattern-matching, through equivalence relations represented in FSTs, are emerging methods for the standardization of terms.
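
To make the idea of grouping term variants concrete, the following is a minimal Python sketch of lemmatization with a toy deterministic transducer. It is not the chapter's implementation: the class `LexicalTransducer`, the function `standardize`, and the small word list are hypothetical names and data chosen for illustration. The transducer reads a surface form one character at a time and, if it ends in an accepting state, emits the lemma stored there; variants that share a lemma fall into the same group.

```python
from collections import defaultdict


class LexicalTransducer:
    """Toy deterministic transducer (hypothetical, for illustration only).

    Input: the characters of a surface form.
    Output: the lemma attached to the accepting state that is reached.
    """

    def __init__(self):
        self.transitions = [{}]  # state -> {input character: next state}
        self.final_output = {}   # accepting state -> emitted lemma

    def add(self, surface, lemma):
        """Add one surface form / lemma pair as a path through the machine."""
        state = 0
        for ch in surface:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final_output[state] = lemma

    def transduce(self, surface):
        """Return the lemma for a known surface form, or None if rejected."""
        state = 0
        for ch in surface:
            state = self.transitions[state].get(ch)
            if state is None:
                return None
        return self.final_output.get(state)


def standardize(terms, fst):
    """Group term variants under the lemma produced by the transducer.

    Terms the transducer does not recognize are kept under their own
    lowercased form (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for term in terms:
        key = term.lower()
        lemma = fst.transduce(key)
        groups[lemma if lemma is not None else key].append(term)
    return dict(groups)


# Illustrative lexicon of inflected uniterms (made-up sample data).
fst = LexicalTransducer()
for surface, lemma in [
    ("index", "index"),
    ("indexes", "index"),
    ("indices", "index"),
    ("library", "library"),
    ("libraries", "library"),
]:
    fst.add(surface, lemma)

groups = standardize(["Indexes", "indices", "libraries", "Library", "retrieval"], fst)
print(groups)
```

In this sketch the lexicon is listed by hand; a practical system would instead compile dictionaries and rules into transducers, and multiterm variants would need patterns over sequences of words rather than characters.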

Item type: Book chapter
Keywords: Finite-State Transducers; Term Conflation; Automatic Indexing
Subjects: L. Information technology and library technology > LL. Automated language processing.
Depositing user: Carmen Galvez
Date deposited: 09 Feb 2009
Last modified: 02 Oct 2014 12:13
URI: http://hdl.handle.net/10760/12780
