Identificación de nombres personales por medio de sistemas de codificación fonética

Galvez, Carmen Identificación de nombres personales por medio de sistemas de codificación fonética. Encontros Bibli : Revista Eletrônica de Biblioteconomia e Ciência da Informação, 2006, vol. 2 seme, n. 22, pp. 105-116. [Journal article (Paginated)]

English abstract

The need to identify variants of personal names is a well-known problem in applications such as information retrieval systems (IRS), digital libraries, hospital patient databases, airline reservation systems, and census systems. Phonetic coding methods are one approach to this problem: they reduce names to canonical, or normalized, forms. These systems belong to the general family of approximate string matching techniques. This work reviews the processes used by the Soundex, Daitch-Mokotoff Soundex, Phonix, Metaphone, and NYSIIS systems to assign phonetic keys. Phonetic coding reduces personal names that are similar in pronunciation to a common form, which simplifies string matching because the generated code is stored in place of the full name. The main limitation of these systems is that they depend on the language used, so they must be modified to suit the language to which they are applied.
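As an illustration of how a phonetic key collapses similar-sounding names, the following is a minimal sketch of classic American Soundex as it is commonly described. It is not taken from the article, and the function name `soundex` and the table `CODES` are assumptions made for this example. Non-ASCII letters are simply dropped, which is one small instance of the language dependence noted above.

```python
# Illustrative sketch of classic (American) Soundex; names here are assumptions.
# Consonant groups share a digit; vowels, Y, H and W receive no code.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return a 4-character Soundex key for a name ('' if it has no A-Z letters)."""
    # Keep only ASCII letters; accented characters are discarded in this sketch.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    key = letters[0]                      # the first letter is kept as-is
    previous = CODES.get(letters[0], "")  # its code still blocks an adjacent repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != previous:
            key += code
        previous = code  # a vowel resets this, so a repeat across a vowel is coded
    return (key + "000")[:4]              # pad with zeros, truncate to four characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Because both spellings in each pair map to the same key, a lookup can compare stored keys rather than full names.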

Spanish abstract

La necesidad de identificar las variantes de los nombres personales es un problema muy conocido en diversas aplicaciones, tales como los sistemas de recuperación de información (SRI), las bibliotecas digitales, las bases de datos de pacientes en un hospital, los sistemas de reservas aéreas, o los sistemas de censo. Los métodos de codificación fonética constituyen uno de los procedimientos para la solución de este problema, permitiendo obtener cadenas canónicas o normalizadas. Estos sistemas se engloban dentro de las técnicas generales de equiparación aproximada de cadenas. En este trabajo se realiza una revisión de los procesos que utilizan los sistemas Soundex, Daitch-Mokotoff Soundex, Phonix, Metaphone y NYSIIS para la asignación de claves fonéticas. La codificación fonética permite reducir a una forma común aquellos nombres personales que son similares en cuanto a su pronunciación, haciendo más sencilla la comparación de una cadena con otra, debido a que se almacena el código generado en lugar del nombre completo. Sin embargo, la principal limitación de estos sistemas es que son dependientes del lenguaje utilizado, lo que hace necesario la realización de modificaciones de acuerdo al idioma que se va a emplear.

Item type: Journal article (Paginated)
Keywords: Phonetic codification; Personal name-matching; Name-matching techniques; Codificación fonética; Equiparación de nombres personales; Algoritmos de equiparación de nombre.
Subjects: L. Information technology and library technology > LM. Automatic text retrieval.
Depositing user: Carmen Galvez
Date deposited: 06 Aug 2007
Last modified: 02 Oct 2014 12:08
URI: http://hdl.handle.net/10760/10017

References

ANGELL, R. C., FREUND, G. E., WILLETT, P. Automatic spelling correction using a trigram similarity measure. Information Processing & Management, v. 19, n. 4, p. 255-261, 1983.

BALUJA, S., MITTAL, V., SUKTHANKAR, R. Applying machine learning for high performance named-entity extraction. Computational Intelligence, v. 16, 2000.

BLAIR, C. R. A program for correcting spelling errors. Information and Control, v. 3, p. 60-67, 1960.

BORGMAN, C. L., SIEGFRIED, S. L. Getty's synoname and its cousins: a survey of applications of personal name-matching algorithms. Journal of the American Society for Information Science, v. 43, n. 7, p. 459-476, 1992.

BOUCHARD, G., POUYEZ, C. Name variations and computerized record linkage. Historical Methods, v. 13, n. 2, p. 119-125, 1980.

CHINCHOR, N. Named entity task definition, version 3.5. In: SEVENTH MESSAGE UNDERSTANDING CONFERENCE. Proceedings… Fairfax, VA: Morgan Kaufmann, 1997.

DAITCH-MOKOTOFF SOUNDEX SYSTEM. Available at: <http://www.jewishgen.org.>

DAMERAU, F. J. A technique for computer detection and correction of spelling errors. Communications of the ACM, v. 7, n. 4, p. 171-176, 1964.

DAMERAU, F. J., MAY, E. An examination of undetected typing errors. Information Processing & Management, v. 25, n. 6, p. 659-664, 1989.

GADD, T. N. Fisching for werds: Phonetic retrieval of written text in information systems. Program: Automated Library and Information Science, v. 22, n. 3, p. 222-237, 1988.

GADD, T. N. PHONIX: the algorithm. Program: Automated Library and Information Science, v. 24, n. 4, p. 363-366, 1990.

GALVEZ, C., MOYA-ANEGÓN, F. Approximate personal name-matching through finite-state graphs. Journal of the American Society for Information Science (in press).

GAIZAUSKAS, R., et al. University of Sheffield: description of the LaSIE system as used for MUC-6. In: SIXTH MESSAGE UNDERSTANDING CONFERENCE. Proceedings… Columbia, MD: Morgan Kaufmann, 1995.

HALL, P. A. V., DOWLING, G. R. Approximate string matching. Computing Surveys, v. 12, n. 4, p. 381-402, 1980.

KNUTH, D. The art of computer programming: sorting and searching. Reading, Massachusetts: Addison-Wesley, 1973.

MUC-4. In: FOURTH MESSAGE UNDERSTANDING CONFERENCE. Proceedings… McLean, VA: Morgan Kaufmann, 1992.

MUC-6. In: SIXTH MESSAGE UNDERSTANDING CONFERENCE. Proceedings… Columbia, MD: Morgan Kaufmann, 1995.

MUC-7. In: SEVENTH MESSAGE UNDERSTANDING CONFERENCE. Proceedings… Fairfax, VA: Morgan Kaufmann, 1997.

ODELL, M. K., RUSSELL, R. C. U. S. Patent Numbers 1261167 (1918) and 1435663 (1922). Washington, D.C.: U.S. Patent Office, 1918.

PETERSON, J. L. A note on undetected typing errors. Communications of the ACM, v. 29, n. 7, 1986.

PHILIPS, L. Hanging on the Metaphone. Computer Language, v. 7, n. 12, p. 39-43, 1990.

POLLOCK, J. J., ZAMORA, A. Automatic spelling correction in scientific and scholarly text. Communications of the ACM, v. 27, n. 4, p. 358-368, 1984.

RAVIN, Y., WACHOLDER, N. Extracting names from natural-language text. IBM Research Report 20338, 1996.

RISEMAN, E. M., EHRICH, R. W. Contextual word recognition using binary digrams. IEEE Transactions on Computers, v. 20, n. 4, p. 397-403, 1971.

SALTON, G. Automatic text processing: the transformation, analysis and retrieval of information by computer. Reading, Massachusetts: Addison-Wesley, 1989.

TAFT, R. L. Special Report nº. 1. Albany, New York: Bureau of Systems Development, New York State Identification and Intelligence Systems (NYSIIS), 1970.

THOMPSON, P., DOZIER, C.C. Name recognition and retrieval performance. In: Strzalkowski, T. (Ed.). Natural language information retrieval. Dordrecht: Kluwer Academic Publishers, 1999, p. 25-74.

ULLMANN, J. R. A binary n-gram technique for automatic correction of substitution, deletion, insertion and reversal errors. The Computer Journal, v. 20, n. 2, p. 141-147, 1977.

ZAMORA, E., POLLOCK, J., ZAMORA, A. The use of trigram analysis for spelling error detection. Information Processing & Management, v. 17, n. 6, p. 305-316, 1981.

