The unification of institutional addresses applying parametrized finite-state graphs (P-FSG)

Galvez, Carmen and Moya-Anegón, Félix. The unification of institutional addresses applying parametrized finite-state graphs (P-FSG). Scientometrics, 2006, vol. 69, n. 2, pp. 323-345. [Journal article (Paginated)]


English abstract

We propose a semi-automatic method based on finite-state techniques for the unification of corporate source data, with potential applications for bibliometric purposes. Bibliographic and citation databases have a well-known problem of data inconsistency at the micro and meso levels, which affects the quality of bibliometric searches and the evaluation of research performance. The unification method applies parametrized finite-state graphs (P-FSG) and involves three stages: (1) breaking corporate source data into independent units of analysis; (2) creating binary matrices; and (3) drawing finite-state graphs. This procedure was tested on university departmental addresses downloaded from the ISI Web of Science. Evaluation was based on an adaptation of the measures of precision and recall. The results demonstrate the usefulness of this approach, though it requires some human processing.
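As a rough illustration of the first two stages and of the evaluation measures, the following Python sketch may help. It is not the authors' implementation: the comma-based splitting rule, the upper-casing, the function names, and the standard (unadapted) definitions of precision and recall are all assumptions made for the example, and the third stage (drawing the finite-state graphs) is not covered.

```python
from typing import List, Set, Tuple


def split_units(address: str) -> List[str]:
    """Stage 1 (assumed form): split an address on commas into normalized units."""
    return [part.strip().upper() for part in address.split(",") if part.strip()]


def binary_matrix(addresses: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Stage 2 (assumed form): rows are addresses, columns are distinct units.

    A cell holds 1 if the unit occurs in the address and 0 otherwise.
    """
    rows = [split_units(address) for address in addresses]
    columns = sorted({unit for row in rows for unit in row})
    matrix = [[1 if column in row else 0 for column in columns] for row in rows]
    return columns, matrix


def precision_recall(proposed: Set[str], correct: Set[str]) -> Tuple[float, float]:
    """Standard precision and recall over sets of address variants.

    The paper uses an adaptation of these measures; this is the textbook form.
    """
    true_positives = len(proposed & correct)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical addresses, made up purely for illustration.
    sample = ["Univ A, Dept X", "Univ A, Dept Y"]
    columns, matrix = binary_matrix(sample)
    print(columns)
    for row in matrix:
        print(row)
```

In this sketch, each matrix row shows which units an address variant contains, so rows sharing most of their units are candidates for being unified under one canonical form.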

Item type: Journal article (Paginated)
Keywords: Finite-state transducers; Unification of corporate source data; Bibliometrics; Institutional addresses
Subjects: L. Information technology and library technology > LL. Automated language processing.
Depositing user: Carmen Galvez
Date deposited: 06 Aug 2007
Last modified: 02 Oct 2014 12:08



