Spanish personal name variations in national and international biomedical databases: implications for information retrieval and bibliometric studies

Jimenez-Contreras, Evaristo and Ruiz-Pérez, Rafael and Delgado-Lopez-Cozar, Emilio. Spanish personal name variations in national and international biomedical databases: implications for information retrieval and bibliometric studies. Journal of the Medical Library Association, 2002, vol. 90, n. 4. [Journal article (Unpaginated)]


English abstract

Objectives: The study sought to investigate how Spanish names are handled by national and international databases and to identify mistakes that can undermine the usefulness of these databases for locating and retrieving works by Spanish authors. Methods: The authors sampled 172 articles published by authors from the University of Granada Medical School between 1987 and 1996 and analyzed the variations in how each of their names was indexed in Science Citation Index (SCI), MEDLINE, and Índice Médico Español (IME). The number and types of variants that appeared for each author’s name were recorded and compared across databases to identify inconsistencies in indexing practices. We analyzed the relationship between variability (number of variants of an author’s name) and productivity (number of items the name was associated with as an author), the consequences for retrieval of information, and the most frequent indexing structures used for Spanish names. Results: The proportion of authors who appeared under more than one name was 48.1% in SCI, 50.7% in MEDLINE, and 69.0% in IME. Productivity correlated directly with variability: more than 50% of the authors listed on five to ten items appeared under more than one name in any given database, and close to 100% of the authors listed on more than ten items appeared under two or more variants. Productivity correlated inversely with retrievability: as the number of variants for a name increased, the number of items retrieved under each variant decreased. For the most highly productive authors, the number of items retrieved under each variant tended toward one. The most frequent indexing methods varied between databases. In MEDLINE and IME, names were indexed correctly as “first surname second surname, first name initial middle name initial” (if present) in 41.7% and 49.5% of the records, respectively. However, in SCI, the most frequent methods were “first surname, first name initial second name initial” (48.0% of the records) and “first surname and second surname run together, first name initial” (18.3%). Conclusions: Retrievability on the basis of author’s name was poor in all three databases. Each database uses accurate indexing methods, but these methods fail to result in consistency or coherence for specific entries. The likely causes of inconsistency are: (1) use by authors of variants of their names during their publication careers, (2) lack of authority control in all three databases, (3) the use of an inappropriate indexing method for Spanish names in SCI, (4) authors’ inconsistent behaviors, and (5) possible editorial interventions by some journals. We offer some suggestions as to how to avert the proliferation of author name variants in the databases.

Keywords: Information retrieval
Subjects: B. Information use and sociology of information > BB. Bibliometric methods
