Automatic indexing of scientific articles on Library and Information Science with SISA, KEA and MAUI

Gil-Leiva, Isidoro, Díaz-Ortuño, Pedro Daniel and Corrêa, Renato Fernandes. Automatic indexing of scientific articles on Library and Information Science with SISA, KEA and MAUI. Revista Española de Documentación Científica, 2022, vol. 45, n. 4, pp. 1-18. [Journal article (Paginated)]


Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Alternative locations: https://redc.revistas.csic.es/index.php/redc/article/view/1371

English abstract

This article evaluates the SISA (Automatic Indexing System), KEA (Keyphrase Extraction Algorithm) and MAUI (Multi-Purpose Automatic Topic Indexing) automatic indexing systems to find out how they perform in relation to human indexing. The SISA algorithm is based on rules about the position of terms in the different structural components of the document, while the KEA and MAUI algorithms are based on machine learning and the statistical features of terms. For evaluation purposes, a document collection of 230 scientific articles from the Revista Española de Documentación Científica, published by the Consejo Superior de Investigaciones Científicas (CSIC), was used, of which 30 were used for training tasks and were not part of the evaluation test set. The articles were written in Spanish and indexed by human indexers using a controlled vocabulary in the InDICES database, which also belongs to the CSIC. The human indexing of these documents constitutes the baseline or golden indexing against which the output of the automatic indexing systems is evaluated, by comparing term sets using the evaluation metrics of precision, recall, F-measure and consistency. The results show that the SISA system performs best, followed by KEA and MAUI.
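
The term-set comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `evaluate_indexing`, the exact-match rule after lowercasing, the sample terms, and the choice of a Hooper-style consistency formula (common terms divided by the union of both sets) are all assumptions made for illustration, since the abstract does not state which formulas were used.

```python
def evaluate_indexing(auto_terms, human_terms):
    """Compare an automatic term set against a human (reference) term set.

    Assumption: terms match only when identical after trimming and lowercasing.
    """
    auto = {t.strip().lower() for t in auto_terms}
    human = {t.strip().lower() for t in human_terms}
    common = auto & human

    # Precision: share of automatically assigned terms that the indexer also chose.
    precision = len(common) / len(auto) if auto else 0.0
    # Recall: share of the indexer's terms that the system recovered.
    recall = len(common) / len(human) if human else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # Consistency (assumed Hooper-style): common terms over all distinct terms.
    union = auto | human
    consistency = len(common) / len(union) if union else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "consistency": consistency,
    }


# Hypothetical term sets for a single article (not taken from the study).
scores = evaluate_indexing(
    ["automatic indexing", "KEA", "machine learning", "thesauri"],
    ["automatic indexing", "KEA", "MAUI", "evaluation", "indexing systems"],
)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In a study of this kind the per-article scores would typically be averaged over the evaluation set; whether the authors used this averaging scheme is not stated in the abstract.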

Spanish abstract

Este artículo evalúa los sistemas de indización automática SISA (Automatic Indexing System), KEA (Keyphrase Extraction Algorithm) y MAUI (Multi-Purpose Automatic Topic Indexing) para averiguar cómo funcionan en relación con la indización realizada por especialistas. El algoritmo de SISA se basa en reglas sobre la posición de los términos en los diferentes componentes estructurales del documento, mientras que los algoritmos de KEA y MAUI se basan en el aprendizaje automático y la frecuencia estadística de los términos. Para la evaluación se utilizó una colección documental de 230 artículos científicos de la Revista Española de Documentación Científica, publicada por el Consejo Superior de Investigaciones Científicas (CSIC), de los cuales 30 se utilizaron para tareas formativas y no formaban parte del conjunto de pruebas de evaluación. Los artículos fueron escritos en español e indizados por indizadores humanos utilizando un vocabulario controlado en la base de datos InDICES, también perteneciente al CSIC. La indización humana de estos documentos constituye la referencia contra la cual se evalúa el resultado de los sistemas de indización automáticos, comparando conjuntos de términos usando métricas de evaluación de precisión, recuperación, medida F y consistencia. Los resultados muestran que el sistema SISA funciona mejor, seguido de KEA y MAUI.

Item type:	Journal article (Paginated)
Additional information:	cited By 0
Keywords:	automatic indexing; automatic indexing systems; SISA; KEA; MAUI; indexing assessment; indización automática; sistemas de indización automática; evaluación de indización
Subjects:	I. Information treatment for information services > IB. Content analysis (A and I, class.); I. Information treatment for information services > IC. Index languages, processes and schemes.
Depositing user:	Isidoro Gil Leiva
Date deposited:	22 Mar 2023 15:38
Last modified:	22 Mar 2023 15:38
URI:	http://hdl.handle.net/10760/44190
