Sistemas de recuperación de información implementados a partir de CORD-19: herramientas clave en la gestión de la información sobre COVID-19

López-Carreño, Rosana and Martínez-Méndez, Francisco-Javier. Sistemas de recuperación de información implementados a partir de CORD-19: herramientas clave en la gestión de la información sobre COVID-19. Revista Española de Documentación Científica, 2020, vol. 43, n. 4, pp. 1-12. [Journal article (Unpaginated)]

Text (Research article)
1300-Texto del artículo-6706-1-10-20201201 (2).pdf - Published version
Available under License Creative Commons Attribution.


English abstract

Research on the coronavirus has generated an extraordinary volume of scientific documents. Their processing and assimilation by the scientific community have required the help of specifically designed information retrieval systems. Some of the world’s leading institutions involved in the fight against the pandemic have developed the CORD-19 dataset, which stands out from other projects of a similar nature. The documents collected in this source have been processed by various information retrieval tools, some of them prototypes and others previously implemented systems. The typology and main characteristics of these systems have been analysed, concluding that they fall into three main categories that are not mutually exclusive: terminological search, information visualisation and natural language processing. Notably, the great majority of them rely on semantic search technologies to facilitate the acquisition of knowledge by researchers and to help them in their enormous task. Semantic search engines have taken advantage of the crisis caused by the pandemic to find their place.

Spanish abstract

La investigación sobre el coronavirus ha generado una producción de documentos científicos extraordinaria. Su tratamiento y asimilación por parte de la comunidad científica ha necesitado de la ayuda de sistemas de recuperación de información diseñados específicamente. Algunas de las principales instituciones mundiales dedicadas a la lucha contra la pandemia han desarrollado el conjunto de datos CORD-19, que destaca sobre otros proyectos de similar naturaleza. Los documentos recopilados en esta fuente han sido procesados por distintas herramientas de recuperación de información, a veces prototipos o sistemas que ya estaban implementados. Se ha analizado la tipología y características principales de estos sistemas, concluyendo que hay tres grandes categorías no excluyentes entre ellas: búsqueda terminológica, visualización de información y procesamiento de lenguaje natural. Destaca enormemente que la gran mayoría de ellos emplean preferentemente tecnologías de búsqueda semántica con el objeto de facilitar la adquisición de conocimiento a los investigadores y ayudarles en su ingente tarea. La crisis provocada por la pandemia ha sido aprovechada por los buscadores semánticos para encontrar su sitio.

Item type: Journal article (Unpaginated)
Keywords: Datasets; COVID-19; Information Retrieval Systems; information management; Conjuntos de datos; Sistemas de Recuperación de Información; Gestión de información; CORD-19
Subjects: H. Information sources, supports, channels. > HS. Repositories.
L. Information technology and library technology > LS. Search engines.
Depositing user: Francisco-Javier Martinez-Mendez
Date deposited: 21 Jan 2021 11:46
Last modified: 21 Jan 2021 11:46
URI: http://hdl.handle.net/10760/40968

