Text mining: the new generation of scientific literature analysis in molecular biology and genomics

Gálvez, Carmen. Minería de textos: la nueva generación de análisis de literatura científica en biología molecular y genómica. Encontros Bibli: Revista Eletrônica de Biblioteconomia e Ciência da Informação, 2008, vol. 13, n. 25. [Journal article (Unpaginated)]

English abstract

Once the sequence of the human genome had been deciphered, the research paradigm shifted toward describing the functions of genes and toward future advances in the fight against disease. This new context has drawn the interest of Bioinformatics, which combines methods from the Life Sciences and the Information Sciences to provide access to the large volume of biological information stored in databases, and of Genomics, which studies the interactions between genes and their influence on the development of diseases. In this setting, text mining has emerged as a new instrument for the analysis of scientific literature. A common text-mining task in Molecular Biology and Genomics is the recognition of biological entities, such as genes, proteins and diseases. The next step in the mining process is to identify relations between biological entities, such as gene-gene, gene-disease and gene-protein interactions, in order to interpret biological functions or formulate research hypotheses. The aim of this paper is to examine the rise and the limitations of the new generation of tools for analysing natural-language information stored in bibliographic databases such as PubMed or MEDLINE.
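The two steps named in the abstract, entity recognition followed by the identification of relations between entities, can be illustrated with a minimal sketch. It is not the method of the paper or of any tool it reviews: it assumes a tiny hand-made lexicon (`LEXICON`, hypothetical), simple dictionary lookup for recognition, and sentence-level co-occurrence as a stand-in for relation detection. The function names `recognize_entities` and `cooccurrence_pairs` are likewise illustrative.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon (an assumption for illustration only);
# real systems rely on curated gene, protein and disease resources.
LEXICON = {
    "BRCA1": "gene",
    "TP53": "gene",
    "p53": "protein",
    "breast cancer": "disease",
}


def recognize_entities(sentence, lexicon=LEXICON):
    """Return a sorted list of (term, entity type) pairs found in the sentence."""
    found = set()
    for term, kind in lexicon.items():
        # Lookarounds keep a term from matching inside a longer token,
        # e.g. "p53" inside "TP53".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add((term, kind))
    return sorted(found)


def cooccurrence_pairs(text, lexicon=LEXICON):
    """Count pairs of entities that are mentioned in the same sentence."""
    counts = {}
    # Naive sentence split on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = recognize_entities(sentence, lexicon)
        for a, b in combinations(entities, 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


if __name__ == "__main__":
    abstract = (
        "Mutations in BRCA1 are associated with breast cancer. "
        "TP53 interacts with BRCA1."
    )
    for pair, n in cooccurrence_pairs(abstract).items():
        print(pair, n)
```

Co-occurrence only signals that two entities are mentioned together; it does not say what kind of interaction links them, which is why it serves here only as a first approximation of the relation-identification step.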

Item type: Journal article (Unpaginated)
Keywords: Biomedical Text-Mining; Natural Language Processing; Information Extraction
Subjects: I. Information treatment for information services > IC. Index languages, processes and schemes.
Depositing user: Carmen Galvez
Date deposited: 08 May 2008
Last modified: 02 Oct 2014 12:11
URI: http://hdl.handle.net/10760/11501

