Text-mining research in genomics

Gálvez, Carmen and Moya-Anegón, Félix. Text-mining research in genomics. 2008. In IADIS International Conference Applied Computing 2008, Algarve (Portugal), 10-13 April 2008. [Conference paper]



English abstract

Biomedical text-mining has great promise to improve the usefulness of genomic research. The goal of text-mining is to analyze large collections of unstructured documents in order to extract interesting and non-trivial patterns of knowledge. The analysis of biomedical texts and available databases, such as Medline and PubMed, can help to interpret a phenomenon, to detect gene relations, or to establish comparisons among similar genes in different specific databases. All these processes are crucial for making sense of the immense quantity of genomic information. In genomics, text-mining research refers basically to the creation of literature networks of related biological entities. Text data represent the genomics knowledge base and can be mined for relationships, literature networks, and new discoveries by literature relational chaining. However, text-mining is an emerging field without a clear definition in genomics. This work presents some applications of text-mining to genome-based research, such as genomic term identification in curation processes, the formulation of hypotheses about disease, the visualization of biological relationships, and life-science domain mapping.

Item type: Conference paper
Keywords: Text-Mining; Information Extraction; Knowledge Discovery in Text
Subjects: L. Information technology and library technology > LL. Automated language processing.
L. Information technology and library technology > LM. Automatic text retrieval.
Depositing user: Carmen Galvez
Date deposited: 25 Jul 2008
Last modified: 02 Oct 2014 12:12
URI: http://hdl.handle.net/10760/12140


