MARTT: Using Induced Knowledge Base to Automatically Mark up Plant Taxonomic Descriptions with XML

Cui, Hong. MARTT: Using Induced Knowledge Base to Automatically Mark up Plant Taxonomic Descriptions with XML, 2005. In 68th Annual Meeting of the American Society for Information Science and Technology (ASIST), Charlotte (US), 28 October - 2 November 2005. [Conference paper]

English abstract

Despite the sub-language nature of taxonomic descriptions of plants, researchers have warned of large variations among different collections of descriptions in information content and presentation. These variations pose a serious challenge to the development of automatic tools for the semantic markup of large volumes of free-text descriptions. This paper presents a new approach to the automatic markup of multiple collections of taxonomic descriptions with XML. The effectiveness of the approach was demonstrated with markup experiments using three contemporary floras. The markup system, MARTT, was based on supervised machine learning algorithms and enhanced by machine-learned association rules representing certain types of domain knowledge and conventions. Experiments showed that our simple and efficient markup algorithm outperformed popular general-purpose algorithms (including SVMs) across different floras. More importantly, the results demonstrated that the domain knowledge learned from one flora was useful for improving the markup performance on a second flora, especially on elements with sparse training examples. This paper reports the system design and the evaluation of the markup algorithms; the study of the effectiveness of the induced knowledge base will be reported in a later paper. The paper also discusses common practices of flora authors and the potential of the MARTT system for improving the efficiency and effectiveness of the creation, organization, and utilization of plant descriptions.

Item type: Conference paper
Keywords: automated metadata generation ; XML markup ; semantic markup ; taxonomies ; plant descriptions
Subjects: I. Information treatment for information services > IB. Content analysis (A and I, class.)
I. Information treatment for information services > IE. Data and metadata structures.
L. Information technology and library technology > LP. Intelligent agents.
I. Information treatment for information services > IA. Cataloging, bibliographic control.
Depositing user: Norm Medeiros
Date deposited: 08 Feb 2006
Last modified: 02 Oct 2014 12:02
URI: http://hdl.handle.net/10760/6895
