A Conception-Based Approach to Automatic Subject Term Assignment for Scientific Journal Articles

Chung, EunKyung and Hastings, Samantha K. A Conception-Based Approach to Automatic Subject Term Assignment for Scientific Journal Articles. 2006. In 69th Annual Meeting of the American Society for Information Science and Technology (ASIST), Austin (US), 3-8 November 2006. [Conference paper]

English abstract

This study proposes a conception-based approach to automatic subject term assignment using Text Classification (TC) techniques. Drawing on conceptual and theoretical views of subject indexing, it identifies three conception-based approaches (Domain-Oriented, Document-Oriented, and Content-Oriented) together with eight semantic sources found in typical scientific journal articles. The experiment then examines how much each semantic source and each conception-based approach contributes to the effectiveness of subject term assignment. The results show that some semantic sources and conception-based approaches outperform the full-text approach that has been dominant in TC. The study thus indicates that TC techniques assign subject terms more effectively when indexing conceptions are considered in conjunction with semantic sources.

Item type: Conference paper
Keywords: computer-generated subject indexing; scientific literature
Subjects: L. Information technology and library technology > LL. Automated language processing.
I. Information treatment for information services > IC. Index languages, processes and schemes.
I. Information treatment for information services > IA. Cataloging, bibliographic control.
Depositing user: Norm Medeiros
Date deposited: 20 Dec 2006
Last modified: 02 Oct 2014 12:05
URI: http://hdl.handle.net/10760/8651
