TicTOCron: an Automatic Solution for Propagating Quality Metadata to Scholarly TOC RSS Feed Metadata

Chumbe, Santiago and MacLeod, Roderick. TicTOCron: an Automatic Solution for Propagating Quality Metadata to Scholarly TOC RSS Feed Metadata. 2009. In 30th IATUL Annual Conference, Leuven (Belgium), 1-4 June 2009. (Unpublished) [Conference paper]

English abstract

Institutions and researchers stand to benefit from wider syndication of, and easier access to, the Table of Contents (TOC) RSS (Really Simple Syndication [1]) feeds produced for scholarly journals. However, many journal TOC RSS feeds are currently produced with erroneous, poor or incomplete metadata. This can hamper the usefulness of scholarly current awareness services and cause problems for individual subscribers to those feeds. The ticTOCron software toolkit aims to overcome this problem. It automatically enhances poor, heterogeneous and incomplete metadata found in TOC RSS feeds using a pre-defined "Best Practice" metadata scheme suitable for scholarly journals. In this work we describe the main issues and "bad practices" found in TOC RSS metadata obtained from more than 435 scholarly publishers, and then the software solutions implemented in ticTOCron. We refer briefly to the algorithms for generating semantic relations within, between and from the harvested TOCs, and to the mechanisms for propagating "metadata associations" from a previously crawled metadata-rich reference set, but we avoid technical jargon and replace complex technical descriptions with samples and simple comparisons. The original metadata is converted to a canonical format using the "Best Practices metadata set" for scholarly papers proposed by the ticTOCs Project [2]. We also present the results obtained when ticTOCron was used to enhance and normalize TOC RSS feeds collected from more than 12,000 journals. Finally, we propose a sustainable and scalable computational model in which the automatic solution is complemented and fine-tuned by a cost-effective human cross-validation process.

Item type: Conference paper
Keywords: CRON job, current awareness, journal TOC RSS feeds, metadata, metadata quality enhancement, table of contents, ticTOCs
Subjects: I. Information treatment for information services > IE. Data and metadata structures.
L. Information technology and library technology
Depositing user: Roderick A MacLeod
Date deposited: 02 Apr 2009
Last modified: 02 Oct 2014 12:14
URI: http://hdl.handle.net/10760/12961

References

Frisch, A. (2002) Essential System Administration: Help for UNIX System Administrators. Published by O'Reilly, ISBN 0596003439, 9780596003432, Third Edition. pp. 90-99

Albassuny, B.M. (2008) Automatic metadata generation applications: a survey study. Int. J. Metadata, Semantics and Ontologies, Vol. 3, No. 4, pp.260–282

Berson Alex, Dubov Larry and Dubov Lawrence. (2007) Master Data Management and Customer Data Integration for a Global Enterprise. Book published by McGraw-Hill Professional, ISBN 0072263490, 9780072263497. 406 pages

Craven, T. (2001) Changes in Metatag descriptions over time. First Monday, Vol. 6, No. 10 - 1 Oct. 2001

Chumbe, S., MacLeod, R., Barker, P., Moffat, M. and Rist, R. (2006) Overcoming the obstacles of harvesting and searching digital repositories from federated searching toolkits, and embedding them in VLEs. Proceedings 2nd International Conference on Computer Science and Information Systems, Athens, Greece. URL: http://eprints.rclis.org/archive/00006394

Farooq, U., Ganoe, Craig H., Carroll, John M., Councill, Isaac G., and Giles, C. Lee (2008). Design and evaluation of awareness mechanisms in CiteSeer. Information Processing and Management, 44, pp. 596–612

Greenberg, J. (2004). Metadata Extraction and Harvesting: A Comparison of Two Automatic Metadata Generation Applications. Journal of Internet Cataloguing, 6(4): pp. 59-82

Greenberg, J., Spurgin, K. and Crystal, A. (2006). Functionalities for automatic metadata generation applications: a survey of metadata experts’ opinions. Int. J. Metadata, Semantics and Ontologies, Vol. 1, No. 1, pp.3–20

Hammersley, B. (2005) Developing feeds with RSS and Atom. Book published by O'Reilly, ISBN 0596008813, 9780596008819. 276 pages

Lagoze, C., Krafft, D., Cornwell, T., Dushay, N., Eckstrom, D., Saylor, J. (2006) Metadata aggregation and "automated digital libraries": a retrospective on the NSDL experience. Proceedings of the 6th ACM/IEEE-CS joint Int. Conference on Digital Libraries. pp. 230-239

Liu, J. (2007) Metadata and its Applications in the Digital Library: Approaches and Practices. Published by Libraries Unlimited, London, pp.143–149

Manola, F. and Miller, E. (2004) RDF Primer. W3C recommendation. http://www.uazuay.edu.ec/bibliotecas/conectividad/pdf/RDF%20Primer.pdf

Margaritopoulos, M., Margaritopoulos, T., Kotini, I. and Manitsaris, A. (2008). Automatic metadata generation by utilising pre-existing metadata of related resources. Int. J. Metadata, Semantics and Ontologies, Vol. 3, No. 4, pp.292–304

Lerner, R. M. (2004) At the forge: aggregating syndication feeds. Linux Journal, published by Specialized Systems Consultants, Inc., ISSN: 1075-3583. Issue 128 (December 2004), Page 7

Rogers, L. (2008) RSS and scholarly journal tables of contents: the ticTOCs project, and good practice guidelines for publishers. FUMSI Magazine, October 2008 [online] URL: http://web.fumsi.com/go/article/share/3356

Schwartz, C. (2002). Sorting out the web: approaches to subject access. Westport, Connecticut: Ablex publishing. Part of the Contemporary Studies in Information Management, Policies, and Services series by Hernon, P (Ed.)

Tonkin, E., Muller, H. (2008) Keyword and metadata extraction from pre-prints. ELPUB2008. Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing held in Toronto, Canada 25-27 June 2008. pp. 30-44

Van de Sompel, H., and Lagoze, C. (2001) The Open Archives Initiative Protocol for Metadata Harvesting. URL: http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html

Wittenbrink, H. (2005) RSS and Atom: Understanding And Implementing Content Feeds & Syndication. Packt Publishing. ISBN 1904811574, 9781904811572. 250 pages

