Reducing semantic complexity in distributed Digital Libraries: treatment of term vagueness and document re-ranking

Mayr, Philipp and Mutschke, Peter and Petras, Vivien. Reducing semantic complexity in distributed Digital Libraries: treatment of term vagueness and document re-ranking. Library Review, 2008, vol. 57, n. 3. (Unpublished) [Journal article (Unpaginated)]

English abstract

Purpose - The general science portal vascoda merges structured, high-quality information collections from more than 40 providers, using search engine technology (FAST) and a concept for treating semantic heterogeneity between different controlled vocabularies. First experiences with the portal reveal weaknesses of this approach that appear in most metadata-driven Digital Libraries (DLs) and subject-specific portals. The purpose of the paper is to propose models that reduce the semantic complexity in heterogeneous DLs. The aim is to introduce value-added services (treatment of term vagueness and document re-ranking) that add quality to DLs when they are combined with the heterogeneity components established in the project “Competence Center Modeling and Treatment of Semantic Heterogeneity”.

Design/methodology/approach - First, semantic heterogeneity components translate automatically between different indexing languages. This approach affects search when the searcher uses controlled vocabularies that are linked by cross-concordances. However, users usually formulate query terms freely, without any vocabulary support. Empirical observations show that freely formulated user terms and terms from controlled vocabularies often differ or match only by coincidence. Therefore, a value-added service, the Search Term Recommender (STR), will be developed that rephrases the searcher's natural-language terms into suggestions from the controlled vocabulary. Second, the result sets of transformed or expanded queries in distributed collections are often very large, and tests show that conventional web-based ranking methods are not appropriate for presenting heterogeneous metadata records as suitable result sets to the user. Therefore, two methods derived from scientometrics and network analysis will be implemented to re-rank result sets by the following structural properties: ranking by core journals (so-called Bradfordizing) and ranking by the centrality of authors in co-authorship networks.

Findings - The methods to be implemented address both the query side and the result side of a search and are designed to reinforce each other. Conceptually, they will improve search quality and guarantee that the most relevant documents in result sets are ranked higher.

Originality/value - The central contribution of the paper is the integration of three structural value-adding methods that aim to reduce the semantic complexity of distributed DLs at several stages of the information retrieval process: query construction, search and ranking, and re-ranking.

Paper type - Research paper
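The two re-ranking ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (dicts with "id", "journal" and "authors"), the function names, and the sample data are assumptions made for illustration only. Bradfordizing is approximated here by sorting documents by how many documents their journal contributes to the result set, and author centrality is approximated by simple degree (number of distinct co-authors within the result set), which is a simplification of the network measures a real system might use.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bradfordize(results):
    """Re-rank so that documents from the most productive journals come first.

    Assumption: each result is a dict with a "journal" key.
    """
    journal_counts = Counter(r["journal"] for r in results)
    # sorted() is stable, so documents with equal counts keep their input order.
    return sorted(results, key=lambda r: -journal_counts[r["journal"]])


def rank_by_author_centrality(results):
    """Re-rank by the highest co-author degree among each document's authors.

    Assumption: degree (number of distinct co-authors in the result set)
    stands in for whatever centrality measure a real system would use.
    """
    neighbours = defaultdict(set)
    for r in results:
        for a, b in combinations(set(r["authors"]), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    def score(r):
        return max((len(neighbours[a]) for a in r["authors"]), default=0)

    return sorted(results, key=lambda r: -score(r))


# Hypothetical result set, invented for this example.
sample = [
    {"id": 1, "journal": "J. A", "authors": ["Lee"]},
    {"id": 2, "journal": "J. B", "authors": ["Kim", "Roy"]},
    {"id": 3, "journal": "J. B", "authors": ["Kim", "Sen"]},
    {"id": 4, "journal": "J. C", "authors": ["Roy", "Kim", "Tan"]},
    {"id": 5, "journal": "J. B", "authors": ["Sen"]},
]

print([r["id"] for r in bradfordize(sample)])
print([r["id"] for r in rank_by_author_centrality(sample)])
```

In this toy data, the three documents from "J. B" move to the top under Bradfordizing, while the centrality ranking favours documents with the well-connected author "Kim". Both functions rely only on structural properties of the result set, not on term matching, which is the point the abstract makes about re-ranking.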

Item type: Journal article (Unpaginated)
Keywords: Digital Library, Semantic Heterogeneity, Search Term Recommender, Re-Ranking, Bradfordizing, Co-Author Networks, Network Analysis
Subjects: H. Information sources, supports, channels. > HL. Databases and database Networking.
I. Information treatment for information services > IC. Index languages, processes and schemes.
B. Information use and sociology of information > BB. Bibliometric methods
H. Information sources, supports, channels. > HR. Portals.
L. Information technology and library technology > LS. Search engines.
Depositing user: Philipp Mayr
Date deposited: 16 Dec 2007
Last modified: 02 Oct 2014 12:10
URI: http://hdl.handle.net/10760/10893
