Clustering semantic relations for constructing and maintaining knowledge organization tools

Ibekwe-SanJuan, Fidelia. Clustering semantic relations for constructing and maintaining knowledge organization tools. Journal of Documentation, 2006, vol. 62, n. 2, pp. 229-250. [Journal article (Paginated)]


English abstract

We propose a comprehensive methodology for thesaurus construction and maintenance that combines shallow NLP with a clustering algorithm and an information visualization interface. The resulting system, TermWatch, extracts terms from a text collection, mines semantic relations between them using complementary linguistic approaches, and clusters the terms using these semantic relations. The clusters formed exhibit the different relations necessary to populate a thesaurus or an ontology: synonymy, generic/specific and relatedness. For a given term, the clusters represent its closest neighbours in terms of semantic relations. The clusters are mapped in 2D using an integrated visualization tool. This could change the way in which information professionals (librarians and documentalists) undertake knowledge organization tasks. TermWatch can be useful either as a starting point for grasping the conceptual organization of knowledge in a huge text collection without having to read the texts, or as a suggestive tool for populating the different hierarchies of a thesaurus or an ontology, because its clusters are based on semantic relations.
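
The idea of clustering terms through their semantic relations can be illustrated with a minimal Python sketch. It is not the TermWatch algorithm, which the abstract does not specify: it swaps in plain connected components over a relation graph as a stand-in. The sample terms, the relation labels and the function names `cluster_terms` and `neighbours` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: (term, relation, term) triples, such as a
# relation-mining step might produce. Terms and labels are invented.
RELATIONS = [
    ("information retrieval", "generic/specific", "image retrieval"),
    ("image retrieval", "related", "image indexing"),
    ("thesaurus construction", "synonymy", "thesaurus building"),
]


def cluster_terms(relations):
    """Group terms so that two terms share a cluster whenever a chain of
    relations links them (connected components via union-find).
    Assumption: a stand-in for the paper's clustering algorithm."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for left, _label, right in relations:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for term in list(parent):
        clusters[find(term)].add(term)
    return sorted(sorted(members) for members in clusters.values())


def neighbours(term, relations):
    """List the (relation, other term) pairs directly attached to a term."""
    found = []
    for left, label, right in relations:
        if left == term:
            found.append((label, right))
        elif right == term:
            found.append((label, left))
    return found


if __name__ == "__main__":
    for cluster in cluster_terms(RELATIONS):
        print(cluster)
    print(neighbours("image retrieval", RELATIONS))
```

Keeping the relation label on each link is what would let a cluster suggest thesaurus entries: a synonymy link points to an equivalent term, a generic/specific link to a broader or narrower term, and a relatedness link to an associated term.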

Item type: Journal article (Paginated)
Keywords: Knowledge organization, Thesaurus construction, Shallow NLP, Semantic relations acquisition, Term clustering, Information visualization.
Subjects: I. Information treatment for information services > IC. Index languages, processes and schemes.
I. Information treatment for information services > ID. Knowledge representation.
Depositing user: Fidelia Ibekwe-SanJuan
Date deposited: 26 Feb 2008
Last modified: 02 Oct 2014 12:10
URI: http://hdl.handle.net/10760/11147

