E-LIS, Eprints in Library and Information Science

Mining textual data through term variant clustering : the TermWatch system

Ibekwe-SanJuan, Fidelia and SanJuan, Eric (2004) Mining textual data through term variant clustering : the TermWatch system. In Proceedings RIAO 2004 Coupling approaches, coupling media and coupling languages for information retrieval, pp. 487-503, Avignon (France).

Abstract

We present TermWatch, a system for mapping the structure of research topics in a corpus. TermWatch portrays the "aboutness" of a corpus of scientific and technical publications by bridging the gap between purely statistical approaches and symbolic techniques. In this paper, we report an unsupervised text mining experiment on a corpus of scientific titles and abstracts from 16 prominent information retrieval (IR) journals. The preliminary results show that TermWatch captures low-frequency phenomena that the usual clustering methods based on co-occurrence may not highlight. The results also reflect the expressive power of terminological variations as a means of capturing the structure of research topics contained in a corpus.

Keywords: Thematic mapping, term clustering, information visualization, domain maps, knowledge representation
Subjects: I. Information treatment for information services > IB. Content analysis (A and I, class.)
I. Information treatment for information services > ID. Knowledge representation.
B. Information use and sociology of information. > BB. Bibliometric methods.
ID Code: 4488
Deposited By: Ibekwe-SanJuan, Fidelia
Deposited On: 10 August 2005

