Mining textual data through term variant clustering : the TermWatch system

Ibekwe-SanJuan, Fidelia and SanJuan, Eric. Mining textual data through term variant clustering: the TermWatch system. 2004. In RIAO 2004: Coupling approaches, coupling media and coupling languages for information retrieval, Avignon (France), 26-28 April 2004. [Conference paper]

English abstract

We present a system for mapping the structure of research topics in a corpus. TermWatch portrays the "aboutness" of a corpus of scientific and technical publications by bridging the gap between purely statistical approaches and symbolic techniques. In this paper, we report an unsupervised text mining experiment on a corpus of scientific titles and abstracts from 16 prominent IR journals. The preliminary results show that TermWatch captures low-frequency phenomena that the usual co-occurrence-based clustering methods may not highlight. The results also reflect the expressive power of terminological variations as a means of capturing the structure of research topics contained in a corpus.

Item type: Conference paper
Keywords: Thematic mapping, term clustering, information visualization, domain maps, knowledge representation
Subjects: I. Information treatment for information services > IB. Content analysis (A and I, class.)
I. Information treatment for information services > ID. Knowledge representation.
B. Information use and sociology of information > BB. Bibliometric methods
Depositing user: Fidelia Ibekwe-SanJuan
Date deposited: 10 Aug 2005
Last modified: 02 Oct 2014 12:01
URI: http://hdl.handle.net/10760/6642
