Document stream clustering : experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends

Lelu, Alain, Cadot, Martine and Cuxac, Pascal. Document stream clustering : experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends, 2006. In International Workshop on Webometrics, Informetrics and Scientometrics & Seventh COLLNET Meeting, Nancy (France), May 10-12, 2006. (Unpublished) [Conference paper]

English abstract

We address two major challenges of dynamic data mining. 1) The stability challenge: we have implemented a rigorous incremental density-based clustering algorithm whose results are independent of any initial conditions and of the order in which the data vectors arrive in the stream. 2) The cognitive challenge: we have implemented a stringent selection process for association rules between clusters at time t-1 and time t, so that the main conclusions about the dynamics of a data stream are generated directly. We illustrate these points with an application to a scientific information database of 2600 documents covering two years.
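
As a rough illustration of the second point, association rules between clusters at time t-1 and time t can be scored by support and confidence. The sketch below is not the authors' selection process: it assumes each document carries a single cluster label at each time step, and the function name, thresholds and sample labels are all hypothetical.

```python
from collections import Counter


def cluster_transition_rules(prev_labels, curr_labels,
                             min_support=0.1, min_confidence=0.6):
    """Score rules "cluster a at t-1 -> cluster b at t" (hypothetical helper).

    prev_labels[i] and curr_labels[i] are the cluster labels of document i
    at time t-1 and time t. One label per document is an assumption made
    for this sketch only.
    """
    if len(prev_labels) != len(curr_labels):
        raise ValueError("both label lists must describe the same documents")
    n = len(prev_labels)
    pair_counts = Counter(zip(prev_labels, curr_labels))
    prev_counts = Counter(prev_labels)

    rules = []
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n                  # share of all documents in a, then b
        confidence = n_ab / prev_counts[a]  # share of a's documents that end in b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first: by confidence, then by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


# Made-up labels for six documents at two successive time steps.
prev = ["A", "A", "A", "B", "B", "C"]
curr = ["X", "X", "Y", "Y", "Y", "Z"]
rules = cluster_transition_rules(prev, curr, min_support=0.2, min_confidence=0.6)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising the two thresholds keeps only the strongest transitions, which is the general idea behind reducing a rule set to a handful of readable conclusions; the actual selection criteria are described in the paper itself.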

Item type: Conference paper
Keywords: data mining, data-stream clustering
Subjects: B. Information use and sociology of information
Depositing user: Heather G Morrison
Date deposited: 19 Apr 2006
Last modified: 02 Oct 2014 12:03
URI: http://hdl.handle.net/10760/7434
