Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

Zhang, Chengzhi and Song, Wei and Li, Chenghua and Yu, Wei. Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering, 2008. In International Conference on Natural Language Processing and Knowledge Engineering, Beijing (China), 19-22 November 2008. [Conference paper]

English abstract

Because common clustering algorithms use the vector space model (VSM) to represent documents, they ignore the conceptual relationships between related terms that do not co-occur literally. To overcome this problem, this article proposes a genetic algorithm-based clustering technique, named GA clustering, used in conjunction with an ontology. In general, ontology-based measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. For the thesaurus-based ontology, we take advantage of the hierarchical structure and broad-coverage taxonomy of WordNet. Corpus-based methods, however, are rather complicated to handle in practical applications. In this paper we propose a transformed latent semantic analysis (LSA) model as the corpus-based method. Moreover, two hybrid strategies that combine the various similarity measures are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and LSA-based methods, clearly outperforms the same algorithm with other similarity measures. The improvements in clustering performance also demonstrate the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA.
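The limitation of the plain VSM described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the `CONCEPT_OF` table is a hypothetical toy thesaurus standing in for WordNet, and the helper names (`cosine`, `term_vector`, `concept_vector`) are assumptions made for this example only. The sketch shows that two documents using different words for the same concept have zero cosine similarity under literal term matching, and a nonzero similarity once terms are mapped to shared concepts.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus standing in for WordNet (an assumption for
# illustration): each term maps to a shared concept label.
CONCEPT_OF = {
    "car": "vehicle",
    "automobile": "vehicle",
    "physician": "doctor",
    "doctor": "doctor",
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def term_vector(text):
    """Plain VSM representation: one dimension per literal term."""
    return Counter(text.lower().split())

def concept_vector(text):
    """Concept-level representation: terms found in the thesaurus are
    replaced by their concept label, all other terms are kept as-is."""
    return Counter(CONCEPT_OF.get(t, t) for t in text.lower().split())

doc1 = "car repair"
doc2 = "automobile maintenance"

# No literal term is shared, so the VSM similarity is zero.
print(cosine(term_vector(doc1), term_vector(doc2)))

# "car" and "automobile" both map to "vehicle", so the similarity is positive.
print(cosine(concept_vector(doc1), concept_vector(doc2)))
```

A real system would replace the lookup table with a graded similarity measure over the WordNet hierarchy or an LSA-derived term space, as the abstract indicates; the hard concept mapping here is only the simplest way to make the effect visible.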

Item type: Conference paper
Keywords: Clustering; ontology; latent semantic analysis; semantic similarity measure; genetic algorithm
Subjects: L. Information technology and library technology. > LP. Intelligent agents.
Depositing user: Chengzhi Zhang
Date deposited: 21 Oct 2008
Last modified: 02 Oct 2014 12:13
URI: http://hdl.handle.net/10760/12400
