Journal of System Simulation ›› 2018, Vol. 30 ›› Issue (10): 3835-3842. doi: 10.16182/j.issn1004731x.joss.201810029


Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering

Dai Yueming*, Wang Minghui, Zhang Ming, Wang Yan   

  1. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
  • Received: 2016-09-22 Revised: 2017-01-11 Online: 2018-10-10 Published: 2019-01-04

Abstract: With the traditional K-means algorithm, it is difficult to determine the number of clusters K, and the clustering results depend on the choice of initial centers; the algorithm is also sensitive to noise and unstable. In addition, text data are high-dimensional and sparse and carry a latent semantic structure. To address these problems, an algorithm for Chinese text clustering is proposed. The algorithm first uses the physical significance of Singular Value Decomposition (SVD) to classify the data roughly, and then applies K-means for text clustering. SVD is used to decompose the text data, which retains semantic features, removes noise, and smooths the data. The rough classification obtained from the physical significance of SVD then supplies the initial centers for K-means. Experimental results demonstrate that the F-measure of cluster quality is improved compared with other K-means algorithms.
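The pipeline outlined in the abstract (SVD smoothing, a rough classification, then K-means seeded with the rough classes) can be sketched as follows. This is only an illustrative reading, not the authors' implementation: the abstract does not state the rough-classification rule, so the sketch assumes each document is assigned to the latent dimension on which its projection has the largest absolute value. The function names `svd_initial_centers` and `kmeans`, the document-by-term layout of `X`, and the toy data are all hypothetical.

```python
import numpy as np

def svd_initial_centers(X, k):
    """Return documents in a k-dim latent space and k initial centers.

    X is assumed to be an (n_docs, n_terms) term-weight matrix (e.g. TF-IDF),
    with k <= min(n_docs, n_terms).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Truncated SVD: keep the k strongest latent dimensions, drop the rest as noise.
    Z = U[:, :k] * s[:k]
    # ASSUMED rule (not given in the abstract): a document's rough class is the
    # latent dimension where its projection is largest in absolute value.
    rough = np.argmax(np.abs(Z), axis=1)
    centers = np.empty((k, k))
    for j in range(k):
        members = Z[rough == j]
        if len(members) > 0:
            centers[j] = members.mean(axis=0)
        else:
            # Empty rough class: fall back to the document strongest on dimension j.
            centers[j] = Z[np.argmax(np.abs(Z[:, j]))]
    return Z, centers

def kmeans(Z, centers, n_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    def assign(c):
        dist = ((Z[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers

# Toy corpus: documents 0-2 use terms 0-1 only, documents 3-5 use terms 2-3 only.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
], dtype=float)

Z, init_centers = svd_initial_centers(X, k=2)
labels, final_centers = kmeans(Z, init_centers)
print(labels)
```

Because the initial centers come from a deterministic decomposition rather than a random draw, repeated runs on the same matrix start K-means from the same place, which is the stability benefit the abstract points to. In this sketch K is still chosen by the caller.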

Key words: SVD, text clustering, K-means, initial center point
