Journal of System Simulation, 2020, Vol. 32, Issue (6): 1038-1050. doi: 10.16182/j.issn1004731x.joss.18-0722


Study of Parallelized Graph Clustering Algorithm Based on Spark

Liu Dongjiang1,2, Li Jianhui1   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;
  2. University of Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-10-29  Revised: 2019-03-02  Published: 2020-06-25

Abstract: This paper studies parallelized graph clustering and proposes a new parallelized graph clustering algorithm based on Spark. Because the top operation of Spark occupies a large amount of memory, a new algorithm is proposed to replace it and reduce memory consumption. The speed of the proposed algorithm is increased by improving the bottom-up hierarchical clustering algorithm. A new data filtering method based on the features of graph data is also proposed; it greatly reduces both running time and memory consumption, and the reason for its high efficiency is explained. Simulation results indicate that the proposed algorithm outperforms other parallelized graph clustering algorithms.
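
For readers unfamiliar with the kind of operation the abstract refers to, a top-k selection over partitioned data can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it is neither the authors' replacement algorithm nor Spark's implementation, and the edge layout `(source, target, weight)` and the function names `top_k_per_partition` and `top_k` are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical edge layout: (source vertex, target vertex, weight).
Edge = Tuple[str, str, float]


def top_k_per_partition(partition: Iterable[Edge], k: int) -> List[Edge]:
    """Keep only the k heaviest edges seen in one partition."""
    return heapq.nlargest(k, partition, key=lambda e: e[2])


def top_k(partitions: Iterable[Iterable[Edge]], k: int) -> List[Edge]:
    """Merge the per-partition candidates and keep the k heaviest overall."""
    candidates: List[Edge] = []
    for part in partitions:
        # Each partition contributes at most k candidates to the merge step.
        candidates.extend(top_k_per_partition(part, k))
    return heapq.nlargest(k, candidates, key=lambda e: e[2])


# Two toy "partitions" of a weighted edge list.
partitions = [
    [("a", "b", 0.9), ("a", "c", 0.2)],
    [("b", "c", 0.7), ("c", "d", 0.4)],
]
result = top_k(partitions, 2)
print(result)  # [('a', 'b', 0.9), ('b', 'c', 0.7)]
```

In this sketch the merge step holds up to k candidates per partition at once, which hints at why a top-style selection can become memory-hungry when k or the number of partitions is large.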

Key words: graph clustering, graph data, Spark, algorithm, parallelize

CLC Number: