Clustering Method Based on Graph Data Model and Reliability Detection

doi:10.16182/j.issn1004731x.joss.201806013

Journal of System Simulation, 2018, Vol. 30, Issue 6: 2102-2109.

Cheng Yanyun¹, Bian Huisong¹, Bian Changsheng²

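1. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China;
2. Nanjing College of Information Technology, Nanjing 210023, Jiangsu, China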

Received: 2016-07-20; Revised: 2016-09-17; Online: 2018-06-08; Published: 2018-06-14
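About the authors: Cheng Yanyun (1974-), female, native of Jiangyan, Jiangsu; M.S.; associate professor; research interests: applications of big data in mobile communication networks, and network optimization. Bian Huisong (1989-), male, native of Dezhou, Shandong; master's student; research interests: cluster analysis and tensor anomalous-block detection.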
Funding: Jiangsu Provincial Special Guidance Fund for the Development of the Modern Service Industry (Software Industry) (SJ214038)

Abstract

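Abstract: For data in a feature space, traditional clustering algorithms usually perform the analysis directly in that space, so for high-dimensional data the clustering result cannot be visualized intuitively and effectively in a two-dimensional plane. Graph data, by contrast, explicitly reflect the similarity relationships between objects. Based on the distances between data objects, the feature-space data are iteratively modeled as graph data, and modularity-based cluster analysis is then performed on the resulting graph model. This makes it possible to cluster data sets with non-convex (non-spherical) distributions and to visualize the clustering result as a graph in two-dimensional space. The paper further introduces the concept of the credibility of a clustering result with respect to the boundaries between adjacent clusters, and proposes a method that computes this credibility with the PageRank algorithm.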

Key words: data mining, clustering, graph data modeling, modularity, PageRank algorithm
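
The abstract states only that the feature-space data are turned into a graph iteratively, based on the distances between objects; it does not give the iteration rule. The sketch below assumes one plausible rule: a symmetric k-nearest-neighbor graph whose k grows from 1 until the graph becomes connected. Euclidean distance, the connectivity stopping criterion, and the helper names (`euclidean`, `is_connected`, `build_knn_graph`) are illustrative assumptions, not the authors' procedure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_connected(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 is connected (DFS)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def build_knn_graph(points):
    """Hypothetical iterative construction: grow a symmetric k-NN graph,
    k = 1, 2, ..., and stop at the first k for which it is connected.

    Returns (k, edges), with edges a set of (u, v) pairs where u < v.
    """
    n = len(points)
    if n < 2:
        return 0, set()
    # For every point, rank all other points by distance once.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: euclidean(points[i], points[j]))
        for i in range(n)
    ]
    edges = set()
    for k in range(1, n):
        # Iteration k: link each point to its k-th nearest neighbour.
        for i in range(n):
            j = order[i][k - 1]
            edges.add((min(i, j), max(i, j)))
        if is_connected(n, edges):
            return k, edges
    return n - 1, edges  # not reached: at k = n - 1 the graph is complete
```

For five points on a line forming two groups, `[(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]`, the graph first becomes connected at k = 2, when points 3 and 4 each link to point 2. Growing k only until connectivity keeps the graph sparse, leaving the cluster structure for the modularity step to find.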
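
For the modularity-based clustering step, modularity is the Newman-Girvan quality measure Q = Σ_c [e_c / m - (d_c / 2m)^2], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of c. The sketch below maximizes Q with a naive greedy agglomeration in the spirit of the Clauset-Newman-Moore method: start from singleton communities and repeatedly merge the adjacent pair with the largest gain ΔQ = e_ab / m - 2 (d_a / 2m)(d_b / 2m), stopping when no merge increases Q. It is an unoptimized illustration, not the paper's implementation, and the function names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    intra = defaultdict(int)   # number of edges inside each community
    degree = defaultdict(int)  # total degree of each community
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def greedy_modularity(n, edges):
    """Merge the adjacent pair of communities with the largest modularity
    gain, repeatedly, until no merge increases Q. Returns a label per node."""
    m = len(edges)
    labels = list(range(n))  # every node starts in its own community
    while True:
        degree = defaultdict(int)
        between = defaultdict(int)  # edge counts between pairs of communities
        for u, v in edges:
            cu, cv = labels[u], labels[v]
            degree[cu] += 1
            degree[cv] += 1
            if cu != cv:
                between[(min(cu, cv), max(cu, cv))] += 1
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in between.items():
            # dQ = e_ab / m - 2 * (d_a / 2m) * (d_b / 2m)
            gain = e_ab / m - 2 * degree[a] * degree[b] / (2 * m) ** 2
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return labels  # no merge improves Q
        a, b = best_pair
        labels = [a if c == b else c for c in labels]
```

On two triangles joined by a single bridge edge, `edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]`, the merges recover the two triangles and stop there with Q = 5/14 ≈ 0.357, because merging the two triangles would lower Q.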
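
The abstract does not define how PageRank yields the credibility score. One plausible reading, sketched here as an assumption rather than the authors' definition: for each cluster, run a personalized PageRank whose teleport vector is uniform over that cluster's nodes, then score each node by the share of its PageRank mass that comes from its own cluster. Nodes deep inside a cluster receive little mass from neighboring clusters and score high, while nodes on the boundary between adjacent clusters score lower. The damping factor 0.85, the fixed 100 power iterations, and the helper names are illustrative choices.

```python
def personalized_pagerank(n, edges, seeds, alpha=0.85, iters=100):
    """Power iteration for PageRank on an undirected graph, teleporting
    uniformly to the nodes in `seeds` (a non-empty set)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    teleport = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    rank = teleport[:]
    for _ in range(iters):
        new = [(1 - alpha) * t for t in teleport]
        for u in range(n):
            if adj[u]:
                # Spread u's damped mass evenly over its neighbours.
                share = alpha * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
            else:
                # Isolated node: return its damped mass to the seeds.
                for v in range(n):
                    new[v] += alpha * rank[u] * teleport[v]
        rank = new
    return rank


def credibility(n, edges, labels, alpha=0.85):
    """Hypothetical score: fraction of a node's personalized-PageRank mass
    that originates from its own cluster (1.0 = no pull from other clusters)."""
    clusters = sorted(set(labels))
    ppr = {
        c: personalized_pagerank(n, edges, {i for i in range(n) if labels[i] == c}, alpha)
        for c in clusters
    }
    scores = []
    for i in range(n):
        total = sum(ppr[c][i] for c in clusters)
        scores.append(ppr[labels[i]][i] / total if total > 0 else 0.0)
    return scores
```

With the two-triangle graph `[(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]` split into its triangles, the bridge endpoints (nodes 2 and 3) score about 0.67, against about 0.77 for the other four nodes, which is the boundary effect the credibility concept targets.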

CLC number: TP391.9

Cheng Yanyun, Bian Huisong, Bian Changsheng. Clustering Method Based on Graph Data Model and Reliability Detection[J]. Journal of System Simulation, 2018, 30(6): 2102-2109.
