Journal of System Simulation (系统仿真学报) ›› 2022, Vol. 34 ›› Issue (10): 2204-2212. doi: 10.16182/j.issn1004731x.joss.21-0484

• Simulation Modeling Theory and Methods •

A k-modes Clustering Method Based on Maximal Information Coefficient Preprocessing

李明媚 (Mingmei Li)1, 文成林 (Chenglin Wen)1,2, 胡绍林 (Shaolin Hu)2

  1. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
  2. Guangdong Institute of Petrochemical Technology, Maoming, Guangdong 525000, China
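  • Received: 2021-05-26 Revised: 2021-08-06 Publication date: 2022-10-30 Release date: 2022-10-18
  • Corresponding author: 文成林 (Chenglin Wen) E-mail: 851628184@qq.com; wencl@hdu.edu.cn
  • About the author: 李明媚 (Mingmei Li, born 1997), female, master's student; research interest: data mining. E-mail: 851628184@qq.com
  • Funding:
    National Natural Science Foundation of China (61933013)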

A K-modes Clustering Method Based on Maximal Information Coefficient Data Preprocessing

Mingmei Li1, Chenglin Wen1,2, Shaolin Hu2

  1. Hangzhou Dianzi University, Hangzhou 310018, China
  2. Guangdong Institute of Petrochemical Technology, Maoming 525000, China
  • Received: 2021-05-26 Revised: 2021-08-06 Online: 2022-10-30 Published: 2022-10-18
  • Contact: Chenglin Wen E-mail: 851628184@qq.com; wencl@hdu.edu.cn

Abstract:

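To address the poor clustering performance that existing k-modes methods often show in practice because they ignore weak correlations between variable attributes, a new k-modes clustering method that accounts for weak attribute correlations is proposed. The maximal information coefficient (MIC) is introduced to measure the correlation between variable attributes in a data set. The resulting MIC values are fused with the original distance to build a new measure that carries weak-correlation information, which makes the correlation information between attributes more complete and gives a finer-grained k-modes clustering method. Three different data sets are used to compare the new method with the original k-modes method and with other improved k-modes methods, and the simulation results demonstrate the effectiveness of the new method.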

Key words: clustering method, k-modes, maximal information coefficient, distance metric, variable attribute

Abstract:

Existing k-modes clustering methods ignore weak correlations between variable attributes, which often leads to poor clustering performance in practical applications. A new k-modes clustering method that incorporates weak attribute correlations is proposed. The maximal information coefficient (MIC) is introduced to measure the correlation between variable attributes in the data set. The obtained MIC values are merged with the original distance to establish a new measure containing weak attribute correlation information, which enhances the completeness of the correlation information between variable attributes and yields a more refined k-modes clustering method. Three different data sets are used to compare the performance of the new method with the original k-modes clustering method and other improved k-modes clustering methods, and the simulation results show the effectiveness of the new method.

Key words: clustering algorithm, k-modes, maximal information coefficient (MIC), distance metric, variable attribute
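
The idea of folding attribute-correlation information into the k-modes distance can be illustrated with a minimal sketch. The abstract does not state how the MIC values are merged with the original distance, so everything here is an assumption made for illustration: normalized mutual information stands in for MIC (it is not MIC), and the weighting rule `1 + mean association with the other attributes` as well as the function names `attribute_weights` and `weighted_matching_distance` are hypothetical, not the authors' formulation.

```python
from collections import Counter
from math import log, sqrt


def entropy(column):
    """Shannon entropy (natural log) of a categorical column."""
    n = len(column)
    return -sum((c / n) * log(c / n) for c in Counter(column).values())


def normalized_mutual_information(a, b):
    """Association in [0, 1] between two categorical columns.

    ASSUMPTION: used here as a simple stand-in for MIC; it is not MIC.
    """
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:  # a constant column carries no information
        return 0.0
    mi = h_a + h_b - entropy(list(zip(a, b)))
    return min(1.0, max(0.0, mi / sqrt(h_a * h_b)))


def attribute_weights(rows):
    """Hypothetical rule: weight_j = 1 + mean association of attribute j with the others."""
    columns = list(zip(*rows))
    weights = []
    for j, col in enumerate(columns):
        others = [normalized_mutual_information(col, other)
                  for l, other in enumerate(columns) if l != j]
        weights.append(1.0 + (sum(others) / len(others) if others else 0.0))
    return weights


def weighted_matching_distance(x, mode, weights):
    """Simple-matching (k-modes) distance with one weight per mismatching attribute."""
    return sum(w for xi, zi, w in zip(x, mode, weights) if xi != zi)


# Toy data: colour and shape always co-occur, size is independent of both.
rows = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "large"),
]
weights = attribute_weights(rows)
distance = weighted_matching_distance(rows[0], rows[2], weights)
print(weights, distance)
```

With all weights set to 1 the function reduces to the plain simple-matching distance of the original k-modes method, so the sketch only changes how strongly each mismatching attribute counts.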

CLC number: