Journal of System Simulation, 2018, Vol. 30, Issue (6): 2315-2320. doi: 10.16182/j.issn1004731x.joss.201806039

• Simulation Application Engineering •

Online Synthesis Incremental Data Streams Classification Algorithm

Liu Sanmin1, Liu Yuxia2

  1. College of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China;
  2. Center of Modern Education Technology, Anhui Polytechnic University, Wuhu 241000, China
  • Received: 2017-07-13 Revised: 2017-09-07 Online: 2018-06-08 Published: 2018-06-14
  • About the author: Liu Sanmin (born 1978), male, from Yuexi, Anhui; Ph.D., associate professor; research interest: machine learning.
  • Funding:
    National Natural Science Foundation of China (61300170, 71371012); Natural Science Foundation of Anhui Province (1608085MF147); General Project of the Promotion Plan of the Anhui Provincial Department of Education (TSKJ2016B05)

Abstract: Online learning is an effective way to cope with the non-reproducibility of samples in data stream classification, and overcoming the shortage of samples during online learning is key to improving its quality. Based on the mean square error decomposition of the classification model's parameter estimates, and borrowing the idea of clustering, new samples are synthesized as linear combinations of a class center and an observed sample. This enriches the information about the sample distribution and reduces the lower bound associated with parameter estimation. Online synthesis incremental learning is then carried out on this basis, with the class center positions continuously corrected as the sample sequence arrives. Theoretical analysis and simulation experiments indicate that the proposed scheme is effective and that it has an advantage over other algorithms in noisy environments.

Key words: online learning, data stream classification, clustering, incremental learning
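
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the mixing rule, the base learner, or the center-update formula, so the sketch assumes a mixing coefficient drawn uniformly from [0, 1), a multi-class perceptron as the online learner, and a running mean as the class-center update. The names `OnlineSynthesisClassifier`, `make_stream`, `n_synthetic`, and `lr` are hypothetical and chosen only for this example.

```python
import random


class OnlineSynthesisClassifier:
    """Online perceptron trained on each sample plus synthetic samples
    placed between that sample and its running class center (assumed design)."""

    def __init__(self, n_features, classes, lr=0.1, n_synthetic=3, seed=0):
        self.classes = list(classes)
        self.lr = lr
        self.n_synthetic = n_synthetic
        self.rng = random.Random(seed)
        # One weight vector per class; the last entry is the bias term.
        self.weights = {c: [0.0] * (n_features + 1) for c in self.classes}
        self.centers = {c: [0.0] * n_features for c in self.classes}
        self.counts = {c: 0 for c in self.classes}

    def _score(self, c, x):
        w = self.weights[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

    def predict(self, x):
        return max(self.classes, key=lambda c: self._score(c, x))

    def _perceptron_step(self, x, y):
        # Update only on a mistake: reward the true class, penalize the predicted one.
        y_hat = self.predict(x)
        if y_hat != y:
            for i, xi in enumerate(x):
                self.weights[y][i] += self.lr * xi
                self.weights[y_hat][i] -= self.lr * xi
            self.weights[y][-1] += self.lr
            self.weights[y_hat][-1] -= self.lr

    def _update_center(self, x, y):
        # Running mean: c <- c + (x - c) / n (assumed center-correction rule).
        self.counts[y] += 1
        n = self.counts[y]
        self.centers[y] = [ci + (xi - ci) / n for ci, xi in zip(self.centers[y], x)]

    def synthesize(self, x, y):
        # Linear synthesis: lam * x + (1 - lam) * center, with lam ~ U[0, 1).
        center = self.centers[y]
        samples = []
        for _ in range(self.n_synthetic):
            lam = self.rng.random()
            samples.append([lam * xi + (1 - lam) * ci for xi, ci in zip(x, center)])
        return samples

    def partial_fit(self, x, y):
        # Correct the class center, learn from the real sample, then from synthetic ones.
        self._update_center(x, y)
        self._perceptron_step(x, y)
        for s in self.synthesize(x, y):
            self._perceptron_step(s, y)


def make_stream(n, seed=0):
    """Hypothetical toy stream: two Gaussian classes centered at (-2, -2) and (2, 2)."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.randrange(2)
        mu = 2.0 if y == 1 else -2.0
        yield [rng.gauss(mu, 0.5), rng.gauss(mu, 0.5)], y


if __name__ == "__main__":
    model = OnlineSynthesisClassifier(n_features=2, classes=[0, 1])
    for x, y in make_stream(300, seed=1):
        model.partial_fit(x, y)  # each sample is seen once and then discarded
    held_out = list(make_stream(200, seed=2))
    accuracy = sum(model.predict(x) == y for x, y in held_out) / len(held_out)
    print("held-out accuracy:", accuracy)
```

Because every synthetic point is a convex combination of the observed sample and its class center, it stays on the segment between the two. This is one plausible reading of how the synthesis step adds distribution information without storing past samples.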
