Online Synthesis Incremental Data Streams Classification Algorithm

doi:10.16182/j.issn1004731x.joss.201806039

Journal of System Simulation, 2018, Vol. 30, Issue (6): 2315-2320.

Liu Sanmin¹, Liu Yuxia²

1. College of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China;
2. Center of Modern Education Technology, Anhui Polytechnic University, Wuhu 241000, China

Received: 2017-07-13 Revised: 2017-09-07 Online: 2018-06-08 Published: 2018-06-14
About the author: Liu Sanmin (1978-), male, from Yuexi, Anhui; Ph.D., associate professor; research interest: machine learning.
Funding:
National Natural Science Foundation of China (61300170, 71371012); Natural Science Foundation of Anhui Province (1608085MF147); General Project of the Anhui Provincial Department of Education Promotion Program (TSKJ2016B05)

Abstract

Abstract: Online learning is an effective way to handle the non-reproducibility of samples in data stream classification, and alleviating the shortage of samples during online learning is the key to improving its quality. Based on the mean squared error decomposition of the classification model's parameter estimates, and borrowing the idea of clustering, new samples are synthesized as linear combinations of class centers and observed samples. This enriches the information about the sample distribution and lowers the lower bound in parameter estimation. On this basis, online synthetic incremental learning is carried out, and the class-center positions are continuously corrected using the incoming sample sequence. Theoretical analysis and simulation results show that the proposed scheme is effective and, in noisy environments, has an advantage over the other algorithms compared.
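
The abstract relies on the mean squared error decomposition without stating it. For reference, the standard textbook form for an estimate $\hat{\theta}$ of a parameter $\theta$ is shown below, together with the Cramér-Rao bound, whose $1/n$ dependence is one plausible reading of why adding (synthetic) samples "lowers the lower bound". This is an assumed interpretation, not the paper's own derivation or notation; synthetic samples are not independent draws, so the bound applies to them only heuristically.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
  &= \mathbb{E}\big[(\hat{\theta}-\theta)^{2}\big]
   = \underbrace{\big(\mathbb{E}[\hat{\theta}]-\theta\big)^{2}}_{\text{squared bias}}
   + \underbrace{\mathbb{E}\big[(\hat{\theta}-\mathbb{E}[\hat{\theta}])^{2}\big]}_{\mathrm{Var}(\hat{\theta})},\\
\mathrm{Var}(\hat{\theta}) &\ge \frac{1}{n\,I(\theta)}
  \quad \text{(unbiased } \hat{\theta}\text{, } n \text{ i.i.d. samples, per-sample Fisher information } I(\theta)\text{)}.
\end{aligned}
```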

Key words: online learning, data stream classification, clustering, incremental learning
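
The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. Each class keeps a center maintained as a running mean (assumed; the abstract only says the center is continuously corrected from the incoming samples), each labeled sample x of class y spawns synthetic samples x + λ(c_y − x) for a few fixed weights λ, and an online multiclass perceptron (a swapped-in base learner) is updated on the original and synthetic samples. The names `OnlineSynthesisLearner`, `learn_one`, `lambdas` and `lr` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


class OnlineSynthesisLearner:
    """Hypothetical sketch: an online multiclass perceptron whose updates are
    augmented with samples synthesized between each input and its class center.

    The running-mean center update, the fixed lambdas and the perceptron base
    learner are illustrative assumptions, not details taken from the paper.
    """

    def __init__(self, dim: int, lambdas: Sequence[float] = (0.25, 0.5), lr: float = 1.0) -> None:
        self.dim = dim
        self.lambdas = tuple(lambdas)  # interpolation weights, assumed to lie in (0, 1)
        self.lr = lr
        self.centers: Dict[int, Vector] = {}  # running mean of each class seen so far
        self.counts: Dict[int, int] = {}
        self.weights: Dict[int, Vector] = {}  # per-class weights; last entry is the bias

    def _score(self, label: int, x: Sequence[float]) -> float:
        w = self.weights[label]
        return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

    def predict(self, x: Sequence[float]) -> int:
        """Return the class with the highest linear score (ties go to the smallest label)."""
        if not self.weights:
            raise ValueError("no labeled sample has been seen yet")
        return max(sorted(self.weights), key=lambda label: self._score(label, x))

    def _update_center(self, x: Sequence[float], y: int) -> None:
        """Move the center of class y to the running mean of all its samples."""
        if y not in self.centers:
            self.centers[y] = list(x)
            self.counts[y] = 1
            self.weights[y] = [0.0] * (self.dim + 1)
            return
        self.counts[y] += 1
        n = self.counts[y]
        self.centers[y] = [c + (xi - c) / n for c, xi in zip(self.centers[y], x)]

    def synthesize(self, x: Sequence[float], y: int) -> List[Vector]:
        """Linear synthesis x' = x + lam * (center_y - x), one sample per lambda."""
        center = self.centers[y]
        return [[xi + lam * (c - xi) for xi, c in zip(x, center)] for lam in self.lambdas]

    def _perceptron_step(self, x: Sequence[float], y: int) -> None:
        pred = self.predict(x)
        if pred == y:
            return
        # Multiclass perceptron: pull the true class toward x, push the predicted class away.
        for label, sign in ((y, 1.0), (pred, -1.0)):
            w = self.weights[label]
            for i, xi in enumerate(x):
                w[i] += sign * self.lr * xi
            w[-1] += sign * self.lr

    def learn_one(self, x: Sequence[float], y: int) -> None:
        """Update the class center with x, then learn from x and its synthetic neighbours."""
        self._update_center(x, y)
        batch: List[Vector] = [list(x)]
        if self.counts[y] > 1:  # with one sample the center equals x, so nothing new to synthesize
            batch.extend(self.synthesize(x, y))
        for sample in batch:
            self._perceptron_step(sample, y)


if __name__ == "__main__":
    stream = [([1.0, 1.0], 0), ([-1.0, -1.0], 1), ([1.2, 0.8], 0), ([-0.9, -1.1], 1)] * 5
    model = OnlineSynthesisLearner(dim=2)
    for features, label in stream:
        model.learn_one(features, label)
    print(model.predict([2.0, 2.0]), model.predict([-2.0, -2.0]))  # prints: 0 1
```

The perceptron is only a stand-in; any model with an incremental update rule could take its place. Interpolating toward the class center resembles SMOTE, which instead interpolates between a sample and one of its nearest same-class neighbours.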

CLC number: TP393

Liu Sanmin, Liu Yuxia. Online Synthesis Incremental Data Streams Classification Algorithm[J]. Journal of System Simulation, 2018, 30(6): 2315-2320.
