Journal of System Simulation, 2022, Vol. 34, Issue (3): 452-460. doi: 10.16182/j.issn1004731x.joss.20-0788

• Simulation Modeling Theory and Methods •

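Research on Active Learning Method and Application Based on Covariance Matrix

Bowen Zhou1, Weili Xiong1,2

  1. Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, Jiangsu, China
    2. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2020-10-15 Revised: 2020-11-13 Online: 2022-03-18 Published: 2022-03-22
  • Corresponding author: Weili Xiong. E-mail: zbw18101536910@163.com; greenpre@163.com
  • About the author: Bowen Zhou (1996-), male, master's student; research interests: data mining and industrial process modeling. E-mail: zbw18101536910@163.com
  • Funding:
    National Natural Science Foundation of China (61773182); Sub-project of the National Key R&D Program of China (2018YFC1603705-03)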



Abstract:

Data collected from industrial processes often contain many unlabeled samples, whereas labeled samples are scarce and manual labeling is costly. To address this, an active learning method based on the covariance matrix is proposed. A Gaussian process regression (GPR) model is built from the labeled samples, the covariance matrix among the unlabeled samples is constructed, and the determinant of this covariance matrix serves as the selection criterion. In this way, informative unlabeled samples are selected while the similarity between samples is also taken into account, which avoids adding redundant samples and ultimately improves prediction accuracy at the same labeling cost. Application simulations on industrial process data verify the feasibility and effectiveness of the proposed method.
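The abstract does not spell out the selection procedure, so the following is only a minimal sketch of how a determinant-based criterion can work, under stated assumptions: an RBF (squared-exponential) kernel with fixed, hand-picked hyperparameters, a fixed noise variance, and greedy batch selection. The GPR posterior covariance over the unlabeled pool is Sigma = K_uu - K_ul (K_ll + noise_var I)^(-1) K_lu. For any chosen subset S, det(Sigma_S) equals the product of each chosen sample's posterior variance, conditioned on the labeled data and on the samples chosen before it. A large determinant therefore requires samples that are both uncertain (informative) and weakly correlated with one another (not redundant). The function names rbf_kernel, posterior_covariance and select_batch, and all parameter values, are hypothetical and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: k(a, b) = signal_var * exp(-||a - b||^2 / (2 * length_scale^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def posterior_covariance(X_lab, X_unl, noise_var=1e-2, **kern):
    """GPR posterior covariance of the latent function over the unlabeled pool.

    Sigma = K_uu - K_ul (K_ll + noise_var * I)^(-1) K_lu.
    It depends only on the inputs, so no labels are needed to compute it.
    """
    K_ll = rbf_kernel(X_lab, X_lab, **kern) + noise_var * np.eye(len(X_lab))
    K_ul = rbf_kernel(X_unl, X_lab, **kern)
    K_uu = rbf_kernel(X_unl, X_unl, **kern)
    return K_uu - K_ul @ np.linalg.solve(K_ll, K_ul.T)


def select_batch(X_lab, X_unl, batch_size, noise_var=1e-2, jitter=1e-9, **kern):
    """Greedily choose pool indices that maximize log det of the posterior covariance submatrix."""
    Sigma = posterior_covariance(X_lab, X_unl, noise_var, **kern)
    # Small diagonal jitter keeps submatrices numerically positive definite.
    Sigma = Sigma + jitter * np.eye(len(X_unl))
    selected = []
    for _ in range(min(batch_size, len(X_unl))):
        best_idx, best_logdet = None, -np.inf
        for i in range(len(X_unl)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(Sigma[np.ix_(idx, idx)])
            # det(Sigma_S) = det(Sigma_prev) * (conditional variance of candidate i),
            # so a near-duplicate of an already chosen sample yields a tiny determinant.
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:  # no candidate keeps the submatrix positive definite
            break
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    X_lab = np.array([[0.0], [1.0]])
    X_unl = np.array([[0.05], [0.5], [3.0], [3.01], [6.0]])
    print(select_batch(X_lab, X_unl, batch_size=3))
```

With labeled inputs at 0 and 1 and an unlabeled pool at 0.05, 0.5, 3.0, 3.01 and 6.0, a batch of three picks 6.0 first, then only one of the near-duplicates 3.0 and 3.01, and then 0.5; a pure variance ranking would instead take both near-duplicates. Recomputing slogdet for every candidate keeps the sketch short; a practical version would update the Schur complement incrementally instead.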

Key words: active learning, Gaussian process regression, covariance matrix, similarity
