Journal of System Simulation ›› 2015, Vol. 27 ›› Issue (11): 2714-2721.

• Artificial Intelligence and Simulation •

Improved Support Vector Pre-extracting Algorithm in Speech Recognition Application

Hao Rui (郝瑞)1, Niu Yanbo (牛砚波)2, Xiu Lei (修磊)3

  1. College of Information Management, Shanxi University of Finance & Economics, Taiyuan 030006, China;
  2. College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China;
  3. College of Statistics, Shanxi University of Finance & Economics, Taiyuan 030006, China
  • Received: 2014-12-24 Revised: 2015-03-30 Online: 2015-11-08 Published: 2020-08-05
  • About author: Hao Rui (1978-), female, PhD, Lecturer. Research interests: artificial intelligence.
  • Supported by:
    Shanxi Scholarship Council of China (2009-28); Natural Science Foundation of Shanxi Province (2009011022-2)

Abstract: For speech recognition problems with large-scale data sets, training a support vector machine (SVM) is difficult. Pre-selecting support vectors is one way to address this difficulty. A new support vector pre-extracting algorithm is proposed. On the one hand, kernel fuzzy C-means clustering is performed separately on each class of the original data set, and all cluster centers are taken as the representative set of that class. On the other hand, following the geometric distribution of support vectors and borrowing the one-versus-one strategy of multi-class SVM classification, boundary samples of the original data set are extracted as pre-selected support vectors for training and prediction. The algorithm is applied to an embedded speech recognition system. Experimental results show that the method improves the training efficiency of the speech recognition system and reduces the computational cost while maintaining a high recognition rate.

Key words: support vector, multi-class classification, kernel fuzzy C-means clustering, sample pre-extracting algorithm, speech recognition system simulation

CLC Number:
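
The two steps summarized in the abstract can be illustrated with a short Python sketch. The abstract does not state the kernel, the clustering parameters, or the exact boundary-selection rule, so the details here are assumptions made only for illustration: a Gaussian kernel, a common kernel fuzzy C-means variant that keeps cluster centers in input space, and a rule that, for every one-versus-one class pair, keeps the samples of one class that lie closest to the cluster centers of the other class. The names `kfcm_centers`, `preselect_boundary`, `sigma`, and `k_nearest` are hypothetical and do not come from the paper.

```python
import numpy as np
from itertools import combinations


def gaussian_kernel(X, V, sigma):
    """Return K[k, i] = exp(-||x_k - v_i||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)


def kfcm_centers(X, n_clusters, sigma=1.0, m=2.0, n_iter=50, seed=0):
    """Kernel fuzzy C-means with input-space centers (assumed variant)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with distinct samples drawn from the data.
    V = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        K = gaussian_kernel(X, V, sigma)
        dist = np.maximum(1.0 - K, 1e-12)        # proportional to kernel-induced distance
        U = dist ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 for each sample
        W = (U ** m) * K                         # weights used in the center update
        V = (W.T @ X) / (W.sum(axis=0)[:, None] + 1e-12)
    return V


def preselect_boundary(X, y, n_clusters=3, k_nearest=5, sigma=1.0):
    """Return indices of candidate boundary samples (assumed selection rule)."""
    classes = np.unique(y)
    # Step 1: cluster each class separately; centers represent the class.
    centers = {c: kfcm_centers(X[y == c], n_clusters, sigma) for c in classes}
    keep = set()
    # Step 2: visit every one-versus-one class pair in both directions.
    for a, b in combinations(classes, 2):
        for src, other in ((a, b), (b, a)):
            idx = np.flatnonzero(y == src)
            diff = X[idx][:, None, :] - centers[other][None, :, :]
            # Squared distance from each `src` sample to the nearest `other` center.
            d2 = (diff ** 2).sum(axis=2).min(axis=1)
            keep.update(idx[np.argsort(d2)[:k_nearest]].tolist())
    return np.array(sorted(keep))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    means = ((0, 0), (3, 0), (0, 3))
    X = np.vstack([rng.normal(loc=mu, scale=0.5, size=(40, 2)) for mu in means])
    y = np.repeat([0, 1, 2], 40)
    selected = preselect_boundary(X, y)
    print(f"kept {len(selected)} of {len(X)} samples")
```

The reduced set `X[selected], y[selected]` could then be passed to any SVM trainer in place of the full data set. Whether the recognition rate is preserved depends on how well the assumed rule captures the true support vectors, which this sketch does not evaluate.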