Journal of System Simulation ›› 2023, Vol. 35 ›› Issue (9): 1948-1964. doi: 10.16182/j.issn1004731x.joss.22-0554

• Papers •

Data Generation Model-based Synthetic Sample Imputation Method

He Yulin1,2, Chen Jiaqi2, Xu Hepeng2, Huang Zhexue1,2, Yin Jianfei2

  1. Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, China
  2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
  • Received: 2022-05-25 Revised: 2022-07-28 Online: 2023-09-25 Published: 2023-09-19

Abstract:

To address the inconsistency between the probability distribution of synthetic samples produced by imputation and that of the real samples, a data generation model-based synthetic sample imputation (DGM-SSI) method is proposed. A data generation model of the real samples is constructed from a Gaussian mixture model, whose number of components is determined by a multi-model fusion strategy. The synthetic samples required for imputation are then generated from the data generation model fitted to the real samples; specifically, the components of the model and their weights control how the synthetic samples are generated. The feasibility and effectiveness of DGM-SSI are verified on 20 multi-modal, multi-dimensional mixed distributions. The experimental results show that, compared with random sample imputation, the synthetic minority over-sampling technique (SMOTE), and two of its latest variants, the proposed method obtains synthetic samples whose probability distribution is more consistent with that of the real samples, indicating that it is a reasonable synthetic sample imputation method.
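
The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional Gaussian mixture whose parameters are already known, and it leaves out both the model fitting and the multi-model fusion strategy for choosing the number of components, which the abstract does not specify. The function name `sample_from_mixture` and all parameter values are hypothetical.

```python
import random


def sample_from_mixture(weights, means, stds, n_samples, seed=0):
    """Draw n_samples values from a 1-D Gaussian mixture (illustrative only)."""
    rng = random.Random(seed)
    k = len(weights)
    # Choose a component for each sample with probability given by its weight.
    components = rng.choices(range(k), weights=weights, k=n_samples)
    # Draw each value from the Gaussian of the component chosen for it.
    samples = [rng.gauss(means[c], stds[c]) for c in components]
    return samples, components


# Hypothetical mixture parameters, standing in for a model fitted to real samples.
weights = [0.7, 0.3]
means = [-5.0, 5.0]
stds = [1.0, 1.0]

samples, components = sample_from_mixture(weights, means, stds, n_samples=10000)
share_first = components.count(0) / len(components)
sample_mean = sum(samples) / len(samples)
print(f"share drawn from component 0: {share_first:.3f}")
print(f"sample mean: {sample_mean:.3f}")
```

Because each synthetic sample is drawn from a component selected in proportion to its weight, the share of samples from each component approaches that weight as the sample size grows. This is the sense in which the components and weights control the generation.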

Key words: synthetic sample imputation, data generation model, Gaussian mixture model, synthetic minority over-sampling technique, probability distribution consistency
