Journal of System Simulation ›› 2021, Vol. 33 ›› Issue (9): 2261-2269. doi: 10.16182/j.issn1004731x.joss.20-0372

• National Economy Simulation •

Classification of Flight Delay Based on Nonlinear Weighted XGBoost

Tang Hong, Wang Dong, Song Bo, Chu Wenkui, He Linyuan

  1. Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038, Shaanxi, China
  • Received: 2020-06-17 Revised: 2020-08-05 Online: 2021-09-18 Published: 2021-09-17
  • About the author: Tang Hong (born 1967), female, master's degree, associate professor; research interests: military aviation communication and navigation. E-mail: th118th@163.com
  • Supported by:
    National Natural Science Foundation of China (61701524)

Abstract: To address flight delay classification under imbalanced data, a nonlinear weighted eXtreme Gradient Boosting (XGBoost) algorithm is proposed. Based on an analysis of the imbalance in flight delay data and its effect on classification performance, a heuristic nonlinear weighting method based on sample proportion is proposed to improve the negative log-likelihood loss function. The optimal parameters are determined by grid search and cross-validation, and a real flight delay dataset is used to evaluate the classifier. The experimental results show that the nonlinear weighted XGBoost algorithm improves classification accuracy on delayed flights while maintaining overall classification accuracy, and that it outperforms traditional algorithms on both statistical metrics and performance curves.

Key words: extreme gradient boosting, gradient boosting, flight delay, data imbalance
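
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a class weight derived nonlinearly from the sample proportion, applied to the negative log-likelihood loss. The power-law form, the exponent `gamma`, and all function names are assumptions made for illustration and are not taken from the paper. The sketch returns the per-sample gradient and Hessian with respect to the raw score, which is the form in which gradient boosting libraries such as XGBoost accept a custom objective.

```python
import math


def nonlinear_class_weight(n_majority, n_minority, gamma=0.5):
    """Hypothetical weight for the minority (delayed) class.

    gamma = 1 reproduces plain ratio weighting; gamma < 1 dampens it.
    This power-law form is an assumption, not the paper's formula.
    """
    return (n_majority / n_minority) ** gamma


def sigmoid(z):
    """Map a raw margin score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_logloss(margins, labels, pos_weight):
    """Mean negative log-likelihood, with positive samples scaled by pos_weight."""
    total = 0.0
    for z, y in zip(margins, labels):
        p = sigmoid(z)
        w = pos_weight if y == 1 else 1.0
        total -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def weighted_logloss_grad_hess(margins, labels, pos_weight):
    """Per-sample gradient and Hessian of the weighted loss w.r.t. the margin."""
    grads, hessians = [], []
    for z, y in zip(margins, labels):
        p = sigmoid(z)
        w = pos_weight if y == 1 else 1.0
        grads.append(w * (p - y))            # d(loss)/dz
        hessians.append(w * p * (1.0 - p))   # d2(loss)/dz2
    return grads, hessians
```

With 900 on-time and 100 delayed samples and `gamma=0.5`, the delayed class would be weighted by 3 rather than by the raw ratio of 9, so misclassified delays are penalized more heavily without dominating the loss. In practice `gamma` would be one more parameter to tune, for instance by grid search with cross-validation.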

CLC Number: