Journal of System Simulation ›› 2016, Vol. 28 ›› Issue (3): 705-710.

• Simulation Application Engineering •

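  • About the authors: Wang Jing (b. 1991), male, from Taizhou, Jiangsu, master's student; research interest: mobile network data processing. Wang Xingwei (b. 1968), male, from Gaizhou, Liaoning, Ph.D., professor and doctoral supervisor; research interests: future Internet, cloud computing, network security, and information security.
  • Supported by:
    National Natural Science Foundation of China (61572123); National Science Fund for Distinguished Young Scholars (61225012, 71325002); Liaoning Province BaiQianWan Talents Program (2013921068)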

Spam Filtering Based on Variable Precision Rough Set Decision Tree

Wang Jing1, Wang Xingwei1,2, Zhao Yue3   

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;
  2. College of Software, Northeastern University, Shenyang 110819, China;
  3. Network Service, Liaoning University, Shenyang 110036, China
  • Received: 2014-09-16   Revised: 2014-10-27   Published: 2020-07-02

Abstract: Email is convenient and inexpensive, which has made it one of the most widely used means of communication. In recent years, however, it has been abused by malicious senders, and the Internet has been flooded with spam. Spam wastes bandwidth, disrupts the normal operation of mail servers, interferes with people's daily life and work, and poses a serious threat to Internet security. To filter spam, a decision tree algorithm is trained on feature information extracted from mail headers, and the resulting tree is used to classify mail. To reduce the number of legitimate mails misclassified as spam, and thus the losses this causes users, a variable precision rough set model is introduced to assign a small number of specific instances or noisy data to the appropriate class. Experimental results show that the proposed mechanism is feasible for spam filtering and lowers the false positive rate, that is, the rate at which legitimate mail is judged to be spam.
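
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a variable precision threshold might be combined with a decision tree. It assumes an ID3-style tree chosen by information gain, in which a node becomes a leaf as soon as one class reaches a share of at least `beta` of its examples, so that the remaining minority examples are treated as noise. The header feature names (`known_sender`, `many_recipients`), the toy data, and the value `beta=0.75` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def majority_inclusion(labels):
    """Return the most common label and its share of the examples."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def split(rows, labels, attr):
    """Partition the examples by the value of one categorical attribute."""
    parts = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return parts


def build_tree(rows, labels, attrs, beta=0.75):
    """ID3-style tree; stop splitting once one class has share >= beta."""
    label, share = majority_inclusion(labels)
    if share >= beta or not attrs:
        return label  # leaf: minority examples here are tolerated as noise

    def remainder(attr):
        # Weighted entropy left after splitting on attr (lower = higher gain).
        return sum(len(ls) / len(labels) * entropy(ls)
                   for _, ls in split(rows, labels, attr).values())

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    children = {value: build_tree(rs, ls, rest, beta)
                for value, (rs, ls) in split(rows, labels, best).items()}
    return {"attr": best, "children": children, "default": label}


def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node default."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy data: one "spam" label among known senders acts as noise.
rows = ([{"known_sender": "yes", "many_recipients": "no"}] * 5
        + [{"known_sender": "no", "many_recipients": "yes"}] * 4
        + [{"known_sender": "no", "many_recipients": "no"}])
labels = ["ham"] * 4 + ["spam"] + ["spam"] * 5

tree = build_tree(rows, labels, ["known_sender", "many_recipients"])
print(tree)
print(classify(tree, {"known_sender": "yes", "many_recipients": "no"}))
```

With `beta=1.0` the sketch reduces to an ordinary tree that keeps splitting until a node is pure or no attributes remain. Lowering `beta` lets a node such as the known-sender branch close as "ham" despite one contradictory example, which is the kind of noise tolerance the abstract attributes to the variable precision rough set model.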

Key words: spam, filtering, feature information, variable precision rough set, decision tree

CLC Number: