Journal of System Simulation ›› 2019, Vol. 31 ›› Issue (5): 1010-1018. doi: 10.16182/j.issn1004731x.joss.17-0163

• Short Paper •

Thai Language Names, Place Names and Organization Names Entity Recognition

Wang Hongbin1,2, Gao Hongkui1,2, Shen Qiang1,2, Xian Yantuan1,2

  1. College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
  2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2017-04-19 Revised: 2017-07-28 Online: 2019-05-08 Published: 2019-11-20
  • About the author: Wang Hongbin (1983-), male, from Qujing, Yunnan; Ph.D., associate professor; research interest: natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61462054, 61363044), General Program of the Yunnan Provincial Department of Science and Technology (2015FB135), Kunming University of Science and Technology provincial talent training project (KKSY201403028)

Abstract: Thai named entity recognition aims to identify person names, place names, organization names and similar entities in Thai text. Because Thai word formation and grammar are complex, the task is recast as labeling the sequence of words in a Thai sentence. Context features suited to the characteristics of Thai are selected, and a hidden Markov model and a conditional random field model are each trained on a Thai entity recognition training corpus. The resulting sequence labeling models are then evaluated experimentally on a test corpus. The results show that using hidden Markov and conditional random field models to recognize Thai person, place and organization names is feasible and achieves good results.

Key words: named entity recognition, hidden Markov statistical model, conditional random field statistical model, sequence labeling
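
To make the sequence labeling formulation in the abstract concrete, the following is a minimal sketch of a hidden Markov model tagger with Viterbi decoding. It is not the authors' implementation: the BIO tag scheme, the add-alpha smoothing, the function names `train_hmm` and `viterbi`, and the romanized toy tokens are all assumptions made for illustration, and the conditional random field model is not shown.

```python
import math
from collections import Counter, defaultdict

# Toy training data: (token, tag) pairs with BIO tags. The romanized tokens
# are hypothetical placeholders, not examples from the paper's corpus.
TRAIN = [
    [("nai", "O"), ("somchai", "B-PER"), ("pai", "O"), ("krungthep", "B-LOC")],
    [("somsri", "B-PER"), ("tham-ngan", "O"), ("thi", "O"),
     ("thanakhan", "B-ORG"), ("krungthai", "I-ORG")],
    [("nai", "O"), ("somsri", "B-PER"), ("yu", "O"), ("chiangmai", "B-LOC")],
]


def train_hmm(sentences, alpha=0.1):
    """Estimate start, transition and emission log-probabilities by counting,
    with add-alpha smoothing (alpha is an illustrative choice)."""
    tags = sorted({t for s in sentences for _, t in s})
    vocab = {w for s in sentences for w, _ in s} | {"<unk>"}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        start[sent[0][1]] += 1
        for w, t in sent:
            emit[t][w] += 1
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            trans[t1][t2] += 1

    def logp(counter, key, n_outcomes):
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

    return {
        "tags": tags,
        "vocab": vocab,
        "start": {t: logp(start, t, len(tags)) for t in tags},
        "trans": {a: {b: logp(trans[a], b, len(tags)) for b in tags} for a in tags},
        "emit": {t: {w: logp(emit[t], w, len(vocab)) for w in vocab} for t in tags},
    }


def viterbi(model, tokens):
    """Return the most probable tag sequence for a non-empty token list."""
    tags = model["tags"]
    # Map out-of-vocabulary tokens to the smoothed <unk> symbol.
    words = [w if w in model["vocab"] else "<unk>" for w in tokens]
    # score[t] = best log-probability of any tag path ending in tag t.
    score = {t: model["start"][t] + model["emit"][t][words[0]] for t in tags}
    back = []
    for w in words[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: score[p] + model["trans"][p][t])
            new_score[t] = score[prev] + model["trans"][prev][t] + model["emit"][t][w]
            pointers[t] = prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


model = train_hmm(TRAIN)
print(viterbi(model, ["nai", "somchai", "pai", "chiangmai"]))
```

A conditional random field would keep the same decoding step but replace the counted emission and transition probabilities with weights learned over context features, which is where the Thai-specific features mentioned in the abstract would enter.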
