Journal of Computer Applications

• Artificial Intelligence •

Named entity recognition for short text

Dan WANG, Xin-Hua FAN

  1. Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications
  • Received: 2008-07-14 Online: 2009-01-01 Published: 2009-01-01
  • Corresponding author: Dan WANG

Abstract: Named entity recognition (NER) for short text is an urgently needed task, and this paper proposes a fast and effective NER method for short text that consists of three steps. First, because the non-standard expressions typical of short text interfere with NER, the text is normalized by removing interfering characters and by text simplification. Second, because short text is semantically incomplete, a Hidden Markov Model (HMM) that uses part-of-speech tags as its observations performs a preliminary recognition of named entities. Finally, a pinyin co-referential relation library is built from the preliminary recognition results and used to identify potential entities. Experiments on a test set of 8464 short texts show that the method recognizes named entities in short text well.

Key words: short text, HMM, named entity recognition, pinyin co-referential relation library, part of speech
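The abstract names the normalization operations of the first step but does not define them. The Python sketch below shows one plausible reading under stated assumptions, not the authors' procedure: an "interfering character" is assumed to be a space or ASCII symbol (after NFKC folding of full-width forms) inserted between two Chinese characters, as in 王*丹, and "simplification" is assumed to mean traditional-to-simplified conversion, done here with a tiny hypothetical table. The names `TRAD_TO_SIMP`, `NOISE` and `normalize` are illustrative only.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny traditional-to-simplified table (assumption);
# a real system would load a complete conversion dictionary.
TRAD_TO_SIMP = str.maketrans({"國": "国", "慶": "庆", "郵": "邮", "電": "电", "學": "学"})

CJK = r"[\u4e00-\u9fff]"          # one CJK Unified Ideograph
NOISE = r"[\s*#@~^_\-|/\\.]+"     # assumed set of interfering characters


def normalize(text: str) -> str:
    """Normalize a short text before NER (illustrative sketch)."""
    # NFKC folds full-width forms such as U+FF0A into ASCII '*'
    text = unicodedata.normalize("NFKC", text)
    # Drop interfering characters only when they sit between two ideographs,
    # e.g. '王*丹' -> '王丹'; ordinary punctuation such as ',' is kept
    text = re.sub(rf"(?<={CJK}){NOISE}(?={CJK})", "", text)
    # Map traditional characters to simplified ones via the toy table
    return text.translate(TRAD_TO_SIMP)


print(normalize("重慶郵電大學"))        # 重庆邮电大学
print(normalize("王＊丹 和 樊 兴 华"))  # 王丹和樊兴华
```

Restricting removal to characters sandwiched between ideographs leaves commas and other clause punctuation intact, so sentence structure survives for the later part-of-speech tagging.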
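For the second step, the abstract states only that an HMM takes part-of-speech tags as its observed values. A minimal Viterbi decoding sketch follows, assuming that the hidden states are entity labels (`PER`, `LOC`, `O`), that the observations are POS tags of already segmented words in a PKU-style tagset (`nr` person name, `ns` place name, `n` noun, `v` verb, `p` preposition), and that hand-written toy probabilities stand in for parameters estimated from a tagged corpus. All state names and numbers are hypothetical.

```python
import math

# Hidden states: entity labels (assumed); observations: POS tags (PKU-style, assumed)
STATES = ("PER", "LOC", "O")

# Hypothetical toy parameters; a real system would estimate them by counting
# label transitions and label/POS co-occurrences in a tagged training corpus.
START = {"PER": 0.3, "LOC": 0.2, "O": 0.5}
TRANS = {
    "PER": {"PER": 0.2, "LOC": 0.1, "O": 0.7},
    "LOC": {"PER": 0.1, "LOC": 0.3, "O": 0.6},
    "O":   {"PER": 0.2, "LOC": 0.2, "O": 0.6},
}
EMIT = {
    "PER": {"nr": 0.8, "n": 0.1, "ns": 0.05, "v": 0.05},
    "LOC": {"ns": 0.8, "n": 0.1, "nr": 0.05, "v": 0.05},
    "O":   {"n": 0.4, "v": 0.4, "p": 0.15, "nr": 0.025, "ns": 0.025},
}
UNSEEN = 1e-6  # floor for a POS tag never observed with a given state


def log_emit(state, tag):
    return math.log(EMIT[state].get(tag, UNSEEN))


def viterbi(tags):
    """Most probable entity-label sequence for a list of POS tags."""
    if not tags:
        return []
    # score[s]: best log-probability of any label path ending in state s
    score = {s: math.log(START[s]) + log_emit(s, tags[0]) for s in STATES}
    backptrs = []  # one dict per later step: best predecessor of each state
    for tag in tags[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda q: score[q] + math.log(TRANS[q][s]))
            prev_best[s] = p
            new_score[s] = score[p] + math.log(TRANS[p][s]) + log_emit(s, tag)
        backptrs.append(prev_best)
        score = new_score
    # Follow back-pointers from the best final state
    path = [max(STATES, key=score.get)]
    for prev_best in reversed(backptrs):
        path.append(prev_best[path[-1]])
    return path[::-1]


# e.g. 王丹/nr 访问/v 重庆/ns
print(viterbi(["nr", "v", "ns"]))  # ['PER', 'O', 'LOC']
```

Decoding in log space avoids underflow on longer messages, and the `UNSEEN` floor keeps a tag that never co-occurred with a state from eliminating every path. Words labelled `PER` or `LOC` would then form the preliminary entity list.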
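The abstract does not say how the pinyin co-referential relation library is built or queried. One reading, sketched below as an assumption: entities from the preliminary HMM pass are indexed by their toneless pinyin, and any other substring of the text with the same pinyin (for example a homophone spelling of a name) is reported as a potential entity. The table `PINYIN` is a tiny hypothetical stand-in for a full character-to-pinyin dictionary, and `to_pinyin`, `build_library` and `find_potential_entities` are illustrative names.

```python
# Hypothetical toneless pinyin table; a real system would use a complete
# character-to-pinyin dictionary and handle polyphonic characters.
PINYIN = {"王": "wang", "汪": "wang", "丹": "dan", "旦": "dan",
          "樊": "fan", "凡": "fan", "兴": "xing", "星": "xing",
          "华": "hua", "花": "hua", "见": "jian", "了": "le"}


def to_pinyin(word):
    """Toneless pinyin key for a word, or None if a character is unknown."""
    syllables = [PINYIN.get(ch) for ch in word]
    return None if None in syllables else " ".join(syllables)


def build_library(entities):
    """Index preliminarily recognized entities by their pinyin key."""
    library = {}
    for entity in entities:
        key = to_pinyin(entity)
        if key is not None:
            library.setdefault(key, entity)  # keep the first spelling seen
    return library


def find_potential_entities(text, library):
    """(surface, entity) pairs: substrings that sound like a known entity
    but are written with different characters. Longer keys are tried first."""
    lengths = sorted({len(key.split()) for key in library}, reverse=True)
    found = []
    for n in lengths:
        for i in range(len(text) - n + 1):
            surface = text[i:i + n]
            entity = library.get(to_pinyin(surface))
            if entity is not None and surface != entity:
                found.append((surface, entity))
    return found


library = build_library(["王丹", "樊兴华"])  # e.g. output of the HMM step
print(find_potential_entities("汪旦见了凡星花", library))
# [('凡星花', '樊兴华'), ('汪旦', '王丹')]
```

Exact spellings already found by the HMM are skipped, so only new surface forms are reported; a fuller version would also resolve overlapping matches.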