Journal of Computer Applications, 2012, Vol. 32, Issue (09): 2488-2490. DOI: 10.3724/SP.J.1087.2012.02488

• Database Technology •

Name disambiguation based on query expansion

YANG Xin-xin1,2*, LI Pei-feng1,2, ZHU Qiao-ming1,2

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China;
  2. Key Laboratory of Computer Information Processing Technology of Jiangsu Province, Suzhou, Jiangsu 215006, China
  • Received: 2012-03-16 Revised: 2012-05-03 Online: 2012-09-01 Published: 2012-09-01
  • Corresponding author: YANG Xin-xin
  • About the authors: YANG Xin-xin (born 1988), male, from Huai'an, Jiangsu, master's student; main research interest: natural language processing. LI Pei-feng (born 1971), male, from Wujiang, Jiangsu, associate professor; main research interests: natural language processing, distributed information systems, grid computing. ZHU Qiao-ming (born 1963), male, from Kunshan, Jiangsu, professor and doctoral supervisor; main research interests: natural language processing, Web information processing, embedded systems.
  • Supported by:

    National Natural Science Foundation of China (61070123, 61003155); Natural Science Foundation of Jiangsu Province (BK2011282)

Abstract: Many existing feature-based name disambiguation methods perform poorly when the documents themselves contain only sparse features. To address this, this paper proposes a method that draws on the rich resources of the Internet, using search engine queries to expand each document with additional related features. First, four types of query rules are constructed according to the characteristics of the search engine. Next, searches are run with these query rules and the top k documents are returned. Finally, the Document Frequency (DF) method is applied to the returned documents for feature selection, and the selected features are added to the original document. Experiments show that the method significantly improves the performance of a name disambiguation system, raising the average F-measure from 76% to 81%.

Key words: query expansion, search engine, name disambiguation, named entity
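
The expansion pipeline summarized in the abstract (run queries, keep the top k results, select features by document frequency, append them to the original document) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the four query rules, the tokenization, or any thresholds, so the queries are passed in as opaque strings, `search` is a hypothetical caller-supplied function returning tokenized documents, and `k`, `min_df`, and `max_features` are illustrative parameters.

```python
from collections import Counter
from typing import Callable, List, Sequence


def select_features_by_df(
    docs: Sequence[Sequence[str]], min_df: int = 2, max_features: int = 20
) -> List[str]:
    """Rank terms by document frequency (number of documents containing them)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    # Sort by descending DF, breaking ties alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, count in ranked if count >= min_df][:max_features]


def expand_document(
    original_tokens: Sequence[str],
    queries: Sequence[str],
    search: Callable[[str], List[List[str]]],  # hypothetical search wrapper
    k: int = 10,
    min_df: int = 2,
    max_features: int = 20,
) -> List[str]:
    """Append DF-selected features from the top-k results of each query."""
    retrieved: List[List[str]] = []
    for query in queries:
        retrieved.extend(search(query)[:k])  # keep only the first k documents
    features = select_features_by_df(retrieved, min_df, max_features)
    # Assumption: selected features are simply appended, without deduplication.
    return list(original_tokens) + features
```

In practice `search` would wrap a real search engine client and a tokenizer suited to the document language; both are left to the caller here because the abstract does not name them.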

CLC Number: