Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (4): 1056-1060. DOI: 10.11772/j.issn.1001-9081.2017.04.1056

• Artificial Intelligence •

Word semantic similarity computation based on integrating HowNet and search engines

ZHANG Shuowang, OUYANG Chunping, YANG Xiaohua, LIU Yongbin, LIU Zhiming

  1. College of Computer Science and Technology, University of South China, Hengyang, Hunan 421001, China
  • Received: 2016-09-23 Revised: 2016-10-26 Online: 2017-04-10 Published: 2017-04-19
  • Corresponding author: OUYANG Chunping
  • About the authors: ZHANG Shuowang (1993-), male, born in Xiangtan, Hunan, M.S. candidate; main research interest: natural language processing. OUYANG Chunping (1979-), female, born in Hengyang, Hunan, associate professor, Ph.D., CCF member; main research interests: Semantic Web, sentiment analysis. YANG Xiaohua (1963-), male, born in Hengyang, Hunan, professor, Ph.D., CCF member; main research interests: information retrieval, public opinion analysis. LIU Yongbin (1978-), male, born in Handan, Hebei, lecturer, Ph.D., CCF member; main research interests: knowledge graphs, natural language processing. LIU Zhiming (1972-), male, born in Liuyang, Hunan, professor, Ph.D., CCF member; main research interests: information retrieval, big data analysis.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61402220, 61502221), the Scientific Research Project of Hunan Provincial Education Department (16C1378, 14B153, 15C1186), and the Philosophy and Social Science Foundation of Hunan Province (14YBA335).

Abstract: To address the many mismatches between the word semantic descriptions in HowNet and people's subjective understanding of vocabulary, a word semantic similarity computation method that integrates HowNet with search engines was proposed, drawing on the rich knowledge available on the Web. First, taking into account the inclusion relation between a word and its sememes, preliminary semantic similarity results were obtained with an improved concept similarity computation method. Then, further semantic similarity results were obtained with a search-engine-based double correlation detection algorithm and Pointwise Mutual Information (PMI). Finally, a fitting function was designed and its weight parameters were learned by batch gradient descent, so as to fuse the similarity results of the first two steps. The experimental results show that, compared with the improved methods based solely on HowNet or solely on search engines, the fusion method improves both the Spearman coefficient and the Pearson coefficient by 5%. It also improves the match between the semantic descriptions of specific words and people's subjective understanding of them, which indicates that integrating Web knowledge into concept similarity computation can effectively improve the computation of Chinese word semantic similarity.

Key words: semantic similarity, HowNet, search engine, weight, network
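
The fusion step summarized in the abstract (learning weights for two similarity scores by batch gradient descent) can be illustrated with a minimal sketch. The abstract does not give the fitting function, so the linear form `w_kb * s_kb + w_web * s_web + b`, the squared-error loss, the learning rate, and the toy scores below are all assumptions made for illustration; the names `fit_fusion_weights` and `fuse` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only. The linear fitting function, the loss, the
# hyperparameters, and the toy data are assumptions, not the paper's.

def fit_fusion_weights(s_kb, s_web, gold, lr=0.1, epochs=5000):
    """Fit fused = w_kb * s_kb + w_web * s_web + b to human ratings.

    s_kb  : similarity scores from the knowledge-base (HowNet-style) step
    s_web : similarity scores from the search-engine step
    gold  : human-rated similarity for the same word pairs
    Uses batch gradient descent on the mean squared error.
    """
    w_kb, w_web, b = 0.5, 0.5, 0.0
    n = len(gold)
    for _ in range(epochs):
        # "Batch": gradients are accumulated over every training pair
        # before the parameters are updated once.
        g_kb = g_web = g_b = 0.0
        for x1, x2, y in zip(s_kb, s_web, gold):
            err = (w_kb * x1 + w_web * x2 + b) - y
            g_kb += err * x1
            g_web += err * x2
            g_b += err
        w_kb -= lr * g_kb / n
        w_web -= lr * g_web / n
        b -= lr * g_b / n
    return w_kb, w_web, b


def fuse(weights, x_kb, x_web):
    """Apply the learned weights to one pair of similarity scores."""
    w_kb, w_web, b = weights
    return w_kb * x_kb + w_web * x_web + b


# Toy data (made up): ratings were generated as 0.6 * s_kb + 0.4 * s_web.
toy_kb = [0.0, 1.0, 0.0, 1.0, 0.5]
toy_web = [0.0, 0.0, 1.0, 1.0, 0.5]
toy_gold = [0.0, 0.6, 0.4, 1.0, 0.5]

weights = fit_fusion_weights(toy_kb, toy_web, toy_gold)
print([round(w, 3) for w in weights])
print(round(fuse(weights, 0.8, 0.2), 3))
```

In a real evaluation, the fused scores would then be compared with human ratings using the Spearman and Pearson coefficients, as the abstract reports; that step is omitted here.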
