Word semantic similarity computation based on integrating HowNet and search engines

Journal of Computer Applications, 2017, Vol. 37, Issue (4): 1056-1060. DOI: 10.11772/j.issn.1001-9081.2017.04.1056

ZHANG Shuowang, OUYANG Chunping, YANG Xiaohua, LIU Yongbin, LIU Zhiming

College of Computer Science and Technology, University of South China, Hengyang Hunan 421001, China

Received: 2016-09-23 Revised: 2016-10-26 Online: 2017-04-19 Published: 2017-04-10
Corresponding author: OUYANG Chunping
About the authors: ZHANG Shuowang (born 1993), male, from Xiangtan, Hunan, M.S. candidate; main research interest: natural language processing. OUYANG Chunping (born 1979), female, from Hengyang, Hunan, associate professor, Ph.D., CCF member; main research interests: semantic Web, sentiment analysis. YANG Xiaohua (born 1963), male, from Hengyang, Hunan, professor, Ph.D., CCF member; main research interests: information retrieval, public opinion analysis. LIU Yongbin (born 1978), male, from Handan, Hebei, lecturer, Ph.D., CCF member; main research interests: knowledge graphs, natural language processing. LIU Zhiming (born 1972), male, from Liuyang, Hunan, professor, Ph.D., CCF member; main research interests: information retrieval, big data analysis.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61402220, 61502221), the Scientific Research Project of Hunan Provincial Education Department (16C1378, 14B153, 15C1186), and the Philosophy and Social Science Foundation of Hunan Province (14YBA335).

Abstract

The semantic descriptions of words in HowNet often do not match people's subjective perception of vocabulary. To address this mismatch while making full use of the rich knowledge available on the Web, a word semantic similarity calculation method that integrates HowNet and search engines is proposed. First, taking into account the inclusion relation between a word and its sememes, a preliminary word semantic similarity is obtained with an improved concept similarity calculation method. Then, a further semantic similarity is obtained with a search-engine-based double-checking correlation algorithm and the pointwise mutual information (PMI) method. Finally, a fitting function is designed and its weight parameters are learned by batch gradient descent, fusing the similarity results of the first two steps. Experimental results show that, compared with the improved methods based solely on HowNet or solely on search engines, the fusion method improves both the Spearman coefficient and the Pearson coefficient by 5%, and also improves the match between the semantic descriptions of specific words and people's subjective perception of vocabulary. This verifies that integrating Web knowledge into concept similarity calculation can effectively improve the performance of Chinese word semantic similarity computation.
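The search-engine step can be illustrated with a minimal sketch. It assumes that page counts and snippet counts have already been retrieved; the abstract does not give the exact formulas, so the symmetric double check below is a simplified stand-in for the paper's double-checking algorithm, and the normalization of PMI into [0, 1] is an added assumption. All function and parameter names (`pmi`, `double_check_passes`, `web_similarity`, `total_pages`) are hypothetical.

```python
import math


def pmi(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Pointwise mutual information (base 2) estimated from page counts."""
    if min(hits_x, hits_y, hits_xy) <= 0:
        return 0.0  # no usable evidence for one of the counts
    p_x = hits_x / total_pages
    p_y = hits_y / total_pages
    p_xy = hits_xy / total_pages
    return math.log2(p_xy / (p_x * p_y))


def double_check_passes(y_in_x_snippets: int, x_in_y_snippets: int) -> bool:
    """Simplified double check (assumption): each word must occur in the
    top snippets returned for the other word."""
    return y_in_x_snippets > 0 and x_in_y_snippets > 0


def web_similarity(hits_x: int, hits_y: int, hits_xy: int, total_pages: int,
                   y_in_x_snippets: int, x_in_y_snippets: int) -> float:
    """Web-based similarity in [0, 1]: PMI gated by the double check."""
    if not double_check_passes(y_in_x_snippets, x_in_y_snippets):
        return 0.0
    if hits_xy <= 0:
        return 0.0
    p_xy = hits_xy / total_pages
    if p_xy >= 1.0:
        return 0.0  # degenerate case: co-occurrence carries no information
    # Normalized PMI (an assumption of this sketch): divide by -log2 p(x, y).
    score = pmi(hits_x, hits_y, hits_xy, total_pages) / -math.log2(p_xy)
    return max(0.0, min(1.0, score))
```

In this sketch, a word pair that fails the check in either direction scores 0 regardless of its page counts, which is the intent of checking the association from both sides.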

Key words: semantic similarity, HowNet, search engine, weight, network
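The fusion step described in the abstract can be sketched as follows. The abstract does not state the form of the fitting function, so a linear combination with a bias term, trained by batch gradient descent on mean squared error against human ratings, is assumed here purely for illustration. The names `fit_fusion_weights` and `fuse`, the learning rate, and the epoch count are hypothetical choices, not the authors' settings.

```python
def fit_fusion_weights(hownet_scores, web_scores, human_scores,
                       lr=0.5, epochs=2000):
    """Learn (w_hownet, w_web, bias) by batch gradient descent on MSE."""
    n = len(human_scores)
    if n == 0 or len(hownet_scores) != n or len(web_scores) != n:
        raise ValueError("score lists must be non-empty and of equal length")
    w_h, w_w, bias = 0.5, 0.5, 0.0
    for _ in range(epochs):
        grad_h = grad_w = grad_b = 0.0
        # Batch update: accumulate the gradient over ALL training pairs
        # before changing any parameter.
        for s_h, s_w, target in zip(hownet_scores, web_scores, human_scores):
            error = (w_h * s_h + w_w * s_w + bias) - target
            grad_h += error * s_h
            grad_w += error * s_w
            grad_b += error
        w_h -= lr * grad_h / n
        w_w -= lr * grad_w / n
        bias -= lr * grad_b / n
    return w_h, w_w, bias


def fuse(s_hownet, s_web, params):
    """Fused similarity under the assumed linear form, clamped to [0, 1]."""
    w_h, w_w, bias = params
    return max(0.0, min(1.0, w_h * s_hownet + w_w * s_web + bias))
```

A usage example under the same assumptions: `params = fit_fusion_weights(hownet, web, human)` followed by `fuse(0.8, 0.6, params)`, where `hownet`, `web`, and `human` are equal-length lists of scores in [0, 1] for the same word pairs.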

CLC number: TP391.1

ZHANG Shuowang, OUYANG Chunping, YANG Xiaohua, LIU Yongbin, LIU Zhiming. Word semantic similarity computation based on integrating HowNet and search engines[J]. Journal of Computer Applications, 2017, 37(4): 1056-1060.


