Journal of Computer Applications, 2013, Vol. 33, Issue (08): 2276-2279.

• Artificial Intelligence •

Improved vocabulary semantic similarity calculation based on HowNet

ZHU Zhengyu1,2, SUN Junhua1,2

  1. Chongqing Key Laboratory of Software Engineering, Chongqing 400044, China
  2. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Received: 2013-01-31 Revised: 2013-03-07 Published: 2013-08-01 Online: 2013-09-11
  • Contact: SUN Junhua
  • About the authors: ZHU Zhengyu (born 1959), male, from Chongqing, professor, Ph.D., CCF senior member; main research interests: Web intelligent retrieval, intelligent transportation, databases.
    SUN Junhua (born 1987), male, from Zhumadian, Henan, master's student; main research interests: data mining, text analysis, natural language processing.
  • Supported by:
    National Key Technology R&D Program of China

Abstract: Existing HowNet-based methods for computing vocabulary semantic similarity do not fully account for the linear structure of concept descriptions written in the knowledge base description language. To address this shortcoming, an improved vocabulary semantic similarity calculation method is proposed. First, the linear relationship among the sememes in a concept description is taken into account to derive a position-related weight allocation strategy. Then, this strategy is combined with maximum weight matching on a bipartite graph to compute concept similarity. Experimental results show that the F-measure of the clustering results obtained with the improved method is on average 5% higher than that of the comparison method, which supports the rationality and effectiveness of the improved method.

Key words: HowNet, sememe, concept, weight, semantic similarity
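
The two steps named in the abstract (position-related weights, then maximum weight bipartite matching) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the geometric `decay` weighting, the averaging of the two position weights on an edge, the exact-match `sememe_sim` placeholder, and all function names. It is not the authors' method.

```python
from itertools import permutations


def position_weights(n, decay=0.5):
    """Assumed weighting: earlier sememes weigh more; weights sum to 1."""
    raw = [decay ** i for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def sememe_sim(a, b):
    """Placeholder sememe similarity: 1.0 for identical sememes, else 0.0.

    A real system would derive this value from the HowNet sememe hierarchy.
    """
    return 1.0 if a == b else 0.0


def concept_similarity(s1, s2, decay=0.5):
    """Score two sememe sequences by maximum weight bipartite matching.

    The matching is found by exhaustive search, which is only practical
    for short concept descriptions.
    """
    if not s1 or not s2:
        return 0.0
    if len(s1) > len(s2):
        s1, s2 = s2, s1  # make s1 the shorter sequence
    w1 = position_weights(len(s1), decay)
    w2 = position_weights(len(s2), decay)
    best = 0.0
    # Each permutation assigns every sememe of s1 to a distinct sememe of s2.
    for perm in permutations(range(len(s2)), len(s1)):
        score = sum(
            (w1[i] + w2[j]) / 2 * sememe_sim(s1[i], s2[j])
            for i, j in enumerate(perm)
        )
        best = max(best, score)
    return best
```

Under these assumptions, two descriptions that share their first sememe score higher than two that share only their second one, which is the intent of a position-related weighting. For longer descriptions, the exhaustive search could be replaced by a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True`.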
