Journal of Computer Applications, 2010, Vol. 30, Issue (06): 1668-1670.

• Software Process Technology and Chinese Information Processing •

Combined measurement approach for semantic similarity of terms

WEI Wei (魏韡)¹, XIANG Yang (向阳)², CHEN Qian (陈千)³

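  1. School of Electronics and Information Engineering, Tongji University; College of Information and Media, Jinggangshan University
    2. Department of Computer Science, School of Electronics and Information Engineering, Jiading Campus of Tongji University, Shanghai
    3. Tongji University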
  • Received: 2010-01-04 Revised: 2010-02-23 Online: 2010-06-01 Published: 2010-06-01
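  • Corresponding author: WEI Wei (魏韡)
  • Funding:
    National Natural Science Foundation of China; National High Technology Program of China; Special Fund for Manufacturing Informatization of the Science and Technology Commission of Shanghai Municipality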



Abstract: Measuring the semantic similarity of terms is a key issue in many research fields. This paper proposes a method that measures the semantic similarity of two terms using the directed acyclic graph (DAG) in which the terms are organized together with the intrinsic information content (IIC) of the terms. First, the sub-graph of each term is extracted from the DAG, and the intersection and the union of the two sub-graphs are computed. Next, the IIC of every node in the intersection and in the union is calculated, and the IIC values are summed over each set. The semantic similarity of the two terms is the ratio of the summed IIC of the intersection to the summed IIC of the union. Experimental results show that the method achieves relatively high accuracy.

Key words: Semantic similarity, intrinsic information content, directed acyclic graph (DAG)
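The abstract does not specify how each term's sub-graph is formed or which IIC formula is used, so the Python sketch below fills both gaps with labeled assumptions. A term's sub-graph is assumed to be the term plus all of its ancestors, and IIC is assumed to follow the Seco-style formula IIC(c) = 1 − log(hypo(c) + 1) / log(N), where hypo(c) is the number of distinct descendants of c and N is the number of terms in the DAG. The hierarchy is stored in a hypothetical `parents` dictionary (child -> list of parents), and the names `subgraph`, `intrinsic_ic`, and `similarity` are illustrative, not the authors'. The similarity is the ratio described in the abstract: sim(a, b) = Σ IIC(n) over G(a) ∩ G(b), divided by Σ IIC(n) over G(a) ∪ G(b).

```python
import math
from collections import defaultdict


def subgraph(term, parents):
    """Assumed sub-graph of a term: the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def intrinsic_ic(parents):
    """Assumed Seco-style IIC: 1 - log(hypo(c) + 1) / log(N).

    hypo(c) is the number of distinct descendants of c; N is the total
    number of terms in the DAG.
    """
    nodes = set(parents)
    children = defaultdict(set)
    for child, ps in parents.items():
        nodes.update(ps)
        for p in ps:
            children[p].add(child)
    n_total = len(nodes)
    if n_total < 2:
        raise ValueError("the DAG needs at least two terms")
    iic = {}
    for node in nodes:
        # Collect every distinct descendant (shared descendants counted once).
        desc, stack = set(), list(children[node])
        while stack:
            c = stack.pop()
            if c not in desc:
                desc.add(c)
                stack.extend(children[c])
        iic[node] = 1.0 - math.log(len(desc) + 1) / math.log(n_total)
    return iic


def similarity(a, b, parents, iic):
    """Summed IIC over the intersection divided by summed IIC over the union."""
    ga, gb = subgraph(a, parents), subgraph(b, parents)
    num = sum(iic[n] for n in ga & gb)
    den = sum(iic[n] for n in ga | gb)
    if den == 0.0:
        # Only zero-IIC nodes are involved (e.g. the root compared with itself).
        return 1.0 if ga == gb else 0.0
    return num / den


if __name__ == "__main__":
    # Toy is-a hierarchy (hypothetical data): child -> list of parents.
    parents = {
        "animal": [],
        "mammal": ["animal"],
        "bird": ["animal"],
        "dog": ["mammal"],
        "cat": ["mammal"],
    }
    iic = intrinsic_ic(parents)
    print(round(similarity("dog", "cat", parents, iic), 3))
    print(similarity("dog", "bird", parents, iic))
```

On this toy hierarchy, dog and cat share mammal and animal, so they receive a small positive score (the script prints 0.137), while dog and bird share only the root. Under the assumed formula the root has IIC 0 in a single-rooted DAG, so the dog/bird score is 0.0. A different IIC definition would change these numbers but not the intersection-over-union structure of the measure.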