Journal of Computer Applications (计算机应用) ›› 2011, Vol. 31 ›› Issue (04): 1070-1073. DOI: 10.3724/SP.J.1087.2011.01070

• Artificial Intelligence •

Method of mutual information filtration with dual-threshold for term extraction
(面向术语抽取的双阈值互信息过滤方法)

Shi-chao CHEN (陈士超), Bin YU (郁滨)

  1. Institute of Electronic Technology, Information Engineering University, Zhengzhou, Henan 450004, China
  • Received: 2010-10-29 Revised: 2010-11-17 Online: 2011-04-08 Published: 2011-04-01
  • Contact: Shi-chao CHEN
  • About the authors: Shi-chao CHEN (born 1982), male, from Yucheng, Henan, is a Ph.D. candidate whose main research interest is knowledge engineering;
    Bin YU (born 1964), male, from Zhengzhou, Henan, is a professor and doctoral supervisor with a Ph.D.; his main research interests are knowledge engineering and information security.

Abstract: To reduce the impact of problems inherent in the mutual information method on term filtering, a dual-threshold mutual information filtering method is proposed. A threshold determination algorithm based on local evaluation indicators is also given; through data sampling, statistics, and computation, it quickly and accurately yields the optimal upper and lower thresholds. Unlike single-threshold mutual information filtering, the proposed method filters and extracts candidate terms by setting two thresholds, without changing the formula used to compute mutual information. Experimental results show that, under the same conditions, the proposed method significantly improves precision and F-measure.

Key words: term extraction, term filtration, mutual information, threshold, evaluation indicator
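
The abstract does not reproduce the paper's formulas or its threshold determination algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard pointwise mutual information for adjacent word pairs, MI(x, y) = log2(P(x, y) / (P(x) P(y))), keeps a candidate only when its score lies between a lower and an upper threshold, and picks the two thresholds by an exhaustive search that maximizes the F-measure on a small hand-labeled sample. The function names (`pmi_scores`, `dual_threshold_filter`, `f_measure`, `pick_thresholds`), the choice of F-measure as the selection criterion, and the brute-force search are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product


def pmi_scores(tokens):
    """Standard pointwise mutual information (log base 2) per adjacent pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def dual_threshold_filter(scores, lower, upper):
    """Keep candidates whose score lies in the closed interval [lower, upper]."""
    return {cand for cand, s in scores.items() if lower <= s <= upper}


def f_measure(selected, gold):
    """Balanced F-measure of the selected set against a set of true terms."""
    true_pos = len(selected & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(selected)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def pick_thresholds(sample_scores, sample_gold):
    """Hypothetical selection rule: try every (lower, upper) pair of scores
    observed in a non-empty labeled sample and keep the best F-measure."""
    values = sorted(set(sample_scores.values()))
    best = (0.0, values[0], values[-1])
    for lower, upper in product(values, repeat=2):
        if lower > upper:
            continue
        kept = dual_threshold_filter(sample_scores, lower, upper)
        score = f_measure(kept, sample_gold)
        if score > best[0]:
            best = (score, lower, upper)
    return best  # (best F-measure, lower threshold, upper threshold)


# Toy labeled sample (made-up scores): "d" has a very high score but is not a term.
sample = {"a": 1.0, "b": 3.0, "c": 5.0, "d": 9.0}
gold_terms = {"b", "c"}
best_f, low, high = pick_thresholds(sample, gold_terms)
```

In this toy sample, a lower bound alone cannot exclude the high-scoring non-term `"d"`, whereas an upper bound can. Pointwise mutual information is known to inflate scores for rare pairs, which is one plausible reason to cap scores from above; whether this is the specific problem the paper targets is not stated in the abstract.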
