Journal of Computer Applications, 2015, Vol. 35, Issue (7): 1950-1954. DOI: 10.11772/j.issn.1001-9081.2015.07.1950

• Artificial Intelligence •

LIBSVM-based relationship recognition method for adjacent sentences containing "jiushi"

ZHOU Jiancheng1, WU Ting2, WANG Rongbo1, CHANG Ruoyu1

  1. Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China;
  2. College of Zhejiang Secrecy, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
  • Received: 2015-01-22  Revised: 2015-03-22  Online: 2015-07-10  Published: 2015-07-17
  • Corresponding author: ZHOU Jiancheng (born 1988), male, from Shaoyang, Hunan, M.S. candidate; research interest: Chinese information processing. 596599029@qq.com
  • About the authors: WU Ting (born 1972), male, from Hangzhou, Zhejiang, professor, Ph.D.; research interests: cryptography, information security. WANG Rongbo (born 1978), male, from Shaoxing, Zhejiang, associate professor, Ph.D.; research interest: Chinese information processing. CHANG Ruoyu (born 1986), male, from Pingdingshan, Henan, M.S. candidate; research interest: Chinese information processing.
  • Supported by:

    National Natural Science Foundation of China (61202281); Youth Fund of the Humanities and Social Sciences Research Project of the Ministry of Education (12YJCZH201).

Abstract:

When sentence relationships are judged with a combination of rules and machine learning, repeated learning iterations weaken the weights of the rules, which lowers recognition accuracy. To address this problem, a method was proposed that strengthens the imported obvious rule features while rules and machine learning are combined. First, distinctive features governed by obvious rules, such as dependency vocabulary, semantics, and sentence structure, were extracted. Second, general-purpose features were extracted based on words that indicate inter-sentence relationships. Third, the features were written into the input data vector, and one additional dimension was added to store the obvious rule features that occur. Finally, rules and machine learning were combined using the LIBSVM model to carry out the experiments. The experimental results show that accuracy after strengthening is on average two percentage points higher than before strengthening, and that the precision, recall, and F1 value of each inter-sentence relationship are good overall, averaging 82.02%, 88.95%, and 84.76%, respectively. The experimental approach and method are valuable for studying how closely adjacent sentences are connected.
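The abstract does not spell out how the input vector is laid out, so the following is only a minimal sketch of the idea of adding one dimension for obvious rule features. All feature names, the `rule_weight` parameter, and the choice to store a weighted count of fired rules in the extra dimension are assumptions made for illustration, not the authors' actual design. The output uses the plain-text sparse format that LIBSVM reads (`label index:value ...`, with 1-based ascending indices and zero entries omitted).

```python
from typing import Iterable, List

# Hypothetical feature inventories (assumptions; the abstract does not list them).
GENERAL_FEATURES: List[str] = ["indicator_but", "indicator_because", "indicator_so"]
RULE_FEATURES: List[str] = ["rule_dependency_word", "rule_semantic", "rule_sentence_structure"]


def build_vector(active: Iterable[str], rule_weight: float = 1.0) -> List[float]:
    """Return binary features plus one extra dimension for rule features.

    The extra (last) dimension holds rule_weight times the number of
    rule features present, which is one assumed way to keep the rule
    signal from being diluted among the other dimensions.
    """
    active_set = set(active)
    vector = [1.0 if name in active_set else 0.0
              for name in GENERAL_FEATURES + RULE_FEATURES]
    fired = sum(1 for name in RULE_FEATURES if name in active_set)
    vector.append(rule_weight * fired)
    return vector


def to_libsvm_line(label: int, vector: List[float]) -> str:
    """Serialize one sample in LIBSVM's sparse text format."""
    pairs = [f"{index}:{value:g}"
             for index, value in enumerate(vector, start=1)
             if value != 0.0]
    return " ".join([str(label)] + pairs)


if __name__ == "__main__":
    sample = build_vector(["indicator_but", "rule_semantic"])
    print(to_libsvm_line(2, sample))
```

Lines produced this way could be written to a file and passed to LIBSVM's training tools; the class label `2` here is an arbitrary placeholder for one relationship category.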

Key words: relationship between sentences, LIBSVM, machine learning, kappa value, dependency vocabulary
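The abstract reports averaged precision, recall, and F1 across relationship types. As a hedged illustration of how such averages are commonly computed, the sketch below calculates per-class scores and their unweighted (macro) mean. The labels `"A"` and `"B"` and the toy predictions are placeholders, not data from the paper. Note that a macro-averaged F1 generally differs from the harmonic mean of the averaged precision and recall: for the reported 82.02% and 88.95% that harmonic mean would be about 85.3%, so the reported 84.76% looks consistent with averaging per-class F1 values, although the abstract does not state this.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def per_class_prf(gold: List[str], pred: List[str]) -> Dict[str, Scores]:
    """Compute precision, recall, and F1 for every label seen in gold or pred."""
    scores: Dict[str, Scores] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Scores]) -> Scores:
    """Unweighted mean of per-class precision, recall, and F1."""
    n = len(scores)
    return (
        sum(s[0] for s in scores.values()) / n,
        sum(s[1] for s in scores.values()) / n,
        sum(s[2] for s in scores.values()) / n,
    )


if __name__ == "__main__":
    gold = ["A", "A", "B", "B"]  # toy data, not from the paper
    pred = ["A", "B", "B", "B"]
    print(macro_average(per_class_prf(gold, pred)))
```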

CLC Number: