LIBSVM-based relationship recognition method for adjacent sentences containing "jiushi" (就是)

doi:10.11772/j.issn.1001-9081.2015.07.1950

Journal of Computer Applications, 2015, Vol. 35, Issue (7): 1950-1954.

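ZHOU Jiancheng¹, WU Ting², WANG Rongbo¹, CHANG Ruoyu¹

1. Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou Zhejiang 310018, China;
2. College of Zhejiang Secrecy, Hangzhou Dianzi University, Hangzhou Zhejiang 310018, China

Received: 2015-01-22  Revised: 2015-03-22  Online: 2015-07-10  Published: 2015-07-17
Corresponding author: ZHOU Jiancheng (born 1988), male, from Shaoyang, Hunan, M.S. candidate; research interest: Chinese information processing; 596599029@qq.com
About the authors: WU Ting (born 1972), male, from Hangzhou, Zhejiang, professor, Ph.D.; research interests: cryptography, information security. WANG Rongbo (born 1978), male, from Shaoxing, Zhejiang, associate professor, Ph.D.; research interest: Chinese information processing. CHANG Ruoyu (born 1986), male, from Pingdingshan, Henan, M.S. candidate; research interest: Chinese information processing.
Funding:
National Natural Science Foundation of China (61202281); Youth Fund of the Humanities and Social Sciences Research Program of the Ministry of Education (12YJCZH201).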

Abstract

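When rules and machine learning are combined to recognize the relationship between sentences, the repeated iterations of machine learning weaken the weight of the rules, which lowers recognition accuracy. To address this problem, a method was proposed that strengthens the explicit rule features imported when rules and machine learning are combined. First, distinctive features that follow explicit rules, such as dependency vocabulary, semantics and sentence structure, were extracted. Second, general features were extracted based on words that indicate inter-sentence relationships. Third, the features were written into the data vectors to be input, and one additional vector dimension was added to store any explicit rule features that occur. Finally, experiments were conducted with the LIBSVM model, combining rules and machine learning. The experimental results show that accuracy after strengthening is on average two percentage points higher than before strengthening, and the precision, recall and F1 score for each inter-sentence relationship are good overall, averaging 82.02%, 88.95% and 84.76% respectively. The experimental approach and method are of value for studying how closely adjacent sentences are connected.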
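The abstract says the features are written into the input vector and that one extra dimension stores the explicit rule features, but it gives no feature inventory or weighting scheme. The following is a minimal Python sketch, assuming LIBSVM's sparse text input format (`<label> <index>:<value> ...`, 1-based indices in ascending order). The feature names, the `FEATURE_INDEX` table, the integer relation labels, and the `rule_weight` multiplier used to "strengthen" the rule dimension are all hypothetical illustrations, not the authors' actual encoding.

```python
# Minimal sketch: encode one sentence pair as a line in LIBSVM's sparse text format,
# "<label> <index>:<value> ...", with 1-based indices in ascending order.
# FEATURE_INDEX, the feature names, the labels and rule_weight are hypothetical.

# Hypothetical general features (e.g. relation-indicating words, dependency labels).
FEATURE_INDEX = {
    "kw:jiushi": 1,
    "kw:danshi": 2,
    "dep:SBV": 3,
    "sem:cause": 4,
}

# One extra dimension, placed after all general features, holds the explicit rule signal.
RULE_DIM = len(FEATURE_INDEX) + 1


def encode(label, features, rule_hits=0, rule_weight=1.0):
    """Return one LIBSVM input line for a sentence pair.

    label       -- integer id of the inter-sentence relation (hypothetical mapping)
    features    -- names of the general features present; names not in FEATURE_INDEX are ignored
    rule_hits   -- number of explicit rules that fired for the pair
    rule_weight -- assumed multiplier that strengthens the rule dimension
    """
    values = {FEATURE_INDEX[f]: 1.0 for f in features if f in FEATURE_INDEX}
    if rule_hits:
        values[RULE_DIM] = rule_weight * rule_hits
    # Zero-valued dimensions are simply omitted in LIBSVM's sparse format.
    body = " ".join(f"{i}:{v:g}" for i, v in sorted(values.items()))
    return f"{label} {body}".rstrip()


if __name__ == "__main__":
    print(encode(2, ["kw:jiushi", "dep:SBV"], rule_hits=1, rule_weight=2.0))
```

For example, `encode(2, ["kw:jiushi", "dep:SBV"], rule_hits=1, rule_weight=2.0)` yields `2 1:1 3:1 5:2`. Keeping the rule signal in its own dimension lets it be scaled independently of the general features, which is one plausible reading of the strengthening step described above.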

Key words: relationship between sentences, LIBSVM, machine learning, kappa value, dependency vocabulary
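The abstract reports average precision, recall and F1 over the relation classes, and the key words mention a kappa value without defining it. The sketch below computes per-class scores, their unweighted (macro) average, and Cohen's kappa in plain Python. Treating the reported figures as macro averages and the kappa value as Cohen's kappa (for example, agreement between two annotators) are assumptions. The numbers are at least consistent with the macro reading: the harmonic mean of the averaged precision and recall, 2 × 82.02 × 88.95 / (82.02 + 88.95) ≈ 85.34%, is higher than the reported 84.76%, as expected when F1 is computed per class and then averaged.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Precision, recall and F1 for every label seen in gold or pred."""
    scores = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of the per-class (precision, recall, F1) tuples."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each sequence's label distribution.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label; kappa is undefined, report full agreement
    return (observed - expected) / (1 - expected)
```

With `gold = ["A", "A", "B", "B"]` and `pred = ["A", "B", "B", "B"]`, `macro_average(per_class_prf(gold, pred))` gives about (0.833, 0.750, 0.733), and `cohen_kappa(gold, pred)` is 0.5.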

CLC number: TP399; TP181

Cite this article: ZHOU Jiancheng, WU Ting, WANG Rongbo, CHANG Ruoyu. LIBSVM-based relationship recognition method for adjacent sentences containing "jiushi"[J]. Journal of Computer Applications, 2015, 35(7): 1950-1954. (周建成, 吴铤, 王荣波, 常若愚. 基于LIBSVM的“就是”句句间关系判别方法[J]. 计算机应用, 2015, 35(7): 1950-1954.)


