Community question answering-oriented Chinese question classification

doi:10.11772/j.issn.1001-9081.2016.04.1060

Journal of Computer Applications, 2016, Vol. 36, Issue 4: 1060-1065.

DONG Caizheng¹, LIU Baisong²

1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo Zhejiang 315211, China;
2. Library and Information Center, Ningbo University, Ningbo Zhejiang 315211, China

Received: 2015-08-18; Revised: 2015-10-29; Online: 2016-04-10; Published: 2016-04-08
Supported by:
This work is partially supported by the Scientific Research Fund of Zhejiang Provincial Education Department (20071008) and by the Open Fund of the Zhejiang Provincial and Ministerial Laboratory (B2014).

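Corresponding author: DONG Caizheng
About the authors: DONG Caizheng (1989-), male, born in Huanggang, Hubei, M.S. candidate; research interests: natural language processing, data mining. LIU Baisong (1971-), male, born in Anqing, Anhui, research fellow, Ph.D., CCF member; research interests: intelligent processing of network information, mobile Internet, big data analysis.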

Abstract

Many questions in Community Question Answering (CQA) contain no interrogative words, and non-factoid questions make up a high proportion of CQA questions. Traditional question classification taxonomies, however, are mostly built around factoid questions, and traditional question classification methods rely heavily on interrogative words as a feature. To address this mismatch, a coarse-grained classification taxonomy oriented to CQA was proposed, and on top of it a novel hierarchical question classification method based on interrogative words. The method first automatically identifies the interrogative words in a question. If an interrogative word is present, the question is classified by a Support Vector Machine (SVM) model; otherwise, it is classified by a classifier built on focus words. In comparison experiments on a dataset of Chinese questions crawled from the Chinese CQA site Zhihu, the proposed method improved classification accuracy by 4.7 percentage points over a conventional SVM-based classification method. The experimental results show that selecting the classifier according to whether a question contains interrogative words reduces the method's dependence on interrogative words and effectively improves the accuracy of question classification in CQA.
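The two-branch routing described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the interrogative-word list, the focus-word lexicon, the category labels (`reason`, `method`, `recommendation`, `opinion`, `other`) and all helper names are hypothetical; plain substring matching stands in for Chinese word segmentation; the focus-word branch is reduced to a dictionary lookup; and a toy rule (`toy_svm`) stands in for the trained SVM.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interrogative-word list (not the paper's actual list).
INTERROGATIVE_WORDS = ["为什么", "怎么样", "怎么", "如何", "什么", "哪些", "哪个", "多少", "谁"]

# Hypothetical focus-word lexicon: focus word -> coarse category label.
FOCUS_LEXICON = {"原因": "reason", "方法": "method", "推荐": "recommendation", "看法": "opinion"}


def find_interrogative(question: str) -> Optional[str]:
    """Return the longest interrogative word contained in the question, or None.

    Substring matching stands in for word segmentation; checking longer words
    first keeps "为什么" (why) from being reported as "什么" (what).
    """
    for word in sorted(INTERROGATIVE_WORDS, key=len, reverse=True):
        if word in question:
            return word
    return None


def focus_word_classify(question: str) -> str:
    """Label a question by the first lexicon focus word it contains."""
    for focus, label in FOCUS_LEXICON.items():
        if focus in question:
            return label
    return "other"


def classify(question: str, svm_classify: Callable[[str], str]) -> Tuple[str, str]:
    """Route a question to one of two classifiers and return (branch, label).

    Questions containing an interrogative word go to the SVM branch;
    all other questions go to the focus-word branch.
    """
    if find_interrogative(question) is not None:
        return "svm", svm_classify(question)
    return "focus", focus_word_classify(question)


# Toy stand-in for a trained SVM: labels a question by its interrogative word only.
TOY_WORD_LABELS = {"为什么": "reason", "怎么": "method", "如何": "method"}


def toy_svm(question: str) -> str:
    return TOY_WORD_LABELS.get(find_interrogative(question) or "", "other")


if __name__ == "__main__":
    examples = [
        "为什么天空是蓝色的",      # "Why is the sky blue?"
        "如何学好英语",            # "How to learn English well?"
        "求推荐几本机器学习的书",  # "Please recommend some machine learning books"
        "你对这件事的看法",        # "Your opinion on this matter"
    ]
    for q in examples:
        print(q, classify(q, toy_svm))
```

Passing the SVM branch in as a callable keeps the routing logic independent of how that classifier is trained; a real system would plug a trained model in place of `toy_svm`.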

Key words: Chinese question classification, Community Question Answering (CQA), hierarchical classification, Support Vector Machine (SVM), focus word


CLC Number: TP391.4

DONG Caizheng, LIU Baisong. Community question answering-oriented Chinese question classification[J]. Journal of Computer Applications, 2016, 36(4): 1060-1065.


