Short question classification based on semantic extensions

Journal of Computer Applications, 2015, Vol. 35, Issue (3): 792-796. DOI: 10.11772/j.issn.1001-9081.2015.03.792

YE Zhonglin¹, YANG Yan¹, JIA Zhen¹, YIN Hongfeng²

1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 610031, China;
2. DOCOMO Innovations Incorporation, Palo Alto CA 94304, USA

Received: 2014-10-16; Revised: 2014-11-18; Online: 2015-03-10; Published: 2015-03-13

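Corresponding author: YANG Yan

About the authors: YE Zhonglin (born 1989), male, Hui ethnicity, from Xining, Qinghai, M.S. candidate; research interest: natural language processing. YANG Yan (born 1964), female, from Hefei, Anhui, professor, doctoral supervisor, Ph.D.; research interests: data mining, computational intelligence, ensemble learning. JIA Zhen (born 1975), female, from Kaifeng, Henan, lecturer, Ph.D.; research interests: information extraction, knowledge engineering. YIN Hongfeng (born 1967), male, from Xiayi, Henan, professor, Ph.D.; research interests: semantic search, big data.

Supported by: National Natural Science Foundation of China (61170111, 61262058)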

Abstract

Question classification is one of the tasks in a question answering system. Users' questions are often short and contain rare words and colloquial expressions, especially in voice interaction, so traditional text classification methods perform poorly on them. To address this, a short question classification method based on semantic extension was proposed. The method first uses a search engine to extend each short question with additional knowledge, then selects feature words with a topic model, and finally obtains the question's category by calculating word similarity. The experimental results show that the proposed method achieves an average F-measure of 0.713 on a set of 1365 real questions, which is higher than that of Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and maximum entropy algorithms. The method can therefore help a question answering system improve the accuracy of question classification.

Key words: topic model, question classification, search engine, question answering system
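The three stages named in the abstract (search-based extension, feature word selection, similarity-based category assignment) can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation. Everything concrete in it is an assumption: the canned snippets stand in for a real search engine, frequency ranking stands in for the topic model, a character-bigram Dice coefficient stands in for the word similarity measure, and the category seed words and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for search engine results (not from the paper).
CANNED_SNIPPETS = {
    "rain tomorrow?": [
        "weather forecast tomorrow rain showers temperature",
        "tomorrow weather rain umbrella forecast",
    ],
    "play some jazz": [
        "jazz music playlist songs listen",
        "listen jazz songs music radio",
    ],
}

# Hypothetical seed words describing each category (not from the paper).
CATEGORY_SEEDS = {
    "weather": ["weather", "forecast", "temperature"],
    "music": ["music", "songs", "playlist"],
}


def extend_question(question):
    """Stage 1: extend the short question with retrieved snippet text."""
    key = question.lower()
    words = key.replace("?", "").split()
    for snippet in CANNED_SNIPPETS.get(key, []):
        words.extend(snippet.split())
    return words


def select_features(words, top_n=5):
    """Stage 2 stand-in: keep the most frequent words.

    The paper selects feature words with a topic model; plain
    frequency ranking is used here only to keep the sketch small.
    """
    return [word for word, _ in Counter(words).most_common(top_n)]


def bigrams(word):
    """Set of character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def word_similarity(a, b):
    """Stage 3 stand-in: Dice coefficient over character bigrams."""
    if a == b:
        return 1.0
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def classify(question):
    """Return the category whose seed words best match the feature words."""
    features = select_features(extend_question(question))
    scores = {}
    for category, seeds in CATEGORY_SEEDS.items():
        # For each feature word take its best match among the seeds,
        # then average over all feature words.
        total = sum(max(word_similarity(f, s) for s in seeds) for f in features)
        scores[category] = total / len(features) if features else 0.0
    return max(scores, key=scores.get)


print(classify("rain tomorrow?"))
print(classify("play some jazz"))
```

In a real system each stand-in would be replaced by the corresponding component: a search API call, a trained topic model, and a lexical or semantic word similarity measure suited to the target language.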

CLC Number: TP391.1

YE Zhonglin, YANG Yan, JIA Zhen, YIN Hongfeng. Short question classification based on semantic extensions[J]. Journal of Computer Applications, 2015, 35(3): 792-796.
