Abstract: Because the traditional information gain used in the Bag-of-Words (BoW) model does not take term frequency into account, a visual dictionary construction method based on an improved information gain was proposed to raise the accuracy of human action recognition. Firstly, spatio-temporal interest points were extracted from human action videos with a 3D Harris detector and then clustered by K-means to construct an initial visual dictionary. Secondly, the information gain was improved by introducing the concentration of term frequency within a class and the dispersion of term frequency between classes; the improved information gain was computed for every visual word in the initial dictionary, and the visual words with the largest information gain were selected to build a new visual dictionary. Finally, human actions were recognized with a Support Vector Machine (SVM) using the dictionary built from the improved information gain. The proposed method was evaluated on human action recognition with the KTH and Weizmann databases. Compared with the dictionary selected by traditional information gain, the dictionary constructed with the improved information gain increased recognition accuracy by 1.67% on KTH and 3.45% on Weizmann. The experimental results show that the improved information gain selects more discriminative visual words and thereby increases the accuracy of human action recognition.
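The quantization and word-selection steps can be sketched as follows. This is a minimal pure-Python sketch, assuming descriptors and codebook centroids are plain lists of floats, of (a) assigning descriptors to their nearest visual word in a K-means codebook and (b) the *traditional*, presence-based information gain IG(t) = H(C) − P(t)H(C|t) − P(¬t)H(C|¬t) that the abstract uses as its baseline, followed by top-k selection. The paper's term-frequency concentration and dispersion factors are not specified here, so they are not reproduced; the function names `bow_histogram`, `information_gain`, and `select_words` are hypothetical.

```python
import math


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall nearest (squared Euclidean) to each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(d, codebook[i])),
        )
        hist[nearest] += 1
    return hist


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(histograms, labels, word):
    """Traditional IG of one visual word, using only its presence or absence per video.

    IG(t) = H(C) - P(t) * H(C|t) - P(not t) * H(C|not t)
    Term frequency is ignored, which is the limitation the abstract addresses.
    """
    classes = sorted(set(labels))
    with_word = [y for h, y in zip(histograms, labels) if h[word] > 0]
    without_word = [y for h, y in zip(histograms, labels) if h[word] == 0]
    p_t = len(with_word) / len(labels)
    h_c = entropy([labels.count(c) for c in classes])
    h_with = entropy([with_word.count(c) for c in classes])
    h_without = entropy([without_word.count(c) for c in classes])
    return h_c - p_t * h_with - (1 - p_t) * h_without


def select_words(histograms, labels, k):
    """Return the indices of the k visual words with the largest IG, in ascending order."""
    n_words = len(histograms[0])
    scores = [information_gain(histograms, labels, w) for w in range(n_words)]
    ranked = sorted(range(n_words), key=lambda w: scores[w], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: four videos, three visual words, two action classes.
    hists = [[3, 0, 1], [2, 0, 1], [0, 4, 1], [0, 2, 1]]
    labels = ["walk", "walk", "run", "run"]
    print(select_words(hists, labels, 2))
```

In the toy data, words 0 and 1 each occur in only one class and reach the maximum IG of 1 bit, while word 2 occurs in every video and carries no class information, so it is the one discarded. A frequency-aware variant would replace the presence test `h[word] > 0` with statistics over the counts themselves.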
WU Feng, WANG Ying. Visual dictionary construction for human actions recognition based on improved information gain[J]. Journal of Computer Applications, 2017, 37(8): 2240-2243.