Abstract: Latent Semantic Indexing (LSI) is an effective feature extraction method that captures the underlying latent semantic structure between words in documents. However, the feature subspace selected by LSI is not necessarily the most appropriate for text classification, because the method orders the extracted features by their variance without considering their classification capability. The high generalization ability of the Support Vector Machine (SVM) makes it especially suitable for classifying high-dimensional data such as term-document matrices. An SVM-based method is therefore proposed to select the LSI features best suited to classification. Exploiting the high generalization ability of SVM, the method estimates the contribution of the k-th feature to classification from the parameters of each trained classifier. Experimental results indicate that, compared with LSI, the method improves classification performance with a more compact representation while requiring less training and testing time.
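The selection idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a binary task, a linear SVM trained by plain batch subgradient descent, and the squared weight w_k^2 as the contribution score of the k-th LSI feature. The function names (`lsi_features`, `train_linear_svm`, `rank_features_by_weight`) and the toy term-document matrix are hypothetical.

```python
import numpy as np

def lsi_features(term_doc, k):
    """Project documents onto the top-k LSI dimensions via truncated SVD."""
    # term_doc has shape (n_terms, n_docs); each column is one document.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows are documents; columns are ordered by decreasing singular value.
    return vt[:k].T * s[:k]

def train_linear_svm(x, y, lam=0.01, lr=0.1, epochs=100):
    """Fit a linear SVM (hinge loss, L2 penalty) by batch subgradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        active = y * (x @ w + b) < 1.0  # samples inside or violating the margin
        grad_w = lam * w - (y[active, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def rank_features_by_weight(w):
    """Order feature indices by squared SVM weight, largest first."""
    return np.argsort(-(w ** 2))

# Toy term-document matrix (4 terms x 8 documents), built so that the
# class signal carries the smallest non-zero singular value.
term_doc = np.array([
    [6, 0, 6, 0, 6, 0, 6, 0],  # frequent word, unrelated to the class
    [0, 6, 0, 6, 0, 6, 0, 6],  # frequent word, unrelated to the class
    [2, 2, 2, 2, 0, 0, 0, 0],  # rare word typical of the positive class
    [0, 0, 0, 0, 2, 2, 2, 2],  # rare word typical of the negative class
], dtype=float)
labels = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])

features = lsi_features(term_doc, k=3)      # LSI order: by singular value
w, b = train_linear_svm(features, labels)   # SVM on all LSI features
ranking = rank_features_by_weight(w)        # SVM order: by w_k^2
selected = features[:, ranking[:1]]         # keep only the top-ranked feature

print("LSI features ranked by squared SVM weight:", ranking)
```

In this toy case the feature that separates the classes is the last one in LSI order, so truncating by singular value alone would discard it, whereas ranking by the SVM weights moves it to the front. For a multi-class task, one plausible extension (again an assumption) is to train one classifier per class and aggregate the per-feature scores.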
李旻松, 段琢华. 基于支持向量机的隐含语意特征选择方法[J]. 计算机应用, 2011, 31(09): 2429-2431.
LI Min-song, DUAN Zhuo-hua. Latent semantic features selection based on support vector machine[J]. Journal of Computer Applications, 2011, 31(09): 2429-2431.