Text classification model framework based on social annotation quality
LI Jin1,2, ZHANG Hua3, WU Hao-xiong3, XIANG Jun3, GU Xi-wu4
1. Information Management Department, Central China Normal University, Wuhan Hubei 430074, China
2. School of Information Engineering, Hubei University for Nationalities, Enshi Hubei 445000, China
3. School of Information Engineering, Hubei University for Nationalities, Enshi Hubei 445000, China
4. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan Hubei 430074, China
Abstract: Social annotation is a form of folksonomy that allows Web users to freely categorize Web resources with text tags. These tags usually carry fundamental and valuable semantic information about the resources, so social annotation can improve retrieval quality when applied to information retrieval systems. This paper proposes an improved text classification algorithm based on social annotation. Because social tags are generated freely, without any control or expert knowledge, their quality varies significantly. To address this, the paper first proposes a quantitative approach that measures the quality of social tags using the semantic similarity between Web pages and their tags. Low-quality tags are then filtered out according to this measure, and the remaining high-quality tags are used to extend the traditional vector space model. In the extended vector space model, a Web page is represented by a vector whose components are the words in the page and the tags assigned to it. Finally, a support vector machine is employed to perform the classification task. The experimental results show that classification improves after filtering out low-quality social tags and embedding the high-quality ones into the traditional vector space model. Compared with other classification approaches, the proposed algorithm increases the F1 measure by 6.2% on average.
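The filtering and vector-extension steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual similarity formula, so this example substitutes a plain lexical cosine similarity between a page's word counts and a tag's word counts as a stand-in for semantic similarity. The function names, the `threshold` and `tag_weight` parameters, and the `w:`/`t:` feature prefixes are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_quality(page_text, tag):
    """Assumed quality score: lexical cosine between the page and the tag.

    The paper uses a semantic similarity measure; this is only a proxy.
    """
    return cosine(Counter(tokenize(page_text)), Counter(tokenize(tag)))


def extended_vector(page_text, tags, threshold=0.1, tag_weight=1.0):
    """Build a sparse feature dict from page words plus retained tags.

    Tags scoring below `threshold` are filtered out; the rest are added
    as extra components alongside the page's word counts.
    """
    features = {f"w:{term}": float(count)
                for term, count in Counter(tokenize(page_text)).items()}
    for tag in tags:
        if tag_quality(page_text, tag) >= threshold:
            features[f"t:{tag.lower()}"] = tag_weight
    return features


page = "python tutorial for machine learning with python code"
vector = extended_vector(page, ["python", "funny", "machine learning"])
print(sorted(k for k in vector if k.startswith("t:")))
```

In this toy input the tag "funny" shares no words with the page and is dropped, while the other two tags are kept. The resulting feature dictionaries could then be vectorized and passed to any support vector machine implementation for the classification step, which is not shown here.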
LI Jin, ZHANG Hua, WU Hao-xiong, XIANG Jun, GU Xi-wu. Text classification model framework based on social annotation quality[J]. Journal of Computer Applications (计算机应用), 2012, 32(05): 1335-1339.