计算机应用 ›› 2005, Vol. 25 ›› Issue (03): 664-665. DOI: 10.3724/SP.J.1087.2005.0664

• 人工智能 •

基于语义空间的支持向量机的文本过滤

沈丽虹1,周昌乐2   

  1. 浙江大学人工智能研究所; 2. 厦门大学人工智能研究所
  • 出版日期:2005-03-01 发布日期:2005-03-01
  • 基金资助:

    福建省科技计划重点资助项目(001J005)

Text filtering based on support vector machine of semantic space

SHEN Li-hong1,ZHOU Chang-le2   

  1. Institute of Artificial Intelligence, Zhejiang University, Hangzhou Zhejiang 310027, China; 2. Institute of Artificial Intelligence, Xiamen University, Xiamen Fujian 361005, China
  • Online:2005-03-01 Published:2005-03-01

摘要: 传统的基于支持向量机的文本过滤,用向量空间模型来表示文本和用户模板,向量空间模型假设特征项之间是线性无关的,该假设引入了许多因具体用词变化不定而带来的词汇噪音信息,影响了基于支持向量机的文本过滤的过滤性能。提出基于语义空间的支持向量机的文本过滤,用语义来表示文本和用户模板。该方法主要通过奇异值分解提取文本的潜在语义空间,在语义空间上训练支持向量机得到用户模板和过滤阈值,文本流上的文本映射到语义空间上,在语义空间上计算用户模板和新文本的相似度。实验表明:该方法的过滤性能可以达到 98.67%。

关键词: 文本过滤, 奇异值分解, 支持向量机, 语义空间

Abstract: Traditional text filtering based on support vector machines uses the vector space model to represent texts and user profiles. Because the vector space model assumes that the terms in a text are independent of one another, it introduces lexical noise caused by variation in word choice, which degrades filtering performance. This paper proposes text filtering based on support vector machines in a semantic space, in which texts and user profiles are represented semantically. The method uses singular value decomposition to derive a latent semantic space and trains a support vector machine in that space to obtain the user profile and the filtering threshold. Each new text from the text stream is mapped into the semantic space, where its similarity to the user profile is computed with the cosine measure. Experimental results show that the filtering performance of the proposed method reaches 98.67%.
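
The pipeline summarized in the abstract (derive a latent semantic space by singular value decomposition, train a support vector machine in it, then map new texts into it and compare them with the user profile) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy term-by-document matrix, the function names, the choice k = 2, the hinge-loss subgradient trainer standing in for a full SVM solver, and the use of the learned weight vector as the "user profile" with its bias as the "threshold" are all assumptions made for the example.

```python
import numpy as np

def build_semantic_space(term_doc, k):
    """Truncated SVD of a term-by-document matrix (terms in rows)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(term_vecs, U_k, s_k):
    """Map term-space vector(s) into the k-dimensional semantic space."""
    return (term_vecs @ U_k) / s_k

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via hinge-loss subgradient descent (a stand-in solver)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (xi @ w + b) < 1   # margin check before updating
            w -= lr * lam * w                  # L2 regularization shrinkage
            if violated:
                w += lr * yi * xi
                b += lr * yi
    return w, b

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: 6 terms x 6 documents.
# Documents 0-2 are relevant (label +1), documents 3-5 irrelevant (label -1).
A = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 0],
    [0, 0, 0, 2, 3, 2],
    [0, 0, 0, 0, 2, 3],
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

U_k, s_k = build_semantic_space(A, k=2)
train_docs = fold_in(A.T, U_k, s_k)          # one row per training document
profile, bias = train_linear_svm(train_docs, y)

# New texts from the stream are folded into the same semantic space.
new_relevant = fold_in(np.array([1, 2, 1, 0, 0, 0], dtype=float), U_k, s_k)
new_irrelevant = fold_in(np.array([0, 0, 0, 2, 3, 2], dtype=float), U_k, s_k)

for name, doc in [("relevant", new_relevant), ("irrelevant", new_irrelevant)]:
    decision = "accept" if doc @ profile + bias > 0 else "reject"
    print(name, decision, round(cosine(profile, doc), 3))
```

Folding a new document in with the stored factors avoids recomputing the decomposition for every text in the stream; how the paper itself derives the filtering threshold from the trained machine is not stated in the abstract, so the sign of the decision function is used here only as a placeholder.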

Key words: text filtering, singular value decomposition, support vector machine, semantic space
