Journal of Computer Applications (计算机应用)

• Typical applications

Research and design of Chinese-spam's phrase segmentation based on indexing

Yong-Yan QIANG, Geng YANG

  1. Nanjing University of Posts and Telecommunications
  • Received: 2007-03-26 Revised: 2007-05-21 Online: 2007-09-01 Published: 2007-09-01
  • Contact: Yong-Yan QIANG

Abstract: To improve the performance of the preprocessing stage in Chinese spam filtering and to speed up word lookup during segmentation, this paper constructs an index dictionary based on the idea of hash functions and designs an index-based Chinese word segmentation method aimed at Chinese spam. Experiments indicate that the method is more efficient and more accurate than traditional mechanical (dictionary-matching) word segmentation, that it improves the performance of the mail preprocessing stage, and that it can be applied widely in the field of Chinese word segmentation.

Key words: anti-spam, Chinese word segmentation, hash function
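
The abstract does not describe the layout of the index dictionary, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the dictionary is bucketed by the first character of each word (a Python dict serves as the hash table) and that segmentation uses forward maximum matching inside the selected bucket. The names `build_index` and `segment` and the sample word list are hypothetical.

```python
from collections import defaultdict


def build_index(words):
    """Bucket dictionary words by first character (assumed index layout)."""
    index = defaultdict(set)
    for word in words:
        if word:
            index[word[0]].add(word)
    # Longest word per bucket, so matching knows how far to look ahead.
    max_len = {ch: max(len(w) for w in bucket) for ch, bucket in index.items()}
    return dict(index), max_len


def segment(text, index, max_len):
    """Forward maximum matching restricted to the first-character bucket."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        bucket = index.get(ch)
        match = ch  # fall back to a single character when nothing matches
        if bucket:
            # Try the longest candidate first, then shrink down to 2 characters.
            for length in range(min(max_len[ch], len(text) - i), 1, -1):
                candidate = text[i:i + length]
                if candidate in bucket:
                    match = candidate
                    break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical sample dictionary of spam-related words.
words = ["免费", "发票", "代开", "代开发票", "优惠"]
index, max_len = build_index(words)
print(segment("免费代开发票", index, max_len))  # ['免费', '代开发票']
```

Under these assumptions, each lookup touches only the words that share the current first character, rather than scanning the whole dictionary, which is the kind of saving a hash-based index is meant to provide over plain mechanical matching.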