计算机应用 ›› 2014, Vol. 34 ›› Issue (1): 208-212.DOI: 10.11772/j.issn.1001-9081.2014.01.0208

• 人工智能 • 上一篇    下一篇

基于Sunday算法的改良单模式匹配算法

朱永强1,2,秦志光2,江雪1,2   

  1. 1. 成都网安科技发展有限公司,成都 610092
    2. 电子科技大学 计算机科学与工程学院, 成都 611731
  • 收稿日期:2013-07-25 修回日期:2013-09-16 出版日期:2014-01-01 发布日期:2014-02-14
  • 通讯作者: 朱永强
  • 作者简介:朱永强(1987-),男,吉林四平人,硕士研究生,主要研究方向:中文信息处理、网络安全;秦志光(1956-),男,四川隆昌人,教授,博士生导师,CCF高级会员,主要研究方向:信息安全、网络安全;江雪(1983-),男,四川资阳人,硕士研究生,主要研究方向:网络安全。
  • 基金资助:

    科技部科技型中小企业技术创新基金资助项目

Improved single pattern matching algorithm based on Sunday algorithm

ZHU Yongqiang1,2,QIN Zhiguang2,JIANG Xue1,2   

  1. 1. Chengdu Wangan Technology Company Limited, Chengdu Sichuan 610092, China
    2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China;
  • Received:2013-07-25 Revised:2013-09-16 Online:2014-01-01 Published:2014-02-14
  • Contact: ZHU Yongqiang

摘要: Unicode编码的中文环境下应用Sunday算法时,如直接使用中文字符生成失效跳转表,将造成空间膨胀,而将中文字符拆分为两个字节进行处理,虽可以降低空间消耗,但匹配的执行速度又会受影响。针对Sunday算法应用于Unicode编码的字符拆分环境时所产生的时间性能降低问题,结合Unicode中文单元的内部关联性,优化了原Sunday算法的辅助跳转表与匹配规则,从而在解决Unicode下算法空间膨胀问题的同时,提升了Sunday算法在此环境下的时间性能,并利用模拟实验对改良算法的时间与空间性能进行了实验证明。

关键词: 模式匹配, Unicode编码, KMP算法, B-M算法, Sunday算法

Abstract: When Sunday algorithm is applied into the Chinese version of Unicode, there are some problems. On one hand, it causes the expansion of space if using Chinese characters directly to generate a failed jump table. On the other hand, it can reduce the space consumption at the cost of matching speed when Chinese characters are split into two bytes. Concerning the degradation of time performance produced by applying Sunday algorithm to the character-splitting environment of Chinese version of Unicode, in combination with internal relevance of Chinese unit in Unicode, the improved algorithm in this paper optimized the auxiliary jump table and matching rules in original Sunday algorithm in the character-splitting environment. Consequently, the proposed algorithm not only solves the problem of space expansion, but also improves time performance of Sunday algorithm in this environment. Finally, the improved time and space performance of the algorithm gets proved via simulation.

Key words: pattern matching, Unicode encoding, Knuth-Morris-Pratt (KMP) algorithm, Boyer-Moore (BM) algorithm, Sunday algorithm

中图分类号: