Journal of Computer Applications, 2014, Vol. 34, Issue (3): 865-868. DOI: 10.11772/j.issn.1001-9081.2014.03.0865


Improvement of Boyer-Moore string matching algorithm

HAN Guanghui¹, ZENG Cheng²

  1. Department of Information Engineering, Wuhan Business University, Wuhan, Hubei 430056, China;
  2. College of Mathematics and Computer Science, Hubei University, Wuhan, Hubei 430062, China
  • Received: 2013-09-04; Revised: 2013-11-12; Online: 2014-03-01; Published: 2014-04-01
  • Contact: HAN Guanghui
  • Supported by:

    National Natural Science Foundation

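  • About the authors: HAN Guanghui (韩光辉, born 1956), male, from Wuhan, Hubei; associate professor, master's degree; research interests: theory of computation and software formalization. ZENG Cheng (曾诚, born 1976), male, from Wuhan, Hubei; associate professor, Ph.D.; research interests: networked software engineering.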
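  • Funding:

    National Natural Science Foundation of China project; Key Science and Technology Research Project of the Hubei Provincial Department of Education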

Abstract:

A new variant of the Boyer-Moore (BM) algorithm is proposed, based on an analysis of the BM algorithm. The basic idea is to build the match heuristic (i.e., the good-suffix rule) in the preprocessing phase for the extended pattern Pa, where P is the pattern and a is an arbitrary character of the alphabet. This both increases the length of the matched suffix and implicitly incorporates Sunday's occurrence heuristic (i.e., the bad-character rule), so that the scanning window can be shifted by a larger distance. Theoretical analysis shows that the improved algorithm has linear time complexity even in the worst case, sublinear behavior in the average case, and space complexity of O(m(σ+1)). Experimental results show that its practical performance is significantly better than that of the BM algorithm, especially for small alphabets.
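
The idea of applying the good-suffix rule to the extended pattern Pa can be illustrated with a small sketch. This is not the authors' preprocessing: it is a brute-force illustration that computes each shift on demand and caches it, so it does not reproduce the paper's complexity bounds. The names `extended_shift` and `search` are hypothetical, and the strong good-suffix condition used on the mismatched character is an assumption about the variant, not something the abstract states.

```python
def extended_shift(pattern: str, j: int, a: str) -> int:
    """Smallest safe window shift after a mismatch at pattern[j]
    (j == -1 means the whole window matched), given that the text
    character immediately after the window is `a`.
    Brute force; illustrative only."""
    m = len(pattern)
    ext = pattern + a  # the extended pattern Pa
    for s in range(1, m + 1):
        # Assumed strong good-suffix condition: the character moved under
        # the mismatch position must differ from the one that just failed.
        if j - s >= 0 and pattern[j - s] == pattern[j]:
            continue
        # The shifted pattern must agree with the known text, i.e. with
        # ext[j+1 .. m]: the matched suffix followed by `a`.
        if all(pattern[k - s] == ext[k] for k in range(max(j + 1, s), m + 1)):
            return s
    return m + 1  # nothing overlaps consistently: jump past `a`


def search(text: str, pattern: str) -> list:
    """Return all start positions of `pattern` in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    cache = {}  # (mismatch position, next character) -> shift
    hits = []
    i = 0
    while i + m <= n:
        # Compare right to left, as in Boyer-Moore.
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            hits.append(i)
        if i + m == n:
            break  # no character follows the last possible window
        key = (j, text[i + m])
        if key not in cache:
            cache[key] = extended_shift(pattern, j, text[i + m])
        i += cache[key]
    return hits
```

Because the condition at k = m requires the shifted pattern to place a copy of `a` under the character after the window, the Sunday-style bad-character shift is implied: when `a` does not occur in P, the window jumps by m + 1.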

Key words: string matching, Boyer-Moore (BM) algorithm, complexity analysis

