Multi-character DFA-based high speed regular expression matching algorithm

Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (08): 2370-2374.

• Typical applications • Previous Articles Next Articles

Multi-character DFA-based high speed regular expression matching algorithm

HE Wei,GUO Yunfei,MO Han,HU Hongchao

China National Digital Switching System Engineering and Technological R&D Center, Zhengzhou Henan 450002, China

Received:2013-02-20 Revised:2013-03-12 Online:2013-09-11 Published:2013-08-01
Contact: HE Wei

基于多字符DFA的高速正则表达式匹配算法

贺炜,郭云飞,莫涵,扈红超

国家数字交换系统工程技术研究中心，郑州 450002

通讯作者: 贺炜
作者简介:
贺炜(1988-)，男，山西吕梁人，硕士研究生，主要研究方向：网络安全中的特征匹配;
郭云飞(1963-)，男，河南郑州人，教授，博士生导师，主要研究方向：三网融合;
莫涵(1988-)，女，河南郑州人，硕士研究生，主要研究方向：可重构网络、组播传输;
扈红超(1983-)，男，河南郑州人，研究员，博士，主要研究方向：网络协议分析。
基金资助:
国家科技支撑计划项目;国家863计划项目

Abstract

Abstract: Traditional Deterministic Finite Automata (DFA) based regular expression matching can only process one character per cycle, which is the speed bottleneck. A new algorithm named Multi-Character DFA (MC-DFA) was proposed for high throughput matching and precise positioning. It combined the one character transition in traditional DFA together to handle multi-character processing per cycle. A new transition matrix compress algorithm was also proposed to reduce the redundancy introduced by MC-DFA. The result demonstrates that MC-DFA can improve the throughput efficiently while requiring acceptable memory. For a set of 300 regexes, 8C-DFA obtains a throughput of 7.88Gb/s, memory usage less than 6MB and 19.24s preprocessing time, better than traditional methods.

Key words: regular expression, high throughput, multi-character, precise positioning, matrix compress

摘要： 基于确定性有限自动机(DFA)的传统正则表达式匹配方法存在单周期处理单字符的速度瓶颈。为提升处理速率，提出一种单周期处理多字符的匹配算法MC-DFA，该算法基于DFA实现，支持匹配位置的精确定位。MC-DFA将传统DFA中的单字符跳转合并为多字符跳转，实现了单周期处理多个输入字符。通过状态转移矩阵二阶压缩算法，MC-DFA分别对矩阵行内以及行间冗余进行消除，减少了内存使用。300条规则下，单周期处理8字符时，MC-DFA吞吐率能够达到7.88Gb/s，内存占用小于6MB，预处理时间为19.24s。实验结果表明，MC-DFA能够有效提升系统吞吐率，并且保证内存占用在可接受范围之内，性能优于现有正则表达式匹配算法。

关键词: 正则表达式, 高速, 多字符, 精确定位, 矩阵压缩

CLC Number:

TP393

HE Wei GUO Yunfei MO Han HU Hongchao. Multi-character DFA-based high speed regular expression matching algorithm[J]. Journal of Computer Applications, 2013, 33(08): 2370-2374.

贺炜郭云飞莫涵扈红超. 基于多字符DFA的高速正则表达式匹配算法[J]. 计算机应用, 2013, 33(08): 2370-2374.

[1]	CHANG Zheng, LYU Yong. Mass data clean system based on regular expression [J]. Journal of Computer Applications, 2019, 39(10): 2942-2947.
[2]	MAI Taotao, PAN Xiaozhong, WANG Yaqi, SU Yang. Predefined-class based algorithm for compact regular expression matching [J]. Journal of Computer Applications, 2017, 37(2): 397-401.
[3]	YU Yihan, FU Yu, WU Xiaoping. Audit scheme for intranet behavior based on improved regular expression rule grouping [J]. Journal of Computer Applications, 2016, 36(8): 2241-2245.
[4]	ZHUO Yannan, LIU Qiang, JIANG Lei, DAI Qiong. High-performance regular expressions matching algorithm based on improved FPGA circuit [J]. Journal of Computer Applications, 2016, 36(4): 927-930.
[5]	LI Juan. Application of formal language in automatic marking system of webpage making operation questions [J]. Journal of Computer Applications, 2013, 33(03): 882-885.
[6]	TANG Qiu JIANG Lei TAN Jian-long LIU Jin-gang. Survey on FPGA-based regular expressions matching algorithms [J]. Journal of Computer Applications, 2011, 31(11): 2943-2946.
[7]	ZHOU Tian-yang ZHU Jun-hu WANG Qing-xian. Hidden process detection method based on multi-characteristics matching [J]. Journal of Computer Applications, 2011, 31(09): 2362-2366.
[8]	. Compression algorithm of state transition table based on sparse matrix [J]. Journal of Computer Applications, 2010, 30(8): 2157-2160.
[9]	. Survey on storage-oriented regular expressions matching algorithms [J]. Journal of Computer Applications, 2009, 29(12): 3171-3173.
[10]	Jun PENG Xing-yuan CHEN Bei WU Xiang-dong DAI Yong-liang WANG. Policy monitor model based on Mealy automata [J]. Journal of Computer Applications, 2008, 28(11): 2832-2834.
[11]	. Deep packet inspection algorithm based on regular expressions [J]. Journal of Computer Applications, 2007, 27(9): 2184-2186.
[12]	. Acquiring testing factors in component software integration based on DFA model [J]. Journal of Computer Applications, 2007, 27(4): 857-859.
[13]	. Compression and application of transition probability matrix in Markov prediction model [J]. Journal of Computer Applications, 2007, 27(11): 2746-2749.

Multi-character DFA-based high speed regular expression matching algorithm

基于多字符DFA的高速正则表达式匹配算法

PDF

Knowledge

Abstract

Cite this article

share this article

References

Related Articles 13

Recommended Articles

Metrics