Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (08): 2370-2374.
• Typical applications • Previous Articles Next Articles
HE Wei,GUO Yunfei,MO Han,HU Hongchao
Received:
Revised:
Online:
Published:
Contact:
贺炜,郭云飞,莫涵,扈红超
通讯作者:
作者简介:
贺炜(1988-),男,山西吕梁人,硕士研究生,主要研究方向:网络安全中的特征匹配;郭云飞(1963-),男,河南郑州人,教授,博士生导师,主要研究方向:三网融合;莫涵(1988-),女,河南郑州人,硕士研究生,主要研究方向:可重构网络、组播传输;扈红超(1983-),男,河南郑州人,研究员,博士,主要研究方向:网络协议分析。
基金资助:
Abstract: Traditional Deterministic Finite Automata (DFA) based regular expression matching can only process one character per cycle, which is the speed bottleneck. A new algorithm named Multi-Character DFA (MC-DFA) was proposed for high throughput matching and precise positioning. It combined the one character transition in traditional DFA together to handle multi-character processing per cycle. A new transition matrix compress algorithm was also proposed to reduce the redundancy introduced by MC-DFA. The result demonstrates that MC-DFA can improve the throughput efficiently while requiring acceptable memory. For a set of 300 regexes, 8C-DFA obtains a throughput of 7.88Gb/s, memory usage less than 6MB and 19.24s preprocessing time, better than traditional methods.
Key words: regular expression, high throughput, multi-character, precise positioning, matrix compress
摘要: 基于确定性有限自动机(DFA)的传统正则表达式匹配方法存在单周期处理单字符的速度瓶颈。为提升处理速率,提出一种单周期处理多字符的匹配算法MC-DFA,该算法基于DFA实现,支持匹配位置的精确定位。MC-DFA将传统DFA中的单字符跳转合并为多字符跳转,实现了单周期处理多个输入字符。通过状态转移矩阵二阶压缩算法,MC-DFA分别对矩阵行内以及行间冗余进行消除,减少了内存使用。300条规则下,单周期处理8字符时,MC-DFA吞吐率能够达到7.88Gb/s,内存占用小于6MB,预处理时间为19.24s。实验结果表明,MC-DFA能够有效提升系统吞吐率,并且保证内存占用在可接受范围之内,性能优于现有正则表达式匹配算法。
关键词: 正则表达式, 高速, 多字符, 精确定位, 矩阵压缩
CLC Number:
TP393
HE Wei GUO Yunfei MO Han HU Hongchao. Multi-character DFA-based high speed regular expression matching algorithm[J]. Journal of Computer Applications, 2013, 33(08): 2370-2374.
贺炜 郭云飞 莫涵 扈红超. 基于多字符DFA的高速正则表达式匹配算法[J]. 计算机应用, 2013, 33(08): 2370-2374.
0 / Recommend
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/
https://www.joca.cn/EN/Y2013/V33/I08/2370