计算机应用 ›› 2017, Vol. 37 ›› Issue (2): 397-401.DOI: 10.11772/j.issn.1001-9081.2017.02.0397

• 第十届中国可信计算与信息安全学术会议 • 上一篇    下一篇

基于预定义类的紧凑型正则表达式匹配算法

麦涛涛1,2, 潘晓中1,2, 王亚奇1,2, 苏阳1,2   

  1. 1. 武警工程大学 电子技术系, 西安 710086;
    2. 武警部队网络与信息安全保密重点实验室, 西安 710086
  • 收稿日期:2016-08-15 修回日期:2016-09-12 出版日期:2017-02-10 发布日期:2017-02-11
  • 通讯作者: 麦涛涛,wj_maitao@126.com
  • 作者简介:麦涛涛(1992-),男,河南洛阳人,硕士研究生,主要研究方向:网络安全、嵌入式系统;潘晓中(1964-),男,陕西西安人,教授,硕士,主要研究方向:网络安全、嵌入式系统;王亚奇(1982-),男,河南周口人,讲师,博士,主要研究方向:网络空间安全;苏阳(1986-),男,宁夏银川人,助教,硕士,主要研究方向:嵌入式系统。
  • 基金资助:
    国家自然科学基金资助项目(61402531);陕西省自然科学基金资助项目(2014JQ8307,2015JQ6231)。

Predefined-class based algorithm for compact regular expression matching

MAI Taotao1,2, PAN Xiaozhong1,2, WANG Yaqi1,2, SU Yang1,2   

  1. 1. Department of Electronic Technology, Engineering College of the Chinese Armed Police Force, Xi'an Shaanxi 710086, China;
    2. Key Laboratory of Network and Information Security of the Chinese Armed Police Force, Xi'an Shaanxi 710086, China
  • Received:2016-08-15 Revised:2016-09-12 Online:2017-02-10 Published:2017-02-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61402531), the Natural Science Foundation of Shaanxi Province (2014JQ8307, 2015JQ6231).

摘要: 针对目前硬件正则表达式匹配算法在存储空间以及吞吐量等方面面临的挑战,结合扩展有限自动机(XFA)正则表达式匹配算法,提出了一种预定义类的压缩自动机匹配算法(Pre-Class CFA)。通过预定义类,算法既可以实现正则表达式中类字符匹配,又能够通过优先级的设定匹配特殊字符集,并在XFA消除确定性有限状态机(DFA)状态爆炸问题的基础上进一步压缩了迁移边数目;同时算法根据现场可编程门阵列(FPGA)和迁移边的特征,设计了一种基于并联只读存储器(ROM)结构的迁移边存取方法,可以实现同一状态多条迁移边的并行读取和匹配。在中低性能FPGA平台ALTERA DE2-70上对算法进行测试,实验中系统吞吐量为1.3 Gb/s,可实现千兆网络下的入侵检测和垃圾过滤。

关键词: 正则表达式匹配, 扩展有限自动机, 现场可编程门阵列

Abstract: For hardware regular expression matching algorithms are faced with the challenges in memory space and throughput, a Predefined-class Compact Finite Automaton (Pre-class CFA) matching algorithm based on Extended Finite Automaton (XFA) regular expression matching algorithm was proposed. By using the predefined classes, the algorithm not only can achieve class character matching in regular expression, but also can match the special character set by priority setting; besides, the migrating edges were reduced when eliminating the Deterministic Finite Automaton (DFA) state space exploration problem. At the same time, based on the characteristics of Field-Programmable Gate Array (FPGA) and migration edge, a migration edge access method based on parallel Read-Only Memory (ROM) was designed to realize the parallel reading and matching of the same state with mutiple migration edges. The algorithm was tested on ALTERA DE2-70 FPGA platform, which is a low-performance platform. The throughput of the experimental system is 1.30 Gb/s, which can realize intrusion detection and garbage filtering under gigabit network.

Key words: regular expression matching, Extended Finite Automaton (XFA), Field-Programmable Gate Array (FPGA)

中图分类号: