Journal of Computer Applications, 2014, Vol. 34, Issue (7): 1922-1928. DOI: 10.11772/j.issn.1001-9081.2014.07.1922

• Computer Security •

Data-leakage detection scheme based on fingerprint and Bloom filters

HUANG Weiwen¹, LUO Jia²

  1. Information Resource Center, Ningbo Polytechnic, Ningbo Zhejiang 315800, China;
  2. School of Computer Science and Technology, Zhejiang University, Hangzhou Zhejiang 310058, China
  • Received: 2014-01-20 Revised: 2014-03-06 Online: 2014-07-01 Published: 2014-08-01
  • Contact: HUANG Weiwen
  • About the authors: HUANG Weiwen (born 1968), male, from Ningbo, Zhejiang; lecturer, M.S.; research interests: algorithm design, information retrieval. LUO Jia (born 1983), female, from Hangzhou, Zhejiang; Ph.D. candidate; research interests: distributed computing, data mining.
  • Funding: General Program of the National Natural Science Foundation of China

Abstract:

Existing Data-Leakage Prevention (DLP) solutions rely mainly on generic keyword search in outgoing data. As a result, they cannot control data flow at a fine granularity and suffer from a high false-alarm rate. To address these problems, a DLP architecture based on white-listing was first designed, in which the white-list provides strong security for data transmission. On this basis, a data-leakage detection algorithm combining document fingerprinting with Bloom filters was proposed. The algorithm uses dynamic programming to compute the optimal detection locations, which minimizes memory overhead and enables high-speed implementation. Simulation results show that the proposed algorithm can check the fingerprints of a large number of documents online at low cost. For example, for 1 TB of documents, the scheme requires only 340 MB of memory to achieve a worst-case expected detection lag (i.e., leakage length) of 1000 bytes.
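The abstract does not spell out the fingerprinting procedure or the dynamic program. As a rough illustration of the underlying idea only, the following Python sketch stores fingerprints of fixed-size, aligned chunks of protected documents in a standard Bloom filter and checks every window of outgoing data against it. The chunk size `CHUNK`, the SHA-256 double-hashing scheme, and the names `BloomFilter`, `build_filter` and `first_detection` are hypothetical choices made for this example; the sketch does not reproduce the authors' white-list architecture or their dynamic-programming choice of detection locations.

```python
import hashlib
import math

CHUNK = 64  # hypothetical fingerprint granularity in bytes (assumption, not from the paper)


class BloomFilter:
    """Plain Bloom filter with m bits and k positions derived from one SHA-256 digest."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash positions.
        self.m = max(8, math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing (h1 + i*h2) mod m, with h1, h2 taken from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # May return a false positive, never a false negative.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


def build_filter(documents, fp_rate: float = 1e-6) -> BloomFilter:
    """Insert the fingerprint of every aligned, full-length chunk of each protected document."""
    n = max(1, sum(len(doc) // CHUNK for doc in documents))
    bf = BloomFilter(n, fp_rate)
    for doc in documents:
        for off in range(0, len(doc) - CHUNK + 1, CHUNK):
            bf.add(doc[off:off + CHUNK])
    return bf


def first_detection(bf: BloomFilter, outgoing: bytes):
    """Slide a CHUNK-byte window over outgoing data; return the byte count at the first match, else None."""
    for end in range(CHUNK, len(outgoing) + 1):
        if outgoing[end - CHUNK:end] in bf:
            return end
    return None
```

With aligned chunks, a leak that starts at an arbitrary offset inside a protected document is caught after at most 2 × `CHUNK` − 1 leaked bytes (barring an earlier false positive), while the filter needs roughly one entry per `CHUNK` bytes of protected data. Shrinking `CHUNK` lowers the detection lag but increases memory, the same kind of memory-versus-lag trade-off that the abstract's 1 TB / 340 MB / 1000-byte figures describe. A production scanner would typically replace the per-window SHA-256 with a rolling hash such as Rabin fingerprinting.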
