Journal of Computer Applications (计算机应用), 2018, Vol. 38, Issue (3): 806-811. DOI: 10.11772/j.issn.1001-9081.2017082068

• Computer Software Technology •

Static software birthmark extraction algorithm based on multiple features

WANG Shuyan, ZHAO Pengfei, SUN Jiaze

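  1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121
  • Received: 2017-08-24 Revised: 2017-10-10 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: ZHAO Pengfei
  • About the authors: WANG Shuyan (born 1964), female, from Nanyang, Henan; professor, Ph.D., CCF member; research interests: software testing and data mining. ZHAO Pengfei (born 1993), male, from Xi'an, Shaanxi; master's student; research interest: computer software and theory. SUN Jiaze (born 1980), male, from Nanyang, Henan; associate professor, Ph.D. candidate, CCF member; research interest: intelligent optimization algorithms.
  • Funding:
    Shaanxi Provincial Industrial Research Project (2017GY-092).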

Software birthmark extraction algorithm based on multiple features

WANG Shuyan, ZHAO Pengfei, SUN Jiaze   

  1. School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an, Shaanxi 710121, China
  • Received: 2017-08-24 Revised: 2017-10-10 Online: 2018-03-10 Published: 2018-03-07
  • Supported by:
    This work is partially supported by the Science and Technology Department of Shaanxi Province (2017GY-092).

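Abstract: To address the inaccurate results obtained when existing software birthmarks are used for program plagiarism detection, a new static software birthmark extraction algorithm is proposed. The birthmark produced by the algorithm is synthesized from two kinds of software features. The algorithm first preprocesses the source program and the suspicious program to obtain program metadata, then derives from the metadata each program's Application Programming Interface (API) call set and instruction sequence as two features, and combines the two to generate the software birthmark. Next, the similarity between the birthmarks of the source program and the suspicious program is computed, and this similarity is used to decide whether plagiarism has occurred between the two programs. Experiments verify that the resulting birthmark is credible and resilient, and that it is more resilient than the traditional k-gram software birthmark.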

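Keywords: static software birthmark, plagiarism detection, Application Programming Interface (API) call set, instruction sequence, feature synthesis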

Abstract: Existing software birthmarks give inaccurate results when used to detect code theft. To address this problem, a new static software birthmark extraction algorithm is proposed. The birthmark generated by the algorithm combines two kinds of software features. The source program and the suspicious program are first preprocessed to obtain program metadata, from which the Application Programming Interface (API) call set and the instruction sequence are extracted as two features. These two features are then synthesized to generate the software birthmark. Finally, the similarity between the birthmarks of the source program and the suspicious program is calculated to determine whether code theft has occurred between the two programs. The experimental results verify that the birthmark combining the API call set and the instruction sequence is credible and resilient, and that it is more resilient than the k-gram birthmark.
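
The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two features are synthesized or which similarity measure is used, so everything below is an assumption made for illustration: the `Birthmark` container, the Jaccard measure for API call sets, the `difflib` matching ratio for instruction sequences, the equal weighting, and the sample API names and opcodes are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Birthmark:
    """Hypothetical container for the two features named in the abstract."""
    api_calls: frozenset   # set of API names the program calls
    instructions: tuple    # ordered instruction mnemonics


def api_set_similarity(a, b):
    """Jaccard similarity of two API call sets (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def instruction_similarity(p, q):
    """Matching-block ratio of two instruction sequences (an assumed measure)."""
    return SequenceMatcher(None, p, q, autojunk=False).ratio()


def birthmark_similarity(x, y, api_weight=0.5):
    """Weighted mean of the two feature similarities; the weight is an assumption."""
    api_sim = api_set_similarity(x.api_calls, y.api_calls)
    ins_sim = instruction_similarity(x.instructions, y.instructions)
    return api_weight * api_sim + (1.0 - api_weight) * ins_sim


# Made-up sample data, for illustration only.
source = Birthmark(
    frozenset({"java.io.File.exists", "java.lang.String.length"}),
    ("aload_0", "invokevirtual", "ifeq", "iconst_1", "ireturn"),
)
suspect = Birthmark(
    frozenset({"java.io.File.exists", "java.lang.String.length"}),
    ("aload_0", "invokevirtual", "ifne", "iconst_1", "ireturn"),
)
unrelated = Birthmark(
    frozenset({"java.util.List.add"}),
    ("new", "dup", "invokespecial", "astore_1"),
)

print(birthmark_similarity(source, suspect))
print(birthmark_similarity(source, unrelated))
```

In this sketch a score near 1 suggests the suspicious program may derive from the source program, and a score near 0 suggests the two are unrelated. Any decision threshold would have to be chosen empirically, since the abstract does not give one.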

Key words: static software birthmark, software theft detection, Application Programming Interface (API) call set, instruction sequence, feature synthesis
