Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (9): 2636-2640.DOI: 10.11772/j.issn.1001-9081.2019020247

• Cyber security • Previous Articles     Next Articles

Privacy preserving Hamming distance computing problem of DNA sequences

MA Minyao<sup>1,2</sup>, XU Yi<sup>1,2</sup>, LIU Zhuo<sup>1,2</sup>   

  1. 1. School of Mathematics and Big Data, Guizhou Education University, Guiyang Guizhou 550018, China;
    2. Key Laboratory of Cyberspace Security, Guizhou Education University, Guiyang Guizhou 550018, China
  • Received:2019-02-25 Revised:2019-05-12 Online:2019-09-10 Published:2019-06-17
  • Supported by:

    This work is partially supported by the Science and Technology Foundation Project of Guizhou Province (QianKeHeJiChu[2016]1115, QianKeHeJiChu[2019]1249), the Youth Science and Technology Talent Project of Guizhou Provincial Education Department (QianJiaoHe-KY-Zi[2016]220, QianJiaoHe-KY-Zi[2017]210, QianJiaoHe-KY-Zi [2018]260).

隐私保护DNA序列汉明距离计算问题

马敏耀1,2, 徐艺1,2, 刘卓1,2   

  1. 1. 贵州师范学院 数学与大数据学院, 贵阳 550018;
    2. 贵州师范学院 网络空间安全重点实验室, 贵阳 550018
  • 通讯作者: 马敏耀
  • 作者简介:马敏耀(1979-),男,贵州威宁人,副教授,博士,CCF会员,主要研究方向:密码学、信息安全;徐艺(1986-),女,贵州石阡人,讲师,博士研究生,主要研究方向:密码学、信息安全;刘卓(1987-),女,河南驻马店人,讲师,博士研究生,主要研究方向:密码学、信息安全。
  • 基金资助:

    贵州省科学技术基金计划项目(黔科合基础[2016]1115,黔科合基础[2019]1249);贵州省教育厅青年科技人才成长项目(黔教合KY字[2016]220,黔教合KY字[2017]210,黔教合KY字[2018]260)。

Abstract:

DNA sequences carry important biological information of human bodies, how to compare multiple DNA sequences correctly with privacy preserving is an important problem. To a certain extent, Hamming distance characterizes the similarity between two DNA sequences. Therefore, the privacy preserving Hamming distance computing problem of DNA sequences was researched. First of all, the "0-1 Coding" of the DNA sequence was defined, which codes the DNA sequence with length n to a 0-1 string with length 4n, proving that the Hamming distance of two DNA sequences is a half of the Hamming distance of their "0-1 Coding" strings. Then, with the help of this conclusion, with the Goldwasser-Micali (GM) encryption algorithm taken as main encryption tool, a secure two-party computation protocol for computing the Hamming distance of two DNA sequences was proposed. It was shown that the protocol is both secure and correct under semi-honest attacker model. The proof of security based on a simulator was given. After then, the efficiency of the protocol was analyzed.

Key words: Hamming distance, DNA sequence, privacy preserving, secure multi-party computation, homomorphic encryption

摘要:

DNA序列承载着人体重要的生物学信息,如何在保护隐私的情况下正确地对不同的DNA序列进行比对,成为亟待研究的科学问题。汉明距离在一定程度上刻画了两个DNA序列的相似程度,在保护隐私的情况下,研究DNA序列的汉明距离计算问题。首先定义了DNA序列的0-1编码规则,该规则将长度为n的DNA序列编码成长度为4n的0-1串,证明了两个DNA序列的汉明距离等于它们的0-1编码串的汉明距离的一半。以此结论为基础,以GM加密算法为主要密码学工具,构造了计算DNA序列汉明距离的一个安全两方计算协议。在半诚实攻击者模型下,证明了协议的正确性,给出了基于模拟器的安全性证明,并对协议的效率进行了分析。

关键词: 汉明距离, DNA序列, 隐私保护, 安全多方计算, 同态加密

CLC Number: