计算机应用 ›› 2020, Vol. 40 ›› Issue (1): 136-142.DOI: 10.11772/j.issn.1001-9081.2019061080

• 网络空间安全 • 上一篇    下一篇

面向用户隐私保护的高效基因比对方案

李功丽1,2, 李钰1,2, 张恩1,2, 尹天宇1,2   

  1. 1. 河南师范大学 计算机与信息工程学院, 河南 新乡 453007;
    2. "智慧商务与物联网技术" 河南省工程实验室(河南师范大学), 河南 新乡 453007
  • 收稿日期:2019-06-24 修回日期:2019-08-15 出版日期:2020-01-10 发布日期:2019-10-08
  • 通讯作者: 李功丽
  • 作者简介:李功丽(1981-),女,河南信阳人,讲师,博士,主要研究方向:信息安全;李钰(1998-),女,河南安阳人,主要研究方向:信息安全、密码学;张恩(1974-),男,河南新乡人,副教授,博士,CCF会员,主要研究方向:信息安全、密码学;尹天宇(1999-),男,河南长垣人,主要研究方向:安全多方计算。
  • 基金资助:
    国家自然科学基金资助项目(U1604156,61772176,61602158);河南省科技攻关计划项目(172102210045,192102210131)。

Efficient genetic comparison scheme for user privacy protection

LI Gongli1,2, LI Yu1,2, ZHANG En1,2, YIN Tianyu1,2   

  1. 1. College of Computer and Information Engineering, Henan Normal University, Xinxiang Henan 453007, China;
    2. Engineering Laboratory of Intelligence Business and Internet of Things of Henan Province(Henan Normal University), Xinxiang Henan 453007, China
  • Received:2019-06-24 Revised:2019-08-15 Online:2020-01-10 Published:2019-10-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1604156, 61772176, 61602158), the Science and Technology Research Project of Henan Province (172102210045, 192102210131).

摘要: 针对当前的基因序列比对协议普遍要求一个可信赖的第三方,可能因此造成大范围的隐私数据泄漏的问题,提出了一种基于线性扫描的基因比对方案。首先对两方的基因序列进行基于混淆电路(GC)的编码,然后线性扫描整个基因组数据库并用混淆电路实现客户的基因序列与库中所有基因序列的比对。上述方案可以在保护双方用户隐私的前提下,实现基因比对。不过该方案需要扫描整个基因组数据库,时间复杂度为On),在基因组数据库较大时效率较低。为了提高基因比对的效率,进一步提出了基于不经意随机存取(ORAM)的基因比对方案,先将基因数据存储在ORAM上,然后只需把目标路径上的数据项取出并用混淆电路进行基因比对。该方案的比对次数和数据库的大小呈亚线性关系,时间复杂度为O(log n)。实验结果表明,基于ORAM的基因比对方案在实现隐私保护的同时,把比对次数由On)减小到了O(log n),明显降低了比对操作的时间复杂度,可以用来进行疾病诊断,尤其适用于基因组数据库较大的场景。

关键词: 基因比对, 相似度计算, 隐私保护, 不经意随机存取, 混淆电路

Abstract: Concerning the problem that current genetic comparison protocols generally require a trusted third party, which may result in the leakage of a wide range of private data, a genetic comparison scheme based on linear scan was proposed. The gene sequences of two parties were first encoded based on Garbled Circuit (GC), and then the genome database was linearly scanned and the garbled circuit was used to compare gene sequence of user with all gene sequences in database. The above scheme can achieve genetic comparison under the premise of protecting user privacy of both parties. However, the scheme needs to scan whole database with time complexity of O(n), and is inefficient when the genome database is large. In order to improve the efficiency of genetic comparison, a genetic comparison scheme based on Oblivious Random Access Memory (ORAM) was further proposed, in which genetic data was stored at ORAM first, then only the data blocks on target path were picked out to perform genetic comparison by using garbled circuit. This scheme has the number of comparisons sub-linear to the size of database and time complexity of O (log n). The experimental results show that the genetic comparison scheme based on ORAM reduces the number of comparisons from O(n) to O(log n) while realizing privacy protection, significantly decreases the time complexity of comparison operation. It can be used for disease diagnosis, especially in the case with large genome database.

Key words: genetic comparison, similarity calculation, privacy protection, Oblivious Random Access Memory (ORAM), Garbled Circuit (GC)

中图分类号: