
Dictionary Partition Vector Space Model for Encrypted Ranked Search in the Cloud

Lu Jiaxing (陆佳行), Dai Hua (戴华), Liu Yuanlong (刘源龙), Zhou Qian (周倩), Yang Geng (杨庚)

  1. Nanjing University of Posts and Telecommunications
  • Received: 2022-08-01 Revised: 2022-08-25 Online: 2022-09-23
  • Corresponding author: Dai Hua (戴华)
  • Supported by:
    Research on Key Technologies of Secure Ranked Search for Outsourced Cloud Environments

Abstract: The Traditional Vector Space Model (TVSM) generates high-dimensional vectors, which makes the dot products used to score the relevance between documents and queried keywords time-consuming. To address this problem, a Dictionary Partition Vector Space Model (DPVSM) for encrypted ranked search in the cloud is proposed. First, DPVSM is formally defined, and the relevance score between the queried keywords and a document in DPVSM is proved to be exactly equal to the score computed in TVSM. Then, using an equal-length dictionary partition method, an encrypted vector generation algorithm and an algorithm for calculating the relevance score between documents and queried keywords are proposed. Experimental results show that the space occupied by document vectors in DPVSM is about 50% of that in TVSM, and that the saving grows as the number of documents increases. In addition, the space occupied by query vectors and the time spent on relevance score calculation are both much lower than in TVSM. DPVSM therefore outperforms TVSM in both the space cost of the generated vectors and the time cost of relevance score calculation.

Keywords: cloud computing, vector space model, searchable encryption, dictionary partition, multi-keyword search
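The score-equivalence claim in the abstract can be illustrated with a minimal plaintext sketch. It is not the paper's algorithm. It assumes that the dictionary is split into equal-length blocks, that only blocks containing a non-zero weight are stored, and that encryption is omitted entirely. The names partition, full_score and partitioned_score are hypothetical.

```python
from typing import Dict, List


def partition(vec: List[float], block_len: int) -> Dict[int, List[float]]:
    """Split a full-dictionary vector into equal-length blocks and keep
    only the blocks holding at least one non-zero weight (assumption)."""
    blocks: Dict[int, List[float]] = {}
    for start in range(0, len(vec), block_len):
        block = vec[start:start + block_len]
        if any(block):
            blocks[start // block_len] = block
    return blocks


def full_score(doc: List[float], query: List[float]) -> float:
    """TVSM-style relevance score: dot product over the whole dictionary."""
    return sum(d * q for d, q in zip(doc, query))


def partitioned_score(doc_blocks: Dict[int, List[float]],
                      query_blocks: Dict[int, List[float]]) -> float:
    """Sum the dot products of the blocks present in both vectors.
    Blocks missing on either side are all-zero, so they contribute nothing."""
    score = 0.0
    for idx, q_block in query_blocks.items():
        d_block = doc_blocks.get(idx)
        if d_block is not None:
            score += sum(d * q for d, q in zip(d_block, q_block))
    return score


# Toy dictionary of 8 keywords, split into 4 blocks of length 2.
doc = [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0]
query = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]

doc_blocks = partition(doc, 2)      # keeps blocks 0 and 3
query_blocks = partition(query, 2)  # keeps blocks 0, 2 and 3

print(full_score(doc, query))
print(partitioned_score(doc_blocks, query_blocks))
```

In this toy case the document stores two of its four blocks, and both functions return the same score, because every dropped block would only have contributed zero to the dot product.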

CLC number: