计算机应用 ›› 2014, Vol. 34 ›› Issue (12): 3475-3480.

• 数据技术 • 上一篇    下一篇

图上基于局部敏感哈希的多关键字索引

韩京宇,杨健   

  1. 南京邮电大学 计算机学院, 南京 210003
  • 收稿日期:2014-07-08 修回日期:2014-09-15 出版日期:2014-12-01 发布日期:2014-12-31
  • 通讯作者: 韩京宇
  • 作者简介:韩京宇(1976-),男,吉林白山人,副教授,博士,CCF会员,主要研究方向:数据管理、知识库;杨健(1978-),男,江苏徐州人,副教授,博士,主要研究方向:信息系统、知识库。
  • 基金资助:

    国家自然科学基金资助项目

Locality-sensitive hashing index for multiple keywords over graphs

HAN Jingyu,YANG Jian   

  1. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China
  • Received:2014-07-08 Revised:2014-09-15 Online:2014-12-01 Published:2014-12-31
  • Contact: HAN Jingyu

摘要:

针对目前基于倒排表的图关键字索引不能有效处理多个关键字查询,也不能对关键字拼写容错的问题,提出一种位图和局部敏感哈希(BLH)相结合的双层索引来支持图的多关键字查询:上层构建位图,依据关键字组合的n-gram映射到子图类簇,每个类簇存储相似的子图;下层在每个类簇上构建局部敏感哈希索引,根据关键字组合的n-gram定位到包含关键字组合的子图。该方法可显著减少图上关键字查询的I/O,查询时间缩减80%;并且,基于n-gram构建索引,可以避免索引对拼写错误敏感,在关键字容错的前提下返回用户期望的结果。实际数据集上的实验结果表明BLH索引的有效性,可以支持万维网、社会网络的高效查询。

Abstract:

As existing inverted index not only cannot efficiently handle multiple-keyword query, but also cannot find results for misspelled keywords, a bi-level index leveraging Bitmap and Locality-sensitive Hashing (BLH) was proposed to support multiple-keyword queries. The upper-level of BLH is bitmaps, which map keywords onto clusters of sub-graphs based on the n-grams in the keywords. Each cluster stores the similar sub-graphs. On the lower-level, each cluster has a locality-sensitive hashing index, which helps identify the sub-graphs that contain the keywords based on their n-grams. The indexing scheme of BLH can dramatically decrease query I/Os, thus reducing the query time by 80%. Furthermore, the index based on n-gram can avoid the sensitivity to spelling mistakes of keywords, guaranteeing to return expected results in any case. The experimental results on real data sets demonstrate the effectiveness of the BLH index, which can efficiently support the querying of Web and social network.

中图分类号: