计算机应用 ›› 2015, Vol. 35 ›› Issue (6): 1654-1658.DOI: 10.11772/j.issn.1001-9081.2015.06.1654

• 人工智能 • 上一篇    下一篇

基于位置序列的广义后缀树用户相似性计算方法

肖艳丽, 张振宇, 袁江涛   

  1. 新疆大学 信息科学与工程学院, 乌鲁木齐 830046
  • 收稿日期:2014-12-29 修回日期:2015-03-31 发布日期:2015-06-12
  • 通讯作者: 肖艳丽(1989-),女,四川达州人,硕士研究生,主要研究方向:数据挖掘、模式识别;xiaoyanli1314@163.com
  • 作者简介:张振宇(1964-),男,山西大同人,教授,CCF会员,主要研究方向:数据挖掘、移动对等网络;袁江涛(1989-),男,新疆伊宁人,硕士研究生,主要研究方向:机会网络信任模型、数据挖掘。
  • 基金资助:

    国家自然科学基金资助项目(61262089,61262087);新疆教育厅高校教师科研计划重点项目 (XJEDU2012I09);新疆大学博士毕业生科研启动基金资助项目 (BS110127)。

Calculation method of user similarity based on location sequence generalized suffix tree

XIAO Yanli, ZHANG Zhenyu, YUAN Jiangtao   

  1. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China
  • Received:2014-12-29 Revised:2015-03-31 Published:2015-06-12

摘要:

为了解决移动数据形成的轨迹间用户相似性问题,提出了一种基于位置序列的广义后缀树(LSGST)用户相似性计算方法。该算法首先从移动数据中抽取位置序列,同时将位置序列映射为字符串,完成了对位置序列的处理到对字符串处理的转化工作;然后,构建不同用户间的位置序列广义后缀树;最后,分别从经过的相似地方个数、最长公共子序列、频繁公共位置序列三方面对相似性进行具体计算。理论分析和仿真表明,该算法提出的三个计算指标在计算相似性方面具有理想的效果;除此之外,与构造后缀树的普通方法相比,时间复杂度较低;与动态规划和朴素字符串匹配方法相比,该算法在寻找最长公共子串、频繁公共位置序列时,效率更高。实验结果表明LSGST能够有效测量相似性,同时减少了寻找测量指标时需要处理的轨迹数据量,并在时间复杂度方面明显优于对比算法。

关键词: 移动数据, 用户相似性, 位置序列, 字符串匹配, 广义后缀树

Abstract:

To solve the user similarity between trajectories formed by mobility data, an algorithm based on Location Sequence Generalized Suffix Tree (LSGST) was proposed. First, the location sequence was extracted from mobility data. At the same time the location sequence was mapped to a string. The transformation from the processing of location sequence to the processing of string was completed. Then the location sequence generalized suffix tree between different users was constructed. The similarity was calculated in detail from the number of similar positions, longest common subsequence and the frequent common position sequence. The theoretical analysis and simulation results show that the proposed algorithm has ideal effect in terms of similarity measure. Besides, compared to the ordinary construction method, the proposed algorithm has low time complexity. In the comparison with dynamic programming and naive string-matching, the proposed algorithm has higher efficiency when searching for the longest common sub-string and frequent public position sequence. The experimental results indicate that the LSGST can measure the similarity effectively, meanwhile reduces the trajectory data when searching for the measurement index, and has better performance in time complexity.

Key words: mobility data, user similarity, location sequence, string matching, generalized suffix tree

中图分类号: