Calculation method of user similarity based on location sequence generalized suffix tree

doi:10.11772/j.issn.1001-9081.2015.06.1654

Journal of Computer Applications, 2015, Vol. 35, Issue (6): 1654-1658. DOI: 10.11772/j.issn.1001-9081.2015.06.1654

XIAO Yanli, ZHANG Zhenyu, YUAN Jiangtao

School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China

Received: 2014-12-29 Revised: 2015-03-31 Online: 2015-06-12
Corresponding author: XIAO Yanli (born 1989), female, from Dazhou, Sichuan, master's student; main research interests: data mining, pattern recognition; xiaoyanli1314@163.com
About the authors: ZHANG Zhenyu (born 1964), male, from Datong, Shanxi, professor, CCF member; main research interests: data mining, mobile peer-to-peer networks. YUAN Jiangtao (born 1989), male, from Yining, Xinjiang, master's student; main research interests: trust models for opportunistic networks, data mining.
Supported by:
National Natural Science Foundation of China (61262089, 61262087); Key Project of the Scientific Research Program for University Teachers of the Xinjiang Education Department (XJEDU2012I09); Xinjiang University Doctoral Graduate Research Start-up Fund (BS110127).

Abstract:

To measure user similarity between trajectories formed from mobility data, a user similarity calculation method based on the Location Sequence Generalized Suffix Tree (LSGST) is proposed. First, location sequences are extracted from the mobility data and mapped to strings, which turns location sequence processing into string processing. Then, a location sequence generalized suffix tree is constructed across different users. Finally, similarity is calculated from three aspects: the number of similar places visited, the longest common subsequence, and the frequent common location sequences. Theoretical analysis and simulation show that the three proposed indexes are effective for calculating similarity. In addition, the time complexity is lower than that of the ordinary suffix tree construction method, and the algorithm is more efficient than dynamic programming and naive string matching when searching for the longest common substring and the frequent common location sequences. The experimental results indicate that LSGST measures similarity effectively, reduces the amount of trajectory data that must be processed when searching for the measurement indexes, and clearly outperforms the compared algorithms in time complexity.

Key words: mobility data, user similarity, location sequence, string matching, generalized suffix tree
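
The pipeline summarized in the abstract (map location sequences to strings, index them jointly, then read similarity indexes off the shared structure) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a naive generalized suffix trie built in quadratic time rather than a linear-time suffix tree construction, it treats the "longest common" index as a contiguous common substring, and every name in it (`encode`, `build_generalized_suffix_trie`, `similarity_indexes`, `min_support`) is hypothetical.

```python
from collections import defaultdict


def encode(sequences):
    """Map each distinct location label to one character (assumed encoding)."""
    alphabet = {}
    encoded = []
    for seq in sequences:
        chars = []
        for loc in seq:
            if loc not in alphabet:
                alphabet[loc] = chr(ord("A") + len(alphabet))
            chars.append(alphabet[loc])
        encoded.append("".join(chars))
    return encoded, alphabet


class Node:
    def __init__(self):
        self.children = {}
        self.owners = set()            # users whose string contains this path
        self.count = defaultdict(int)  # occurrences of this path per user


def build_generalized_suffix_trie(strings):
    """Insert every suffix of every string (naive O(n^2) stand-in for a suffix tree)."""
    root = Node()
    for uid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                node.owners.add(uid)
                node.count[uid] += 1
    return root


def common_substrings(root, n_users):
    """Yield (substring, minimum occurrence count) for paths shared by all users."""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            # A path can only be shared if its prefix is shared, so pruning is safe.
            if len(child.owners) == n_users:
                path = prefix + ch
                yield path, min(child.count.values())
                stack.append((child, path))


def similarity_indexes(seq_a, seq_b, min_support=2):
    """Return (shared places, longest common run, frequent common runs)."""
    (a, b), alphabet = encode([seq_a, seq_b])
    root = build_generalized_suffix_trie([a, b])
    decode = {ch: loc for loc, ch in alphabet.items()}
    common = list(common_substrings(root, 2))
    shared_places = sorted(decode[p] for p, _ in common if len(p) == 1)
    longest = max((p for p, _ in common), key=len, default="")
    frequent = sorted(
        [decode[c] for c in p]
        for p, n in common
        if len(p) >= 2 and n >= min_support
    )
    return shared_places, [decode[c] for c in longest], frequent


# Made-up location sequences for two users, for illustration only.
user_a = ["home", "cafe", "office", "gym", "cafe", "office"]
user_b = ["cafe", "office", "park", "cafe", "office", "home"]
print(similarity_indexes(user_a, user_b))
```

In this toy input the two users share three places, and the run cafe, office is both the longest common run and the only run that occurs at least `min_support` times for each user. A real generalized suffix tree would compress single-child paths and could be built in linear time; the trie here trades that efficiency for brevity.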

CLC Number:

TP391

XIAO Yanli, ZHANG Zhenyu, YUAN Jiangtao. Calculation method of user similarity based on location sequence generalized suffix tree[J]. Journal of Computer Applications, 2015, 35(6): 1654-1658.
