Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (4): 1079-1084. DOI: 10.11772/j.issn.1001-9081.2019081467

• Data Science and Technology •

Self-driving tour route mining based on sparse trajectory clustering

YANG Fengyi1,2,3, MA Yupeng1,3, BAO Hengbin1,2,3, HAN Yunfei1,3, MA Bo1,2,3

  1. The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi Xinjiang 830011, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China;
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi Xinjiang 830011, China
  • Received: 2019-08-23 Revised: 2019-10-19 Published: 2020-04-10 Online: 2019-11-18
  • Corresponding author: MA Yupeng
  • About the authors: YANG Fengyi (born 1994), male, from Jining, Shandong, M.S. candidate; main research interests: big data analysis, data mining. MA Yupeng (born 1979), male, from Fukang, Xinjiang, research fellow, Ph.D., CCF member; main research interests: Internet of Things, big data analysis. BAO Hengbin (born 1995), male, from Benxi, Liaoning, M.S. candidate; main research interests: big data analysis, data mining. HAN Yunfei (born 1990), male, from Jincheng, Shanxi, assistant research fellow, Ph.D.; main research interests: data mining, computer vision. MA Bo (born 1984), male, from Anshan, Liaoning, associate research fellow, Ph.D., CCF member; main research interests: big data analysis, knowledge graph.
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2019D01A92) and the Tianshan Distinguished Young Scholars Program of Xinjiang (2018Q005).

Abstract: To address the difficulty of reconstructing real tour routes from the sparse refueling trajectories of self-driving tourists, a sparse trajectory clustering algorithm based on semantic representation was proposed to mine popular self-driving tour routes. Unlike traditional trajectory clustering algorithms based on trajectory point matching, this algorithm considers the semantic relationships between different trajectory points and learns a low-dimensional vector representation of each trajectory. Firstly, a neural network language model was used to learn distributed vector representations of the gas stations. Then, the average of all station vectors in each trajectory was taken as the vector representation of that trajectory. Finally, the classical k-means algorithm was used to cluster the trajectory vectors. The final visualization results show that the proposed algorithm effectively mines two popular self-driving tour routes.

Key words: sparse trajectory, tour route mining, trajectory clustering, distributed representation, self-driving tour
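
The last two steps described in the abstract (averaging station vectors into a trajectory vector, then clustering with k-means) can be illustrated with a minimal sketch. It is not the authors' implementation. The station vectors below are hypothetical toy values standing in for embeddings that a neural network language model would learn. The station names, the function names `trajectory_vector` and `kmeans`, and the farthest-point initialization are all assumptions made for illustration.

```python
from math import dist

def trajectory_vector(trajectory, station_vecs):
    """Average the vectors of all stations visited in one trajectory."""
    vecs = [station_vecs[s] for s in trajectory]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; deterministic farthest-point init (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from its nearest already-chosen center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Hypothetical 2-D station embeddings (toy values, not learned from data).
station_vecs = {
    "A": [1.0, 0.1], "B": [0.9, 0.2], "C": [0.8, 0.0],
    "X": [0.0, 1.0], "Y": [0.1, 0.9], "Z": [0.2, 1.1],
}
# Each sparse trajectory is the sequence of stations where a vehicle refueled.
trajectories = [["A", "B"], ["B", "C"], ["A", "C"],
                ["X", "Y"], ["Y", "Z"], ["X", "Z"]]

traj_vecs = [trajectory_vector(t, station_vecs) for t in trajectories]
labels = kmeans(traj_vecs, k=2)
print(labels)
```

Because trajectories are compared through averaged station vectors rather than point-by-point matching, two trajectories that share no station can still fall into the same cluster when their stations have similar embeddings.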

CLC Number: