Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (8): 2395-2400. DOI: 10.11772/j.issn.1001-9081.2017.08.2395


Tourism route recommendation based on dynamic clustering

XIAO Chunjing1,2, XIA Kewen1, QIAO Yongwei3, ZHANG Yuxiang2   

    1. School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300300, China;
    2. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China;
    3. Engineering and Technical Training Center, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2017-02-08 Revised: 2017-04-10 Online: 2017-08-10 Published: 2017-08-12
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1533104), the Natural Science Foundation of Hebei Province (E2016202341), the Natural Science Foundation of Tianjin (14JCZDJC32500), and the Fundamental Research Funds for the Central Universities (ZXH2012P009).


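  • Corresponding author: XIA Kewen
  • About the authors: XIAO Chunjing (1978-), female, born in Tangshan, Hebei, lecturer, Ph.D. candidate; research interests: recommender systems, data mining. XIA Kewen (1965-), male, born in Wugang, Hunan, professor, Ph.D.; research interests: intelligent information processing, data mining. QIAO Yongwei (1976-), male, born in Qixian, Shanxi, lecturer, M.S.; research interests: machine learning, intelligent information processing. ZHANG Yuxiang (1975-), male, born in Datong, Shanxi, associate professor, Ph.D.; research interests: machine learning, data mining, artificial intelligence.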

Abstract: In session-based Collaborative Filtering (CF), a user's interaction history is divided into sessions using a fixed time window, and user preference is expressed as a sequence of these sessions. In tourism data, however, some sessions contain no interactions, and high sparsity makes it difficult to select neighbors. To alleviate data sparsity and make better use of the characteristics of tourism data, a new tourism route recommendation method based on dynamic clustering was proposed. Firstly, the characteristics that distinguish tourism data from other standard data were analyzed. Secondly, a user's interaction history was divided into sessions with a variable time window obtained by dynamic clustering, and a user preference model was built by combining the probabilistic topic distribution extracted from each session by Latent Dirichlet Allocation (LDA) with time penalty weights. Then, the neighbor set and the candidate routes were obtained from the user feature vector, which reflects tourist age, route season and price. Finally, routes were recommended according to the relevance between the probabilistic topic distributions of the candidate routes and of the tourist. The variable time window not only alleviates data sparsity, but also removes the need to fix the number of time windows in advance, since that number is derived automatically from the data. Neighbors were selected using the user feature vector instead of similarity computed on the tourism data, so as to avoid the computational difficulty caused by data sparsity. The experimental results on real tourism data indicate that the proposed method not only adapts to the characteristics of tourism data, but also improves recommendation accuracy.

Key words: dynamic clustering, Latent Dirichlet Allocation (LDA), preference model, time penalty, feature vector

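Abstract (Chinese version, translated): Session-based collaborative filtering divides the interaction history with a fixed time window and represents user interest as a sequence of these sessions, but the high sparsity of tourism data leaves some sessions without any interaction and makes neighbor similarity hard to compute. To alleviate data sparsity and make effective use of the data characteristics, a tourism route recommendation algorithm based on dynamic clustering is proposed. The method first analyzes the characteristics that distinguish tourism data from other standard data. Next, it divides each tourist's interaction history using variable-length time windows obtained by dynamic clustering, extracts the probabilistic topic distribution of each session with Latent Dirichlet Allocation (LDA), and combines these distributions with time penalty weights to build a user interest drift model. It then selects neighbors and a candidate route set for the target tourist through a tourist feature vector that reflects factors such as age, route season and price. Finally, it completes the route recommendation according to the probabilistic topic relevance between the candidate routes and the tourist. By using variable-length time windows, the method not only alleviates data sparsity but also generates the number of sessions automatically from the data rather than requiring it to be specified in advance. Neighbor selection computes similarity on feature vectors rather than on the tourism data, which avoids cases where similarity cannot be computed because of sparsity. Extensive experiments on real tourism data show that the method adapts well to the characteristics of tourism data and improves the accuracy of tourism route recommendation.

Key words (Chinese version, translated): dynamic clustering, Latent Dirichlet Allocation, interest model, time penalty, feature vector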
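
The abstract does not give the formulas behind these steps, so the following Python sketch only illustrates the general shape of the pipeline under stated assumptions. It assumes that sessions are split where the gap between consecutive interactions exceeds a multiple of the mean gap (a simple stand-in for the paper's dynamic clustering), that the time penalty is an exponential decay, and that relevance is cosine similarity. The per-session topic distributions are taken as given rather than computed with LDA, and neighbor selection by feature vector is omitted. All function names and parameters (`split_sessions`, `gap_factor`, `decay`, and so on) are hypothetical.

```python
import math


def split_sessions(timestamps, gap_factor=1.0):
    """Group interaction times into variable-length sessions.

    Assumption: a new session starts when the gap to the previous
    interaction exceeds gap_factor times the mean gap. This is a
    stand-in for the dynamic clustering named in the abstract.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        return [ts] if ts else []
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    threshold = gap_factor * sum(gaps) / len(gaps)
    sessions = [[ts[0]]]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > threshold:
            sessions.append([cur])      # large gap: open a new session
        else:
            sessions[-1].append(cur)    # small gap: extend current session
    return sessions


def preference(session_topics, session_ages, decay=0.5):
    """Combine per-session topic distributions with time penalty weights.

    Assumption: the weight of a session is exp(-decay * age), normalized
    so the weights sum to 1. Older sessions contribute less.
    """
    weights = [math.exp(-decay * age) for age in session_ages]
    total = sum(weights)
    n_topics = len(session_topics[0])
    return [
        sum(w * topics[i] for w, topics in zip(weights, session_topics)) / total
        for i in range(n_topics)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_routes(user_pref, route_topics):
    """Order candidate route names by topic relevance to the user."""
    return sorted(
        route_topics,
        key=lambda name: cosine(user_pref, route_topics[name]),
        reverse=True,
    )
```

Because the number of sessions falls out of the gap threshold, it does not have to be fixed in advance, which mirrors the property the abstract attributes to the variable time window.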
