Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (8): 2456-2461. DOI: 10.11772/j.issn.1001-9081.2022071037

• Data Science and Technology •

融合空间和文本信息的兴趣点类别表征模型

徐则林1,2, 杨敏2, 陈勐1,2

  1. 自然资源部城市国土资源监测与仿真重点实验室,广东 深圳 518034
  2. 山东大学 软件学院,济南 250101
  • 收稿日期:2022-07-15 修回日期:2022-11-18 接受日期:2022-11-21 发布日期:2023-01-15 出版日期:2023-08-10
  • 通讯作者: 杨敏
  • 作者简介:徐则林(2000—),男,江苏海安人,硕士研究生,主要研究方向:时空数据挖掘
    陈勐(1990—),男,山东滕州人,副教授,博士,CCF会员,主要研究方向:数据挖掘、城市计算。
  • 基金资助:
    自然资源部城市国土资源监测与仿真重点实验室开放基金资助课题(KF-2021-06-079)

Point-of-interest category representation model with spatial and textual information

Zelin XU1,2, Min YANG2, Meng CHEN1,2

  1. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen Guangdong 518034, China
  2. School of Software, Shandong University, Jinan Shandong 250101, China
  • Received: 2022-07-15 Revised: 2022-11-18 Accepted: 2022-11-21 Online: 2023-01-15 Published: 2023-08-10
  • Contact: Min YANG
  • About author: XU Zelin, born in 2000, M.S. candidate. His research interests include spatio-temporal data mining.
    CHEN Meng, born in 1990, Ph.D., associate professor. His research interests include data mining and urban computing.
  • Supported by:
    Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2021-06-079)

摘要:

准确表征兴趣点(POI)类别(如大学、餐厅等)是理解城市空间、辅助城市计算的关键。现有的POI类别表征模型通常只挖掘用户在POI之间的移动行为并学习序列特征,而忽视了POI数据的空间特征和文本语义特征。为了解决上述问题,提出一种融合空间和文本信息的POI类别表征学习模型Cat2Vec。首先,利用POI的空间共现关系构建POI类别共现点互信息(PMI)矩阵;然后,基于预训练的文本表征模型学习POI的文本语义特征;最后,引入新的映射矩阵,并基于矩阵分解技术将PMI矩阵分解为POI类别表征矩阵、文本语义特征矩阵以及映射矩阵的内积。在两个真实世界的数据集Yelp和高德上进行的POI语义重叠度评测中,相较于基准模型中表现最好的Doc2Vec模型,所提模型的性能分别平均提高了5.53%和8.17%。实验结果表明所提模型能更有效地嵌入POI语义。

关键词: 兴趣点类别, 表征学习, 特征融合, 兴趣点语义, 矩阵分解

Abstract:

Accurately representing Point-Of-Interest (POI) categories (e.g., universities, restaurants) is key to understanding urban space and supporting urban computing. Existing POI category representation models usually mine only users' mobility behaviors among POIs and learn sequential features, ignoring the spatial features and textual semantic features of POI data. To address this problem, a POI category representation learning model that fuses spatial and textual information, Cat2Vec, is proposed. First, a POI category co-occurrence Point-wise Mutual Information (PMI) matrix is constructed from the spatial co-occurrence relationships of POIs. Then, the textual semantic features of POIs are learned with a pre-trained text representation model. Finally, a new mapping matrix is introduced, and matrix factorization is used to decompose the PMI matrix into the inner product of a POI category representation matrix, a textual semantic feature matrix, and the mapping matrix. In the POI semantic overlap evaluation on two real-world datasets, Yelp and AMap, the proposed model improves performance by 5.53% and 8.17% on average, respectively, over Doc2Vec, the best-performing baseline. The experimental results show that the proposed model embeds POI semantics more effectively.

Key words: Point-Of-Interest (POI) category, representation learning, feature fusion, POI semantics, matrix factorization
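
The pipeline summarized in the abstract (a category co-occurrence PMI matrix, then a factorization involving category representations, text features, and a mapping matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the objective, the optimizer, or how spatial co-occurrence is counted. The sketch assumes a squared-error objective of the form PMI ≈ E · W · Tᵀ minimized by plain gradient descent, sets undefined PMI cells to zero, and uses a toy co-occurrence matrix and hand-made text features in place of real counts and pre-trained embeddings. The names `pmi_matrix`, `factorize`, `E`, `W`, and `text_feats` are hypothetical.

```python
import numpy as np


def pmi_matrix(cooc):
    """Point-wise mutual information from a category co-occurrence count matrix.

    Cells with zero co-occurrence are set to 0 (an assumption of this sketch).
    """
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    pmi = np.zeros_like(cooc)
    if total == 0:
        return pmi
    row = cooc.sum(axis=1, keepdims=True)   # per-category counts, shape (n, 1)
    col = cooc.sum(axis=0, keepdims=True)   # per-category counts, shape (1, n)
    expected = row * col / total            # counts expected under independence
    mask = cooc > 0
    pmi[mask] = np.log(cooc[mask] / expected[mask])
    return pmi


def factorize(pmi, text_feats, dim=2, lr=0.01, steps=2000, seed=0):
    """Fit pmi ~= E @ W @ text_feats.T by gradient descent (assumed objective).

    E: (n, dim) category representations, learned.
    W: (dim, k) mapping matrix, learned.
    text_feats: (n, k) text features, kept fixed.
    """
    rng = np.random.default_rng(seed)
    n, k = text_feats.shape
    E = rng.normal(scale=0.1, size=(n, dim))
    W = rng.normal(scale=0.1, size=(dim, k))
    losses = []
    for _ in range(steps):
        projected = W @ text_feats.T             # (dim, n)
        residual = E @ projected - pmi           # (n, n)
        losses.append(0.5 * float((residual ** 2).sum()))
        grad_E = residual @ projected.T          # d(loss)/dE
        grad_W = E.T @ residual @ text_feats     # d(loss)/dW
        E -= lr * grad_E
        W -= lr * grad_W
    return E, W, losses


# Toy data: 3 categories; counts and text features are made up for illustration.
cooc = np.array([[0, 8, 1],
                 [8, 0, 2],
                 [1, 2, 0]])
text_feats = np.array([[1.0, 0.0, 0.5, 0.0],
                       [0.0, 1.0, 0.5, 0.0],
                       [0.0, 0.0, 1.0, 1.0]])

pmi = pmi_matrix(cooc)
E, W, losses = factorize(pmi, text_feats)
```

In this sketch the rows of `E` play the role of category embeddings, while `W` ties them to the fixed text features; a real setup would replace `text_feats` with vectors from a pre-trained text model and would likely add regularization.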

CLC Number: