Point-of-interest category representation model with spatial and textual information

Journal of Computer Applications, 2023, Vol. 43, Issue (8): 2456-2461. DOI: 10.11772/j.issn.1001-9081.2022071037

• Data science and technology •

Zelin XU¹,², Min YANG² (corresponding author), Meng CHEN¹,²

1. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong 518034, China
2. School of Software, Shandong University, Jinan, Shandong 250101, China

Received: 2022-07-15  Revised: 2022-11-18  Accepted: 2022-11-21  Online: 2023-01-15  Published: 2023-08-10
Contact: Min YANG
About the authors: XU Zelin, born in 2000, native of Hai'an, Jiangsu, M.S. candidate. His research interests include spatio-temporal data mining.
CHEN Meng, born in 1990, native of Tengzhou, Shandong, Ph.D., associate professor, CCF member. His research interests include data mining and urban computing.
Supported by: Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2021-06-079)

Abstract

Accurately representing Point-Of-Interest (POI) categories (e.g., universities, restaurants) is key to understanding urban space and supporting urban computing. Existing POI category representation models usually mine only users' mobility behaviors among POIs to learn sequential features, and ignore the spatial features and textual semantic features of POI data. To address this problem, a POI category representation learning model that fuses spatial and textual information, Cat2Vec, is proposed. First, a POI category co-occurrence Point-wise Mutual Information (PMI) matrix is constructed from the spatial co-occurrence relationships of POIs. Then, the textual semantic features of POIs are learned with a pre-trained text representation model. Finally, a new mapping matrix is introduced, and matrix factorization is used to decompose the PMI matrix into the inner product of a POI category representation matrix, a textual semantic feature matrix, and the mapping matrix. In the POI semantic overlap evaluation on two real-world datasets, Yelp and AMap, the proposed model improves on Doc2Vec, the best-performing baseline, by an average of 5.53% and 8.17%, respectively. The experimental results show that the proposed model embeds POI semantics more effectively.

Key words: Point-Of-Interest (POI) category, representation learning, feature fusion, POI semantics, matrix factorization
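
The three steps described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `ppmi_matrix` and `factorize`, the toy co-occurrence counts, the clipping of negative PMI values to zero, the squared-error loss with L2 regularization, and the random matrix `S` standing in for pre-trained text features are all assumptions made for illustration. The sketch fits only the product of a category matrix `T`, a mapping matrix `Y`, and the fixed text features `S`.

```python
import numpy as np


def ppmi_matrix(cooc):
    """Positive PMI from a square matrix of category co-occurrence counts."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # assumption: unobserved pairs get 0
    return np.maximum(pmi, 0.0)   # assumption: negative PMI is clipped to 0


def factorize(M, S, dim, lr=0.005, steps=2000, reg=0.01, seed=0):
    """Fit M ~ T @ Y @ S.T by gradient descent; S is kept fixed."""
    rng = np.random.default_rng(seed)
    n, k = S.shape
    T = 0.1 * rng.standard_normal((n, dim))  # category representations (N x D)
    Y = 0.1 * rng.standard_normal((dim, k))  # mapping matrix (D x K)
    losses = []
    for _ in range(steps):
        R = T @ Y @ S.T - M                  # residual (N x N)
        losses.append(0.5 * np.sum(R ** 2)
                      + 0.5 * reg * (np.sum(T ** 2) + np.sum(Y ** 2)))
        grad_T = R @ S @ Y.T + reg * T
        grad_Y = T.T @ R @ S + reg * Y
        T -= lr * grad_T
        Y -= lr * grad_Y
    return T, Y, losses


# Toy data: 4 hypothetical categories; counts of spatial co-occurrence.
cooc = np.array([[0, 8, 1, 0],
                 [8, 0, 2, 1],
                 [1, 2, 0, 6],
                 [0, 1, 6, 0]])
M = ppmi_matrix(cooc)

# Stand-in for pre-trained text features (N x K), rows scaled to unit norm.
S = np.random.default_rng(1).standard_normal((4, 3))
S /= np.linalg.norm(S, axis=1, keepdims=True)

T, Y, losses = factorize(M, S, dim=2)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this sketch the rows of `T` play the role of the learned category vectors. Keeping `S` fixed reflects the abstract's statement that the text features come from a pre-trained model; whether the original model also updates them is not stated here.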

CLC number: TP399

Cite this article: Zelin XU, Min YANG, Meng CHEN. Point-of-interest category representation model with spatial and textual information[J]. Journal of Computer Applications, 2023, 43(8): 2456-2461.

Figures and tables (7)

Fig. 1 Frequency distribution of POI categories

Fig. 2 Word distribution of POI categories

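Tab. 1 Symbols and descriptions

Symbol	Description
$v, t, c$	POI, target POI category, context POI category
$N$	Number of POI categories
$D$	Dimension of the vector space
$K$	Dimension of the textual semantic feature vector
$V_t$	Vector representation of the target POI category
$V'_c$	Vector representation of the context POI category
$M \in \mathbb{R}^{N \times N}$	Co-occurrence PMI matrix of POI categories
$T \in \mathbb{R}^{N \times D}$	Representation matrix of target POI categories
$C \in \mathbb{R}^{N \times D}$	Representation matrix of context POI categories
$S \in \mathbb{R}^{N \times K}$	Textual semantic feature matrix of POI categories
$Y \in \mathbb{R}^{D \times K}$	Mapping matrix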
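
As a reading aid for the symbols in Tab. 1, the standard definition of PMI and a dimension check for the three-matrix product mentioned in the abstract can be written out. The count notation $\#(\cdot)$ and the symbol $Z$ are introduced here for illustration only, and the product form is an assumption based on the abstract's wording and the shapes in Tab. 1, not the paper's full objective (the role of $C$ is not covered).

$$
M_{tc} = \mathrm{PMI}(t, c) = \log \frac{\#(t, c) \cdot Z}{\#(t) \cdot \#(c)}, \qquad Z = \sum_{t', c'} \#(t', c')
$$

$$
M \approx T \, Y \, S^{\top}, \qquad (N \times D)(D \times K)(K \times N) = N \times N
$$

Here $\#(t, c)$ counts how often categories $t$ and $c$ co-occur spatially, and $\#(t)$ and $\#(c)$ are the corresponding row and column totals.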

Fig. 3 Framework of the Cat2Vec model

Tab. 2 Results of the semantic overlap evaluation

Modality	Model	Yelp, $K_{near} = 5$	Yelp, $K_{near} = 10$	Yelp, $K_{near} = 15$	AMap, $K_{near} = 5$	AMap, $K_{near} = 10$	AMap, $K_{near} = 15$
Spatial	SC	0.417	0.375	0.350	0.712	0.639	0.590
Spatial	Cat2Vec (w/o Text)	0.420	0.342	0.299	0.504	0.392	0.328
Text	LDA	0.637	0.604	0.572	0.510	0.450	0.424
Text	Doc2Vec	0.628	0.574	0.538	0.723	0.623	0.552
Multimodal	Cat2Vec	0.647	0.606	0.581	0.760	0.675	0.613
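
The average improvements quoted in the abstract can be checked against the Doc2Vec and Cat2Vec rows of Tab. 2. The short script below assumes the figures are the mean of the per-$K_{near}$ relative gains; that averaging rule is an assumption, and the helper name `mean_relative_gain` is hypothetical.

```python
# Scores copied from Tab. 2, ordered by K_near = 5, 10, 15.
doc2vec = {"Yelp": [0.628, 0.574, 0.538], "AMap": [0.723, 0.623, 0.552]}
cat2vec = {"Yelp": [0.647, 0.606, 0.581], "AMap": [0.760, 0.675, 0.613]}


def mean_relative_gain(ours, base):
    """Mean of (ours - base) / base over the paired scores, in percent."""
    gains = [(o - b) / b for o, b in zip(ours, base)]
    return 100.0 * sum(gains) / len(gains)


for name in doc2vec:
    gain = mean_relative_gain(cat2vec[name], doc2vec[name])
    print(f"{name}: {gain:.2f}%")
# Yelp: 5.53%
# AMap: 8.17%
```

Under this averaging rule the table reproduces the 5.53% and 8.17% reported in the abstract.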

Fig. 4 Parameter sensitivity analysis

Tab. 3 Running time comparison of different models (s)

Columns: Model, Yelp dataset, AMap dataset
