Social-interaction GAN for pedestrian trajectory prediction based on state-refinement long short-term memory and attention mechanism

doi:10.11772/j.issn.1001-9081.2022040602

Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1565-1570. DOI: 10.11772/j.issn.1001-9081.2022040602

• Multimedia computing and computer simulation •

Jiagao WU¹,², Shiwen ZHANG¹,², Yudong JIANG¹,², Linfeng LIU¹,²

1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China
2. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing Jiangsu 210023, China

Received: 2022-04-29 Revised: 2022-07-10 Accepted: 2022-07-11 Online: 2022-08-05 Published: 2023-05-10
Contact: Jiagao WU
About the authors:
WU Jiagao, born in 1969, from Suzhou, Jiangsu, Ph. D., associate professor, CCF member. His research interests include computer networks and artificial intelligence. jgwu@njupt.edu.cn
ZHANG Shiwen, born in 1996, from Nanjing, Jiangsu, M. S. candidate. His research interests include trajectory prediction and deep learning.
JIANG Yudong, born in 1999, from Yancheng, Jiangsu, M. S. candidate. His research interests include trajectory prediction and deep learning.
LIU Linfeng, born in 1981, from Danyang, Jiangsu, Ph. D., professor. His research interests include computer networks and mobile computing.
Supported by:
National Natural Science Foundation of China (61872191)

Abstract

Most current work on pedestrian trajectory prediction considers only the factors that affect pedestrian interaction. To address this problem, a Social-Interaction Generative Adversarial Network (SIGAN) for pedestrian trajectory prediction based on State-Refinement Long Short-Term Memory (SR-LSTM) and an attention mechanism, namely SRA-SIGAN, was proposed, in which a Generative Adversarial Network (GAN) was used to learn the movement patterns of target pedestrians. Firstly, SR-LSTM was used as a location encoder to extract motion intention information. Secondly, a velocity attention mechanism was set up to assign influence reasonably to the pedestrians in the same scene, so that pedestrian interaction was handled better. Finally, the predicted future trajectory was generated by the decoder. Experimental results on several public datasets show that SRA-SIGAN performs well overall. In particular, on the Zara1 dataset, compared with the SR-LSTM model, SRA-SIGAN reduced the Average Displacement Error (ADE) and Final Displacement Error (FDE) by 20.0% and 10.5%, respectively; compared with the SIGAN model, it reduced ADE and FDE by 31.7% and 24.4%, respectively.

Key words: Generative Adversarial Network (GAN), Long Short-Term Memory (LSTM) network, pedestrian trajectory prediction, attention mechanism, pedestrian interaction
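
The abstract reports ADE and FDE without defining them on this page. As a minimal sketch, assuming the definitions commonly used in trajectory prediction (ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, FDE is that distance at the final predicted step), the two metrics for a single trajectory could be computed as follows. The function name `displacement_errors` and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points in metres."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and true position at each time step
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over all predicted steps
    fde = dists[-1]                # error at the final predicted step only
    return ade, fde


# Hypothetical 3-step example (not data from the paper)
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.6)]
ade, fde = displacement_errors(pred, truth)
print(f"ADE = {ade:.2f} m, FDE = {fde:.2f} m")
```

Dataset-level figures are usually obtained by averaging these per-trajectory values over all test pedestrians; whether the paper averages in exactly this way is not stated on this page.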

CLC Number: TP18

Cite this article:

Jiagao WU, Shiwen ZHANG, Yudong JIANG, Linfeng LIU. Social-interaction GAN for pedestrian trajectory prediction based on state-refinement long short-term memory and attention mechanism[J]. Journal of Computer Applications, 2023, 43(5): 1565-1570.

Figures and tables (8)

Fig. 1 Structure of SRA-SIGAN model

Fig. 2 Schematic diagram of velocity attention module

Tab. 1 ADE and FDE comparison of different prediction models (tobs=8, tpred=12), unit: m

Dataset	SLSTM ADE	SLSTM FDE	SGAN ADE	SGAN FDE	SR-LSTM ADE	SR-LSTM FDE	SIGAN ADE	SIGAN FDE	SRA-SIGAN ADE	SRA-SIGAN FDE
ETH	0.77	1.60	0.81	1.52	0.62	1.23	0.63	1.25	0.56	1.17
Hotel	0.38	0.80	0.72	1.61	0.52	1.01	0.37	0.74	0.41	0.81
Univ	0.58	1.28	0.60	1.26	0.48	1.06	0.51	1.10	0.40	0.98
Zara1	0.51	1.19	0.34	0.69	0.35	0.76	0.41	0.90	0.28	0.68
Zara2	0.39	0.89	0.42	0.84	0.32	0.69	0.32	0.70	0.29	0.66
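
The percentage reductions quoted in the abstract can be checked against the Zara1 row of Tab. 1. The sketch below assumes the reduction is measured relative to the baseline model, i.e. (baseline - proposed) / baseline; the helper name `relative_reduction` is an illustrative assumption, while the ADE and FDE values are copied from the table.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# (ADE, FDE) on Zara1 in metres, copied from Tab. 1
zara1 = {
    "SR-LSTM": (0.35, 0.76),
    "SIGAN": (0.41, 0.90),
    "SRA-SIGAN": (0.28, 0.68),
}

proposed_ade, proposed_fde = zara1["SRA-SIGAN"]
reductions = {}
for name in ("SR-LSTM", "SIGAN"):
    base_ade, base_fde = zara1[name]
    reductions[name] = (
        relative_reduction(base_ade, proposed_ade),
        relative_reduction(base_fde, proposed_fde),
    )
    ade_red, fde_red = reductions[name]
    print(f"vs {name}: ADE reduced by {ade_red:.1f}%, FDE reduced by {fde_red:.1f}%")
```

Under this assumption the computed values round to the figures given in the abstract (20.0% and 10.5% against SR-LSTM, 31.7% and 24.4% against SIGAN).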

Tab. 2 ADE comparison of CIDNN and SRA-SIGAN (tobs=5, tpred=5), unit: m

Dataset	CIDNN	SRA-SIGAN
ETH	0.09	0.09
Hotel	0.11	0.07
Univ	0.12	0.10
Zara1	0.15	0.11
Zara2	0.10	0.07

Tab. 3 ADE and FDE comparison of SRA-SIGAN and VA-SIGAN (tobs=8, tpred=12), unit: m

Dataset	VA-SIGAN ADE	VA-SIGAN FDE	SRA-SIGAN ADE	SRA-SIGAN FDE
ETH	0.61	1.21	0.56	1.17
Hotel	0.50	0.99	0.41	0.81
Univ	0.47	1.06	0.40	0.98
Zara1	0.31	0.73	0.28	0.68
Zara2	0.32	0.70	0.29	0.66

Tab. 4 ADE and FDE comparison of SRA-SIGAN and SR-SIGAN (tobs=8, tpred=12), unit: m

Dataset	SR-SIGAN ADE	SR-SIGAN FDE	SRA-SIGAN ADE	SRA-SIGAN FDE
ETH	0.59	1.20	0.56	1.17
Hotel	0.44	0.84	0.41	0.81
Univ	0.43	1.01	0.40	0.98
Zara1	0.29	0.70	0.28	0.68
Zara2	0.31	0.68	0.29	0.66

Fig. 3 Relation curve between ADE and k on different datasets

Fig. 4 Relation curve between FDE and k on different datasets

