Indoor fall detection algorithm based on Res2Net-YOLACT and fusion feature

DOI: 10.11772/j.issn.1001-9081.2021040857

Journal of Computer Applications, 2022, Vol. 42, Issue (3): 757-763.

• 2021 CCF Conference on Artificial Intelligence (CCFAI 2021)

Lu ZHANG, Chun FANG, Ming ZHU

School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China

Received: 2021-05-25 Revised: 2021-06-30 Accepted: 2021-07-06 Online: 2021-11-09 Published: 2022-03-10
Corresponding author: Ming ZHU
About the authors: ZHANG Lu, born in 1996 in Weifang, Shandong, M. S. candidate. His research interests include computer vision and deep learning.
FANG Chun, born in 1981 in Zibo, Shandong, Ph. D., lecturer. Her research interests include intelligent computation and pattern recognition.
Supported by:
National Natural Science Foundation of China (61602280); Outstanding Youth Innovation Team Support Program of Shandong Colleges and Universities (2019KJN048); University and City Integration Development Program of Zibo City (2019ZBXC114)

Abstract:

To strengthen the monitoring of elderly people and reduce the safety risks caused by falls, a new indoor fall detection algorithm based on Res2Net-YOLACT and fusion features is proposed. First, a YOLACT network integrated with the Res2Net module extracts the human body contour from the video image sequence. Then, a two-level judgment method makes the fall decision: the first level uses a movement speed feature to judge roughly whether an abnormal state has occurred, and the second level determines the human body posture with a model that fuses body shape features and deep features. Finally, when a fall is detected and its duration exceeds a threshold, a fall alarm is raised. Experimental results show that the proposed algorithm extracts the human body contour well in complex scenes, is robust to illumination, and reaches a detection speed of up to 28 frames per second (fps), which meets the real-time detection requirement. In addition, the algorithm classifies better after hand-crafted features are added: its classification accuracy is 98.65%, which is 1.03 percentage points higher than that of the algorithm using Convolutional Neural Network (CNN) features alone.

Key words: health care, YOLACT, fusion feature, Convolutional Neural Network (CNN), fall detection
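
The two-level decision described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Frame` fields, the `classify_posture` callback, the use of the silhouette centroid as the speed feature, and all three thresholds (`speed_threshold`, `watch_window_s`, `duration_threshold_s`) are hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Frame:
    time_s: float      # timestamp of the frame, in seconds
    centroid_y: float  # vertical position of the silhouette centroid (assumed speed cue)
    features: Any      # fused shape + deep features, opaque to this sketch


def detect_fall(
    frames: Iterable[Frame],
    classify_posture: Callable[[Any], str],
    speed_threshold: float,
    watch_window_s: float,
    duration_threshold_s: float,
) -> Optional[float]:
    """Return the time at which an alarm would be raised, or None."""
    prev = None
    last_abnormal_t = None  # last time level 1 flagged abnormal motion
    fall_start = None       # time at which the current fall posture began
    for frame in frames:
        # Level 1: coarse check on movement speed between consecutive frames.
        if prev is not None and frame.time_s > prev.time_s:
            speed = abs(frame.centroid_y - prev.centroid_y) / (frame.time_s - prev.time_s)
            if speed > speed_threshold:
                last_abnormal_t = frame.time_s
        prev = frame
        recently_abnormal = (
            last_abnormal_t is not None
            and frame.time_s - last_abnormal_t <= watch_window_s
        )
        if fall_start is None and not recently_abnormal:
            continue  # level 2 runs only after level 1 has fired
        # Level 2: posture classification on the fused features.
        if classify_posture(frame.features) == "fall":
            if fall_start is None:
                fall_start = frame.time_s
            # Alarm only when the fall posture lasts longer than the threshold.
            if frame.time_s - fall_start > duration_threshold_s:
                return frame.time_s
        else:
            fall_start = None
    return None
```

With this arrangement a slow, deliberate lie-down never reaches the classifier, while a fast drop followed by a sustained fall posture raises the alarm once the duration threshold is exceeded.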

CLC number: TP391

Lu ZHANG, Chun FANG, Ming ZHU. Indoor fall detection algorithm based on Res2Net-YOLACT and fusion feature[J]. Journal of Computer Applications, 2022, 42(3): 757-763.

Layer	Block type	Kernels	Stride	Feature map size
Input layer	—	—	—	550×550
Conv1	—	7×7@64	2	275×275
Pool1	Maxpool	3×3@64	2	138×138
L1	Bottleneck (×3)	1×1@64 3×3@64 3×3@64 3×3@64 1×1@256	—	138×138
L2	Bottleneck (×4)	1×1@128 3×3@128 3×3@128 3×3@128 1×1@512	—	69×69
L3	Bottleneck (×6)	1×1@256 3×3@256 3×3@256 3×3@256 1×1@1024	—	35×35
L4	Bottleneck (×3)	1×1@512 3×3@512 3×3@512 3×3@512 1×1@2048	—	18×18
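
The feature map sizes in the table above can be checked with the standard convolution output-size formula. The sketch below is only a consistency check: the padding values (3 for the 7×7 convolution, 1 for the 3×3 operations) and the assumption that stages L2 to L4 each halve the resolution with one stride-2 3×3 operation follow common ResNet-style practice and are not stated in the table, which lists the stage stride as "—".

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_sizes(input_size: int = 550) -> dict:
    """Trace the feature map side length through the backbone stages."""
    sizes = {"Input": input_size}
    s = conv_out(input_size, kernel=7, stride=2, padding=3)  # Conv1 (padding assumed)
    sizes["Conv1"] = s
    s = conv_out(s, kernel=3, stride=2, padding=1)           # Pool1 (padding assumed)
    sizes["Pool1"] = s
    sizes["L1"] = s                                          # L1 keeps the resolution
    for stage in ("L2", "L3", "L4"):
        # Assumed: one stride-2 3x3 operation at the start of each later stage.
        s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes[stage] = s
    return sizes


if __name__ == "__main__":
    for name, side in backbone_sizes().items():
        print(f"{name}: {side}x{side}")
```

Under these assumptions the trace reproduces the listed sizes 550, 275, 138, 138, 69, 35 and 18.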

Layer	Input size	Kernels	Pooling	Output size
Input layer	30×30	—	—	30×30
Conv1	30×30	3×3@32	—	28×28@32
Pooling1	28×28	—	Max pooling	14×14@32
Conv2	14×14	3×3@16	—	12×12@16
Pooling2	12×12	—	Max pooling	6×6@16
Conv3	6×6	3×3@8	—	4×4@8
Pooling3	4×4	—	Max pooling	4×4@8
FC	4×4@8	—	—	1×128
FC	1×128	—	—	1×64
Output layer	1×64	—	—	1×4

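Video	Frame size	Recording environment	Activities included
Video1	320×240	Daytime, sufficient light	Standing, bending, falling
Video2	320×240	Daytime, sufficient light	Standing, bending, sitting
Video3	320×240	Night, lamp light (bright)	Standing, bending, falling
Video4	320×240	Night, lamp light (bright)	Standing, bending, sitting
Video5	320×240	Night, lamp light (dim)	Standing, bending, falling
Video6	320×240	Night, lamp light (dim)	Standing, bending, sitting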

Algorithm	Video	Recall/%	Precision/%	F1/%	Speed/fps
GMM	Video1	88.43	34.56	49.70	—
GMM	Video2	91.67	45.24	60.52	—
GMM	Video3	79.21	37.86	51.23	—
GMM	Video4	73.77	32.98	45.58	—
GMM	Video5	88.18	40.17	55.20	—
GMM	Video6	75.64	36.61	49.34	—
GMM	Mean	82.82	37.90	52.00	—
Codebook	Video1	78.34	48.34	59.79	—
Codebook	Video2	85.71	36.87	51.56	—
Codebook	Video3	76.81	38.57	51.35	—
Codebook	Video4	70.66	37.81	49.26	—
Codebook	Video5	82.19	40.18	53.97	—
Codebook	Video6	73.24	43.46	54.55	—
Codebook	Mean	77.83	40.35	53.16	—
Mask RCNN	Video1	97.64	96.81	97.22	≈7
Mask RCNN	Video2	98.33	95.33	96.80	≈7
Mask RCNN	Video3	96.44	94.74	95.58	≈7
Mask RCNN	Video4	96.14	96.83	96.48	≈7
Mask RCNN	Video5	93.64	95.77	94.69	≈7
Mask RCNN	Video6	94.87	94.45	94.66	≈7
Mask RCNN	Mean	96.18	95.66	95.92	≈7
YOLACT	Video1	97.11	96.17	96.63	≈24
YOLACT	Video2	97.32	94.26	95.77	≈24
YOLACT	Video3	94.65	95.17	94.91	≈24
YOLACT	Video4	96.15	95.83	95.99	≈24
YOLACT	Video5	93.12	94.31	93.71	≈24
YOLACT	Video6	94.98	95.14	95.06	≈24
YOLACT	Mean	95.56	95.15	95.35	≈24
Res2Net-YOLACT	Video1	96.11	96.06	96.08	≈28
Res2Net-YOLACT	Video2	97.27	94.43	95.83	≈28
Res2Net-YOLACT	Video3	93.91	94.81	94.36	≈28
Res2Net-YOLACT	Video4	95.91	96.03	95.97	≈28
Res2Net-YOLACT	Video5	93.10	94.52	93.80	≈28
Res2Net-YOLACT	Video6	94.66	94.48	94.57	≈28
Res2Net-YOLACT	Mean	95.16	95.06	95.11	≈28
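
The F1 column above is consistent with the usual harmonic mean of recall and precision. A minimal sketch of that computation follows; the function name `f1_score` is introduced here for illustration, and the inputs are percentages as in the table.

```python
def f1_score(recall_pct: float, precision_pct: float) -> float:
    """Harmonic mean of recall and precision, both given in percent."""
    total = recall_pct + precision_pct
    if total == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * recall_pct * precision_pct / total


if __name__ == "__main__":
    # GMM on Video1 and Mask RCNN on Video1, values taken from the table.
    print(round(f1_score(88.43, 34.56), 2))
    print(round(f1_score(97.64, 96.81), 2))
```

The harmonic mean explains why GMM and Codebook score near 50% despite recall around 80%: their low precision dominates the result.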

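Object detection algorithm	Fall detection method	Detected fall frames	Misjudged frames	Actual fall frames	Accuracy/%	Misjudgment rate/%
Codebook	Threshold method	60	7	70	85.71	10.00
GMM	Threshold method	61	6	70	87.14	8.57
Mask RCNN	Threshold method	65	5	70	92.86	7.14
YOLACT	Threshold method	65	5	70	92.86	7.14
Res2Net-YOLACT	Threshold method	65	5	70	92.86	7.14
Codebook	CNN classification	61	4	70	87.14	5.71
GMM	CNN classification	62	3	70	88.57	4.29
Mask RCNN	CNN classification	66	2	70	94.29	2.86
YOLACT	CNN classification	67	2	70	95.71	2.86
Res2Net-YOLACT	CNN classification	67	2	70	95.71	2.86
Codebook	Proposed algorithm	58	4	70	82.86	5.71
GMM	Proposed algorithm	61	4	70	87.14	5.71
Mask RCNN	Proposed algorithm	68	1	70	97.14	1.43
YOLACT	Proposed algorithm	67	2	70	95.71	2.86
Res2Net-YOLACT	Proposed algorithm	68	1	70	97.14	1.43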
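
The two percentage columns above appear to be simple ratios against the number of actual fall frames. The sketch below reproduces them; the formulas are inferred from the tabulated numbers rather than taken from a stated definition, and the function name `frame_rates` is hypothetical.

```python
def frame_rates(detected: int, misjudged: int, actual: int) -> tuple:
    """Return (accuracy %, misjudgment rate %) relative to actual fall frames."""
    if actual <= 0:
        raise ValueError("actual fall frame count must be positive")
    accuracy = 100.0 * detected / actual
    misjudgment_rate = 100.0 * misjudged / actual
    return accuracy, misjudgment_rate


if __name__ == "__main__":
    # Codebook with the threshold method, then Res2Net-YOLACT with the
    # proposed algorithm; counts are taken from the table.
    for counts in ((60, 7, 70), (68, 1, 70)):
        acc, rate = frame_rates(*counts)
        print(f"{acc:.2f} {rate:.2f}")
```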

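Lighting	Detected fall frames	Misjudged frames	Actual fall frames	Accuracy/%	Misjudgment rate/%
Normal light	66	0	68	97.06	0.00
Interfering light	70	2	73	95.89	2.74
Normal light	69	1	71	97.18	1.41
Interfering light	68	0	70	97.14	0.00
Normal light	62	1	65	95.38	1.54
Interfering light	64	0	67	95.52	0.00
Normal light	72	2	74	97.30	2.70
Interfering light	67	1	70	95.71	1.43
Normal light	60	1	63	95.24	1.59
Interfering light	58	0	60	96.67	0.00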

Activity	Number of video clips	Detected as fall	Detected as non-fall
Walking	100	0	100
Sitting up	100	3	97
Bending down	100	3	97
Lying	100	2	98
Falling	100	97	3
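
For readers who want summary figures from the per-activity table above, the counts can be pooled into a binary confusion matrix. The sketch below treats "Falling" as the positive class and every other activity as negative; the resulting sensitivity, specificity and overall accuracy are derived here from the tabulated counts and are not values reported by the source. The English activity labels and the function name `summarize` are introduced for illustration.

```python
# (detected as fall, detected as non-fall) per activity, from the table.
RESULTS = {
    "Walking": (0, 100),
    "Sitting up": (3, 97),
    "Bending down": (3, 97),
    "Lying": (2, 98),
    "Falling": (97, 3),
}


def summarize(results: dict, positive: str = "Falling") -> dict:
    """Pool per-activity counts into sensitivity, specificity and accuracy."""
    tp, fn = results[positive]
    fp = sum(fall for name, (fall, _) in results.items() if name != positive)
    tn = sum(non_fall for name, (_, non_fall) in results.items() if name != positive)
    return {
        "sensitivity": tp / (tp + fn),           # share of real falls detected
        "specificity": tn / (tn + fp),           # share of daily activities not flagged
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {100 * value:.1f}%")
```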
