Journal of Computer Applications, 2024, Vol. 44, Issue (7): 2192-2199. DOI: 10.11772/j.issn.1001-9081.2023070926
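Received: 2023-07-09
Revised: 2023-10-11
Accepted: 2023-10-13
Online: 2023-10-26
Published: 2024-07-10
Corresponding author: Yifan WANG
About the author: XU Song, born in 2000 in Suzhou, Anhui, M. S. candidate. His research interests include video salient object detection and weakly supervised salient object detection.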
Song XU, Wenbo ZHANG, Yifan WANG
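Abstract:
Existing Video Salient Object Detection (VSOD) networks face two problems. First, capturing temporal information is computationally expensive, which makes the networks hard to deploy on mobile devices. Second, their generalization ability is weak, so they struggle with challenging scenes in video such as occlusion and motion blur. Therefore, a lightweight video salient object detection network based on dynamic filters and contrastive learning was proposed. Firstly, coarse foreground feature points were sampled from each of the consecutive frames and a similarity matrix was computed over them; weighting the sampled features by this matrix filtered out the noisy ones. Secondly, the filtered foreground features were used to generate dynamic filter parameters, and the original feature map was convolved with these filters to extract the foreground objects. Meanwhile, a contrastive learning module was designed to assist learning in the training stage; it introduced no additional computation in the inference stage. Extensive experiments were conducted on three datasets: DAVIS, DAVSOD and VOS. The results show that, compared with DCFNet (Dynamic Context-sensitive Filtering Network for video salient object detection), the proposed network achieves comparable performance on three metrics, F-measure, S-measure and Mean Absolute Error (MAE), while the frame rate rises from 28 frame/s to 38 frame/s, an increase of 35.7%. The network has only 15.6×10^6 parameters, which makes it better suited to deployment on edge devices in practical applications.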
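The pipeline outlined in the abstract (coarse foreground sampling, similarity-weighted filtering of the samples, dynamic filter generation, and convolution over the original feature map) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the cosine similarity, the softmax over row sums used as voting weights, and the single 1×1 kernel are all assumptions chosen for illustration, and the contrastive learning module is left out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def sample_foreground(features, scores, num_points):
    """Keep the num_points positions with the highest softmax confidence."""
    conf = softmax(scores)
    order = sorted(range(len(conf)), key=lambda i: conf[i], reverse=True)
    return [features[i] for i in order[:num_points]]


def vote_and_generate_filter(samples):
    """Weight each sampled feature by its agreement with the others, then
    average them into one 1x1 dynamic filter (a vector with C entries)."""
    sim = [[cosine(u, v) for v in samples] for u in samples]
    weights = softmax([sum(row) for row in sim])  # outliers get small weights
    channels = len(samples[0])
    return [sum(w * s[c] for w, s in zip(weights, samples)) for c in range(channels)]


def apply_dynamic_filter(features, kernel):
    """1x1 convolution: one scalar response per spatial position."""
    return [sum(f * k for f, k in zip(feat, kernel)) for feat in features]


# Toy frame: 6 positions, 2 channels. Positions 0-2 look like foreground,
# position 3 is a noisy point with a high coarse score, 4-5 are background.
features = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.2], [-1.0, 0.5], [0.0, 1.0], [0.1, 0.9]]
scores = [3.0, 2.8, 2.9, 2.7, -1.0, -1.2]

samples = sample_foreground(features, scores, num_points=4)
kernel = vote_and_generate_filter(samples)
response = apply_dynamic_filter(features, kernel)
print([round(r, 3) for r in response])
```

In this toy example the noisy sample disagrees with the three consistent ones, so it receives a very small voting weight, and the generated kernel responds more strongly at positions 0-2 than at the noisy and background positions.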
Song XU, Wenbo ZHANG, Yifan WANG. Lightweight video salient object detection network based on spatiotemporal information[J]. Journal of Computer Applications, 2024, 44(7): 2192-2199.
Tab. 1 Performance comparison of different networks on DAVIS, DAVSOD and VOS datasets

Network | Backbone | Optical flow | CRF | DAVIS F-measure/% | DAVIS S-measure/% | DAVIS MAE | DAVSOD F-measure/% | DAVSOD S-measure/% | DAVSOD MAE | VOS F-measure/% | VOS S-measure/% | VOS MAE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SCOM | - | - | - | 78.30 | 83.20 | 0.048 | 46.40 | 59.90 | 0.220 | 69.00 | 71.20 | 0.162 |
DLVS | - | - | - | 70.80 | 79.40 | 0.061 | 52.10 | 65.70 | 0.129 | 67.50 | 76.00 | 0.099 |
PDB | ResNet50 | √ | √ | 85.50 | 88.20 | 0.028 | 57.20 | 69.80 | 0.116 | 74.20 | 81.80 | 0.078 |
SSAV | ResNet50 | √ | √ | 86.10 | 89.30 | 0.028 | 60.30 | 72.40 | 0.090 | 74.20 | 81.90 | 0.073 |
STFA | ResNet50 | √ | √ | 86.50 | 89.20 | 0.023 | 65.10 | 74.60 | 0.086 | 79.10 | 85.00 | 0.058 |
CAS | ResNet50 | √ | √ | 86.00 | 87.30 | 0.032 | 60.80 | 69.90 | 0.086 | 77.40 | 80.80 | 0.051 |
RCRNet | ResNet50 | √ | √ | 84.80 | 88.60 | 0.027 | 65.30 | 74.10 | 0.087 | 83.30 | 87.30 | 0.051 |
MGA | ResNet50 | √ | √ | 89.20 | 91.20 | 0.022 | 65.60 | 75.10 | 0.081 | 73.50 | 79.20 | 0.075 |
FSNet | ResNet50 | √ | √ | 90.70 | 92.00 | 0.020 | 68.50 | 77.30 | 0.072 | - | - | - |
DCFNet | ResNet101 | × | × | 91.00 | 91.40 | 0.016 | 66.00 | 74.10 | 0.074 | 79.10 | 84.60 | 0.060 |
PCSA | MobilenetV3 | × | × | 88.00 | 90.20 | 0.022 | 65.50 | 74.10 | 0.086 | 74.70 | 82.70 | 0.065 |
Proposed network | MobilenetV3 | × | × | 89.70 | 90.90 | 0.018 | 65.80 | 74.00 | 0.080 | 78.20 | 83.80 | 0.062 |

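To make the MAE and F-measure columns of Tab. 1 concrete, the following is a minimal sketch of how the two metrics are commonly computed for a single saliency map. The fixed binarization threshold of 0.5 and the weight `beta_sq=0.3` are assumptions (0.3 is the usual choice in saliency work, but the page does not state the exact evaluation protocol), and S-measure is omitted because it requires a longer structural comparison.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth,
    both given as 2-D lists with values in [0, 1]."""
    diffs = [abs(p - g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    return sum(diffs) / len(diffs)


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of the binarized prediction against a binary ground truth.
    threshold and beta_sq are assumed values, not taken from the paper."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            hit = p >= threshold
            if hit and g == 1:
                tp += 1      # salient pixel correctly detected
            elif hit and g == 0:
                fp += 1      # background pixel marked as salient
            elif not hit and g == 1:
                fn += 1      # salient pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x3 prediction and binary ground truth.
pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.0]]
gt = [[1, 1, 0],
      [0, 1, 0]]

print(f"MAE = {mae(pred, gt):.3f}, F-measure = {f_measure(pred, gt):.3f}")
```

Lower MAE is better, while higher F-measure is better, which is why the two columns move in opposite directions across the rows of the table.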
Tab. 2 Comparison of parameter quantity and inference time among different networks

Network | Parameters/10^6 | Inference time/s |
---|---|---|
SSAV | 81.2 | 0.450 |
AGNN | 82.3 | 0.550 |
MGA | 254.0 | 0.290 |
AnDiff | 79.3 | 0.360 |
DCFNet | 274.0 | 0.036 |
Proposed network | 15.6 | 0.026 |

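The frame rates quoted in the abstract can be recovered from the inference times in Tab. 2. The short check below is only arithmetic on the published numbers; the helper names `fps` and `relative_change` are introduced here for illustration.

```python
def fps(seconds_per_frame):
    """Frames per second from a per-frame inference time in seconds."""
    return 1.0 / seconds_per_frame


def relative_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


dcfnet_fps = fps(0.036)   # about 27.8, reported as 28 frame/s
ours_fps = fps(0.026)     # about 38.5, reported as 38 frame/s

# The 35.7% in the abstract follows from the rounded rates 28 and 38.
speedup = relative_change(28, 38)

# Parameter count: 274.0 x 10^6 for DCFNet versus 15.6 x 10^6 here.
param_change = relative_change(274.0, 15.6)

print(round(dcfnet_fps, 1), round(ours_fps, 1), round(speedup, 1), round(param_change, 1))
```

Using the unrounded times instead gives a ratio of 0.036/0.026, roughly 1.385, so the reported 35.7% is slightly conservative. The parameter count drops by about 94.3% relative to DCFNet.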
Tab. 3 Influence of hyper-parameter K on network performance

Hyper-parameter K | F-measure/% | S-measure/% | MAE |
---|---|---|---|
0 | 86.00 | 88.00 | 0.027 |
10 | 88.89 | 90.00 | 0.020 |
20 | 89.51 | 90.43 | 0.019 |
50 | 89.70 | 90.91 | 0.018 |
100 | 89.64 | 90.94 | 0.018 |
300 | 89.71 | 90.90 | 0.018 |

Tab. 4 Influence of different selection methods on network performance

Selection method | F-measure/% | S-measure/% | MAE |
---|---|---|---|
Threshold + equal-probability sampling | 89.06 | 90.22 | 0.019 |
Adaptive max pooling | 88.89 | 90.00 | 0.020 |
Softmax + confidence | 89.70 | 90.91 | 0.018 |

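The three selection methods compared in Tab. 4 are named but not defined on this page. The sketch below shows one plausible reading of each on a flat list of coarse saliency scores. All three functions, their names, and their parameters are assumptions made for illustration and may differ from what the paper actually does.

```python
import math
import random


def threshold_uniform_sample(scores, threshold, num_points, rng):
    """Keep positions whose score exceeds the threshold, then draw
    num_points of them with equal probability (without replacement)."""
    candidates = [i for i, s in enumerate(scores) if s > threshold]
    return sorted(rng.sample(candidates, min(num_points, len(candidates))))


def adaptive_max_pool_select(scores, num_points):
    """Split the positions into num_points bins (floor/ceil boundaries, as in
    adaptive pooling) and keep the arg-max position of each bin."""
    n = len(scores)
    picked = []
    for b in range(num_points):
        start = (b * n) // num_points
        end = -(-(b + 1) * n // num_points)  # ceiling division
        picked.append(max(range(start, end), key=lambda i: scores[i]))
    return picked


def softmax_confidence_select(scores, num_points):
    """Turn scores into softmax confidences and keep the most confident
    positions, highest confidence first."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    conf = [e / total for e in exps]
    return sorted(range(len(conf)), key=lambda i: conf[i], reverse=True)[:num_points]


scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3, 0.95, 0.05]
rng = random.Random(0)  # fixed seed so the random draw is repeatable

picks_threshold = threshold_uniform_sample(scores, threshold=0.5, num_points=2, rng=rng)
picks_pool = adaptive_max_pool_select(scores, num_points=2)
picks_softmax = softmax_confidence_select(scores, num_points=2)
print(picks_threshold, picks_pool, picks_softmax)
```

Under this reading, threshold sampling treats every above-threshold point as equally good, adaptive max pooling forces the picks to be spread across regions, and softmax confidence simply keeps the highest-scoring points wherever they are.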
Tab. 5 Influence of foreground feature voting on network performance

Voting applied | F-measure/% | S-measure/% | MAE |
---|---|---|---|
Yes | 89.70 | 90.91 | 0.018 |
No | 89.41 | 90.64 | 0.018 |

Tab. 6 Influence of decoder stage feature fusion on network performance

Fusion method | F-measure/% | S-measure/% | MAE |
---|---|---|---|
Direct addition | 89.59 | 90.84 | 0.019 |
Concatenation | 89.63 | 90.84 | 0.018 |
Dynamic information fusion | 89.70 | 90.91 | 0.018 |

[1] TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787.
[2] PAN Y, YAO T, LI H, et al. Video captioning with transferred semantic attributes [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6504-6512.
[3] JI W, YU S, WU J, et al. Learning calibrated medical image segmentation via multi-rater agreement modeling [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 12336-12346.
[4] ITTI L. Automatic foveation for video compression using a neurobiological model of visual attention [J]. IEEE Transactions on Image Processing, 2004, 13(10): 1304-1318.
[5] HADIZADEH H, BAJIĆ I V. Saliency-aware video compression [J]. IEEE Transactions on Image Processing, 2014, 23(1): 19-33.
[6] WU H, LI G, LUO X. Weighted attentional blocks for probabilistic object tracking [J]. The Visual Computer, 2014, 30: 229-243.
[7] YAN P, LI G, XIE Y, et al. Semi-supervised video salient object detection using pseudo-labels [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7283-7292.
[8] GU Y, WANG L, WANG Z, et al. Pyramid constrained self-attention network for fast video salient object detection [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 10869-10876.
[9] YANG Z, WANG Q, BERTINETTO L, et al. Anchor diffusion for unsupervised video object segmentation [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 931-940.
[10] YANG S, ZHANG L, QI J, et al. Learning motion-appearance co-attention for zero-shot video object segmentation [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1544-1553.
[11] WANG W, LU X, SHEN J, et al. Zero-shot video object segmentation via attentive graph neural networks [C]// Proceedings of the 2019 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2019: 9236-9245.
[12] YANG B, BENDER G, LE Q V, et al. CondConv: conditionally parameterized convolutions for efficient inference [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 1307-1318.
[13] DIBA A, SHARMA V, VAN GOOL L, et al. DynamoNet: dynamic action and motion network [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6191-6200.
[14] ZHOU S, ZHANG J, PAN J, et al. Spatio-temporal filter adaptive network for video deblurring [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2482-2491.
[15] HE J, DENG Z, QIAO Y. Dynamic multi-scale filters for semantic segmentation [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 3561-3571.
[16] PANG Y, ZHANG L, ZHAO X, et al. Hierarchical dynamic filtering network for RGB-D salient object detection [C]// Proceedings of the 16th European Conference on Computer Vision. Cham: Springer, 2020: 235-252.
[17] YU S, XIAO J, ZHANG B, et al. Democracy does matter: comprehensive feature mining for co-salient object detection [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 969-978.
[18] ZHANG M, LIU J, WANG Y, et al. Dynamic context-sensitive filtering network for video salient object detection [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1533-1543.
[19] CHEN Y, ZOU W, TANG Y, et al. SCOM: spatiotemporal constrained optimization for salient object detection [J]. IEEE Transactions on Image Processing, 2018, 27(7): 3345-3357.
[20] WANG W, SHEN J, SHAO L. Video salient object detection via fully convolutional networks [J]. IEEE Transactions on Image Processing, 2017, 27(1): 38-49.
[21] SONG H, WANG W, ZHAO S, et al. Pyramid dilated deeper ConvLSTM for video salient object detection [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 715-731.
[22] FAN D-P, WANG W, CHENG M-M, et al. Shifting more attention to video salient object detection [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8546-8556.
[23] CHEN C, WANG G, PENG C, et al. Exploring rich and efficient spatial temporal interactions for real-time video salient object detection [J]. IEEE Transactions on Image Processing, 2021, 30: 3995-4007.
[24] JI Y, ZHANG H, JIE Z, et al. CASNet: a cross-attention Siamese network for video salient object detection [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(6): 2676-2690.
[25] LI H, CHEN G, LI G, et al. Motion guided attention for video salient object detection [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7273-7282.
[26] JI G-P, FU K, WU Z, et al. Full-duplex strategy for video object segmentation [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 4902-4913.