Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (7): 2192-2199.DOI: 10.11772/j.issn.1001-9081.2023070926
• Multimedia computing and computer simulation • Previous Articles Next Articles
Song XU1, Wenbo ZHANG1, Yifan WANG2(
)
Received:2023-07-09
Revised:2023-10-11
Accepted:2023-10-13
Online:2023-10-26
Published:2024-07-10
Contact:
Yifan WANG
About author:XU Song, born in 2000, M. S. candidate. His research interests include video salient object detection, weakly supervised salient object detection.Supported by:通讯作者:
王一帆
作者简介:徐松(2000—),男,安徽宿州人,硕士研究生,主要研究方向:视频显著性目标检测、弱监督显著性目标检测;基金资助:CLC Number:
Song XU, Wenbo ZHANG, Yifan WANG. Lightweight video salient object detection network based on spatiotemporal information[J]. Journal of Computer Applications, 2024, 44(7): 2192-2199.
徐松, 张文博, 王一帆. 基于时空信息的轻量视频显著性目标检测网络[J]. 《计算机应用》唯一官方网站, 2024, 44(7): 2192-2199.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023070926
| 网络 | 骨干网络 | 光流 | CRF | DAVIS | DAVSOD | VOS | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F-measure/% | S-measure/% | MAE | F-measure/% | S-measure/% | MAE | F-measure/% | S-measure/% | MAE | ||||
| SCOM[ | — | — | — | 78.30 | 83.20 | 0.048 | 46.40 | 59.90 | 0.220 | 69.00 | 71.20 | 0.162 |
| DLVS[ | — | — | — | 70.80 | 79.40 | 0.061 | 52.10 | 65.70 | 0.129 | 67.50 | 76.00 | 0.099 |
| PDB[ | ResNet50 | √ | √ | 85.50 | 88.20 | 0.028 | 57.20 | 69.80 | 0.116 | 74.20 | 81.80 | 0.078 |
| SSAV[ | ResNet50 | √ | √ | 86.10 | 89.30 | 0.028 | 60.30 | 72.40 | 0.090 | 74.20 | 81.90 | 0.073 |
| STFA[ | ResNet50 | √ | √ | 86.50 | 89.20 | 0.023 | 65.10 | 74.60 | 0.086 | 79.10 | 85.00 | 0.058 |
| CAS[ | ResNet50 | √ | √ | 86.00 | 87.30 | 0.032 | 60.80 | 69.90 | 0.086 | 77.40 | 80.80 | 0.051 |
| RCRNet[ | ResNet50 | √ | √ | 84.80 | 88.60 | 0.027 | 65.30 | 74.10 | 0.087 | 83.30 | 87.30 | 0.051 |
| MGA[ | ResNet50 | √ | √ | 89.20 | 91.20 | 0.022 | 65.60 | 75.10 | 0.081 | 73.50 | 79.20 | 0.075 |
| FSNet[ | ResNet50 | √ | √ | 90.70 | 92.00 | 0.020 | 68.50 | 77.30 | 0.072 | — | — | — |
| DCFNet[ | ResNet101 | × | × | 91.00 | 91.40 | 0.016 | 66.00 | 74.10 | 0.074 | 79.10 | 84.60 | 0.060 |
| PCSA[ | MobilenetV3 | × | × | 88.00 | 90.20 | 0.022 | 65.50 | 74.10 | 0.086 | 74.70 | 82.70 | 0.065 |
| 本文网络 | MobilenetV3 | × | × | 89.70 | 90.90 | 0.018 | 65.80 | 74.00 | 0.080 | 78.20 | 83.80 | 0.062 |
Tab. 1 Performance comparison of different networks on DAVIS, DAVSOD and VOS datasets
| 网络 | 骨干网络 | 光流 | CRF | DAVIS | DAVSOD | VOS | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F-measure/% | S-measure/% | MAE | F-measure/% | S-measure/% | MAE | F-measure/% | S-measure/% | MAE | ||||
| SCOM[ | — | — | — | 78.30 | 83.20 | 0.048 | 46.40 | 59.90 | 0.220 | 69.00 | 71.20 | 0.162 |
| DLVS[ | — | — | — | 70.80 | 79.40 | 0.061 | 52.10 | 65.70 | 0.129 | 67.50 | 76.00 | 0.099 |
| PDB[ | ResNet50 | √ | √ | 85.50 | 88.20 | 0.028 | 57.20 | 69.80 | 0.116 | 74.20 | 81.80 | 0.078 |
| SSAV[ | ResNet50 | √ | √ | 86.10 | 89.30 | 0.028 | 60.30 | 72.40 | 0.090 | 74.20 | 81.90 | 0.073 |
| STFA[ | ResNet50 | √ | √ | 86.50 | 89.20 | 0.023 | 65.10 | 74.60 | 0.086 | 79.10 | 85.00 | 0.058 |
| CAS[ | ResNet50 | √ | √ | 86.00 | 87.30 | 0.032 | 60.80 | 69.90 | 0.086 | 77.40 | 80.80 | 0.051 |
| RCRNet[ | ResNet50 | √ | √ | 84.80 | 88.60 | 0.027 | 65.30 | 74.10 | 0.087 | 83.30 | 87.30 | 0.051 |
| MGA[ | ResNet50 | √ | √ | 89.20 | 91.20 | 0.022 | 65.60 | 75.10 | 0.081 | 73.50 | 79.20 | 0.075 |
| FSNet[ | ResNet50 | √ | √ | 90.70 | 92.00 | 0.020 | 68.50 | 77.30 | 0.072 | — | — | — |
| DCFNet[ | ResNet101 | × | × | 91.00 | 91.40 | 0.016 | 66.00 | 74.10 | 0.074 | 79.10 | 84.60 | 0.060 |
| PCSA[ | MobilenetV3 | × | × | 88.00 | 90.20 | 0.022 | 65.50 | 74.10 | 0.086 | 74.70 | 82.70 | 0.065 |
| 本文网络 | MobilenetV3 | × | × | 89.70 | 90.90 | 0.018 | 65.80 | 74.00 | 0.080 | 78.20 | 83.80 | 0.062 |
| 网络 | 参数量/106 | 推理时间/s |
|---|---|---|
| SSAV | 81.2 | 0.450 |
| AGNN | 82.3 | 0.550 |
| MGA | 254.0 | 0.290 |
| AnDiff[ | 79.3 | 0.360 |
| DCFNet | 274.0 | 0.036 |
| 本文网络 | 15.6 | 0.026 |
Tab. 2 Comparison of parameter quantity and inference time among different networks
| 网络 | 参数量/106 | 推理时间/s |
|---|---|---|
| SSAV | 81.2 | 0.450 |
| AGNN | 82.3 | 0.550 |
| MGA | 254.0 | 0.290 |
| AnDiff[ | 79.3 | 0.360 |
| DCFNet | 274.0 | 0.036 |
| 本文网络 | 15.6 | 0.026 |
| 超参数K | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 0 | 86.00 | 88.00 | 0.027 |
| 10 | 88.89 | 90.00 | 0.020 |
| 20 | 89.51 | 90.43 | 0.019 |
| 50 | 89.70 | 90.91 | 0.018 |
| 100 | 89.64 | 90.94 | 0.018 |
| 300 | 89.71 | 90.90 | 0.018 |
Tab. 3 Influence of hyper-parameter K on network performance
| 超参数K | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 0 | 86.00 | 88.00 | 0.027 |
| 10 | 88.89 | 90.00 | 0.020 |
| 20 | 89.51 | 90.43 | 0.019 |
| 50 | 89.70 | 90.91 | 0.018 |
| 100 | 89.64 | 90.94 | 0.018 |
| 300 | 89.71 | 90.90 | 0.018 |
| 选点方式 | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 阈值+等概率采样 | 89.06 | 90.22 | 0.019 |
| 自适应最大池化 | 88.89 | 90.00 | 0.020 |
| Softmax+置信度 | 89.70 | 90.91 | 0.018 |
Tab. 4 Influence of different selection methods on network performance
| 选点方式 | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 阈值+等概率采样 | 89.06 | 90.22 | 0.019 |
| 自适应最大池化 | 88.89 | 90.00 | 0.020 |
| Softmax+置信度 | 89.70 | 90.91 | 0.018 |
| 是否进行投票 | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 是 | 89.70 | 90.91 | 0.018 |
| 否 | 89.41 | 90.64 | 0.018 |
Tab. 5 Influence of foreground feature voting on network performance
| 是否进行投票 | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 是 | 89.70 | 90.91 | 0.018 |
| 否 | 89.41 | 90.64 | 0.018 |
| 融合方式 | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 直接加 | 89.59 | 90.84 | 0.019 |
| 进行拼接 | 89.63 | 90.84 | 0.018 |
| 动态信息融合 | 89.70 | 90.91 | 0.018 |
Tab. 6 Influence of decoder stage feature fusion on network performance
| 融合方式 | F-measure/% | S-measure/% | MAE |
|---|---|---|---|
| 直接加 | 89.59 | 90.84 | 0.019 |
| 进行拼接 | 89.63 | 90.84 | 0.018 |
| 动态信息融合 | 89.70 | 90.91 | 0.018 |
| 1 | TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787. |
| 2 | PAN Y, YAO T, LI H, et al. Video captioning with transferred semantic attributes [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6504-6512. |
| 3 | JI W, YU S, WU J, et al. Learning calibrated medical image segmentation via multi-rater agreement modeling [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 12336-12346. |
| 4 | ITTI L. Automatic foveation for video compression using a neurobiological model of visual attention [J]. IEEE Transactions on Image Processing, 2004, 13(10): 1304-1318. |
| 5 | HADIZADEH H, BAJIĆ I V. Saliency-aware video compression [J]. IEEE Transactions on Image Processing, 2014, 23(1): 19-33. |
| 6 | WU H, LI G, LUO X. Weighted attentional blocks for probabilistic object tracking [J]. The Visual Computer, 2014, 30: 229-243. |
| 7 | YAN P, LI G, XIE Y, et al. Semi-supervised video salient object detection using pseudo-labels [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7283-7292. |
| 8 | GU Y, WANG L, WANG Z, et al. Pyramid constrained self-attention network for fast video salient object detection [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 10869-10876. |
| 9 | YANG Z, WANG Q, BERTINETTO L, et al. Anchor diffusion for unsupervised video object segmentation [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 931-940. |
| 10 | YANG S, ZHANG L, QI J, et al. Learning motion-appearance co-attention for zero-shot video object segmentation [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1544-1553. |
| 11 | WANG W, LU X, SHEN J, et al. Zero-shot video object segmentation via attentive graph neural networks [C]// Proceedings of the 2019 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2019: 9236-9245. |
| 12 | YANG B, BENDER G, LE Q V, et al. CondConv: conditionally parameterized convolutions for efficient inference [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 1307-1318. |
| 13 | DIBA A, SHARMA V, VAN GOOL L, et al. DynamoNet: dynamic action and motion network [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6191-6200. |
| 14 | ZHOU S, ZHANG J, PAN J, et al. Spatio-temporal filter adaptive network for video deblurring [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2482-2491. |
| 15 | HE J, DENG Z, QIAO Y. Dynamic multi-scale filters for semantic segmentation [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 3561-3571. |
| 16 | PANG Y, ZHANG L, ZHAO X, et al. Hierarchical dynamic filtering network for RGB-D salient object detection [C]// Proceedings of the 16th European Conference on Computer Vision. Cham: Springer, 2020: 235-252. |
| 17 | YU S, XIAO J, ZHANG B, et al. Democracy does matter: comprehensive feature mining for co-salient object detection [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 969-978. |
| 18 | ZHANG M, LIU J, WANG Y, et al. Dynamic context-sensitive filtering network for video salient object detection [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1533-1543. |
| 19 | CHEN Y, ZOU W, TANG Y, et al. SCOM: spatiotemporal constrained optimization for salient object detection [J]. IEEE Transactions on Image Processing, 2018, 27(7): 3345-3357. |
| 20 | WANG W, SHEN J, SHAO L. Video salient object detection via fully convolutional networks [J]. IEEE Transactions on Image Processing, 2017, 27(1): 38-49. |
| 21 | SONG H, WANG W, ZHAO S, et al. Pyramid dilated deeper ConvLSTM for video salient object detection [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 715-731. |
| 22 | FAN D-P, WANG W, CHENG M-M, et al. Shifting more attention to video salient object detection [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8546-8556. |
| 23 | CHEN C, WANG G, PENG C, et al. Exploring rich and efficient spatial temporal interactions for real-time video salient object detection [J]. IEEE Transactions on Image Processing, 2021, 30: 3995-4007. |
| 24 | JI Y, ZHANG H, JIE Z, et al. CASNet: a cross-attention Siamese network for video salient object detection [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(6): 2676-2690. |
| 25 | LI H, CHEN G, LI G, et al. Motion guided attention for video salient object detection [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7273-7282. |
| 26 | JI G-P, FU K, WU Z, et al. Full-duplex strategy for video object segmentation [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 4902-4913. |
| [1] | Xingyao YANG, Yu CHEN, Jiong YU, Zulian ZHANG, Jiaying CHEN, Dongxiao WANG. Recommendation model combining self-features and contrastive learning [J]. Journal of Computer Applications, 2024, 44(9): 2704-2710. |
| [2] | Shunyong LI, Shiyi LI, Rui XU, Xingwang ZHAO. Incomplete multi-view clustering algorithm based on self-attention fusion [J]. Journal of Computer Applications, 2024, 44(9): 2696-2703. |
| [3] | Yunchuan HUANG, Yongquan JIANG, Juntao HUANG, Yan YANG. Molecular toxicity prediction based on meta graph isomorphism network [J]. Journal of Computer Applications, 2024, 44(9): 2964-2969. |
| [4] | Jing QIN, Zhiguang QIN, Fali LI, Yueheng PENG. Diagnosis of major depressive disorder based on probabilistic sparse self-attention neural network [J]. Journal of Computer Applications, 2024, 44(9): 2970-2974. |
| [5] | Xiyuan WANG, Zhancheng ZHANG, Shaokang XU, Baocheng ZHANG, Xiaoqing LUO, Fuyuan HU. Unsupervised cross-domain transfer network for 3D/2D registration in surgical navigation [J]. Journal of Computer Applications, 2024, 44(9): 2911-2918. |
| [6] | Liting LI, Bei HUA, Ruozhou HE, Kuang XU. Multivariate time series prediction model based on decoupled attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2732-2738. |
| [7] | Yexin PAN, Zhe YANG. Optimization model for small object detection based on multi-level feature bidirectional fusion [J]. Journal of Computer Applications, 2024, 44(9): 2871-2877. |
| [8] | Zhiqiang ZHAO, Peihong MA, Xinhong HEI. Crowd counting method based on dual attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2886-2892. |
| [9] | Kaipeng XUE, Tao XU, Chunjie LIAO. Multimodal sentiment analysis network with self-supervision and multi-layer cross attention [J]. Journal of Computer Applications, 2024, 44(8): 2387-2392. |
| [10] | Pengqi GAO, Heming HUANG, Yonghong FAN. Fusion of coordinate and multi-head attention mechanisms for interactive speech emotion recognition [J]. Journal of Computer Applications, 2024, 44(8): 2400-2406. |
| [11] | Yuhan LIU, Genlin JI, Hongping ZHANG. Video pedestrian anomaly detection method based on skeleton graph and mixed attention [J]. Journal of Computer Applications, 2024, 44(8): 2551-2557. |
| [12] | Zhonghua LI, Yunqi BAI, Xuejin WANG, Leilei HUANG, Chujun LIN, Shiyu LIAO. Low illumination face detection based on image enhancement [J]. Journal of Computer Applications, 2024, 44(8): 2588-2594. |
| [13] | Shangbin MO, Wenjun WANG, Ling DONG, Shengxiang GAO, Zhengtao YU. Single-channel speech enhancement based on multi-channel information aggregation and collaborative decoding [J]. Journal of Computer Applications, 2024, 44(8): 2611-2617. |
| [14] | Yanjie GU, Yingjun ZHANG, Xiaoqian LIU, Wei ZHOU, Wei SUN. Traffic flow forecasting via spatial-temporal multi-graph fusion [J]. Journal of Computer Applications, 2024, 44(8): 2618-2625. |
| [15] | Qianhong SHI, Yan YANG, Yongquan JIANG, Xiaocao OUYANG, Wubo FAN, Qiang CHEN, Tao JIANG, Yuan LI. Multi-granularity abrupt change fitting network for air quality prediction [J]. Journal of Computer Applications, 2024, 44(8): 2643-2650. |
| Viewed | ||||||
|
Full text |
|
|||||
|
Abstract |
|
|||||