Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (9): 2919-2930. DOI: 10.11772/j.issn.1001-9081.2023091303
• Multimedia computing and computer simulation •
Yan RONG 1,2, Jiawen LIU 1, Xinlei LI 1
Received: 2023-09-20
Revised: 2023-11-24
Accepted: 2023-12-01
Online: 2024-01-31
Published: 2024-09-10
Contact: Xinlei LI
About author: RONG Yan, born in 2001 in Danyang, Jiangsu, is an M.S. candidate and a CCF member. Her research interests include computer vision and affective computing.
Yan RONG, Jiawen LIU, Xinlei LI. Adaptive hybrid network for affective computing in student classroom [J]. Journal of Computer Applications, 2024, 44(9): 2919-2930.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023091303
| SAC | ELANBlock | P | R | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|---|---|
|  |  | 0.969 | 0.949 | 0.556 | 0.980 |
| √ |  | 0.968 | 0.950 | 0.570 | 0.979 |
|  | √ | 0.970 | 0.954 | 0.571 | 0.983 |
| √ | √ | 0.980 | 0.953 | 0.571 | 0.985 |

Tab. 1 Influence of SAC-ELANBLOCK module on model performance
| GAM1 | GAM2 | GAM3 | P | R | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|---|---|---|
|  |  |  | 0.969 | 0.949 | 0.556 | 0.980 |
| √ |  |  | 0.966 | 0.971 | 0.587 | 0.989 |
|  | √ |  | 0.970 | 0.968 | 0.605 | 0.987 |
|  |  | √ | 0.987 | 0.967 | 0.548 | 0.990 |
| √ | √ | √ | 0.984 | 0.978 | 0.576 | 0.991 |

Tab. 2 Influence of GAM structure on model performance
| Attention mechanism | P | R | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|---|
| SE [34] | 0.973 | 0.951 | 0.556 | 0.985 |
| CBAM [27] | 0.957 | 0.946 | 0.552 | 0.977 |
| Polarized Self-Attention [35] | 0.975 | 0.951 | 0.529 | 0.980 |
| CoordAttention [36] | 0.976 | 0.949 | 0.566 | 0.982 |
| Sequential Self-Attention [37] | 0.975 | 0.970 | 0.570 | 0.984 |
| SimAM [38] | 0.974 | 0.953 | 0.560 | 0.979 |
| TripletAttention [39] | 0.979 | 0.938 | 0.542 | 0.979 |
| GAM | 0.984 | 0.978 | 0.576 | 0.991 |

Tab. 3 Comparison results of different attention mechanisms
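Most of the mechanisms compared in Tab. 3 are channel/spatial reweighting schemes. For orientation, the simplest of them, the SE block [34] (squeeze by global average pooling, two fully connected layers, sigmoid gating), can be sketched in a few lines of plain Python; the weight matrices `w1`/`w2` are illustrative placeholders that a real block would learn, not values from this paper:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_block(feature, w1, w2):
    """Squeeze-and-Excitation over a [C][H][W] feature map (nested lists).
    w1: C -> C//r reduction weights, w2: C//r -> C expansion weights."""
    C = len(feature)
    # Squeeze: global average pool each channel to a scalar descriptor
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel
    hid = [max(0.0, sum(w1[j][c] * z[c] for c in range(C))) for j in range(len(w1))]
    gate = [sigmoid(sum(w2[c][j] * hid[j] for j in range(len(hid)))) for c in range(C)]
    # Scale: reweight every spatial position of each channel by its gate
    return [[[v * gate[c] for v in row] for row in feature[c]] for c in range(C)]
```

GAM, which performs best in this comparison, follows the same reweight-then-scale idea but adds a convolutional spatial-attention branch on top of the channel MLP.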
| Loss function | P | R | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|---|
| CIoU | 0.969 | 0.949 | 0.556 | 0.980 |
| SIoU [40] | 0.975 | 0.949 | 0.556 | 0.980 |
| AlphaIoU [41] | 0.944 | 0.938 | 0.556 | 0.975 |
| FocalEIoU [42] | 0.984 | 0.946 | 0.572 | 0.981 |
| EIoU | 0.979 | 0.956 | 0.567 | 0.983 |
| CIoU+NWD | 0.979 | 0.949 | 0.565 | 0.982 |
| WiseIoU [43] | 0.972 | 0.944 | 0.526 | 0.974 |
| NWD-EIoU | 0.985 | 0.949 | 0.581 | 0.984 |

Tab. 4 Comparison results of different loss functions in face detection module
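NWD-EIoU, the best performer in Tab. 4, couples the EIoU regression penalty [29] with the normalized Wasserstein distance [30], which models each box as a 2D Gaussian and degrades more gracefully than IoU for small faces. A minimal pure-Python sketch under stated assumptions: the normalizing constant `c` in `nwd` and the mixing weight `alpha` in `nwd_eiou_loss` are illustrative defaults, not this paper's settings:

```python
import math

def _wh(b):
    # box as (x1, y1, x2, y2) -> width, height, center x, center y
    return b[2] - b[0], b[3] - b[1], (b[0] + b[2]) / 2, (b[1] + b[3]) / 2

def iou(b1, b2):
    ix = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    iy = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = ix * iy
    w1, h1, _, _ = _wh(b1)
    w2, h2, _, _ = _wh(b2)
    return inter / (w1 * h1 + w2 * h2 - inter)

def eiou_loss(b1, b2):
    """EIoU: 1 - IoU plus center, width, and height distance terms,
    each normalized by the smallest enclosing box."""
    w1, h1, cx1, cy1 = _wh(b1)
    w2, h2, cx2, cy2 = _wh(b2)
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])  # enclosing box width
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])  # enclosing box height
    return (1 - iou(b1, b2)
            + ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2) / (cw ** 2 + ch ** 2)
            + (w1 - w2) ** 2 / cw ** 2
            + (h1 - h2) ** 2 / ch ** 2)

def nwd(b1, b2, c=12.8):
    """Normalized Wasserstein distance between boxes seen as 2D Gaussians."""
    w1, h1, cx1, cy1 = _wh(b1)
    w2, h2, cx2, cy2 = _wh(b2)
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

def nwd_eiou_loss(b1, b2, alpha=0.5):
    # weighted mix of the similarity terms; alpha is an assumed weight
    return alpha * (1 - nwd(b1, b2)) + (1 - alpha) * eiou_loss(b1, b2)
```

For two identical boxes the loss is zero (IoU = 1, NWD = 1); as the boxes drift apart the NWD term decays smoothly instead of going flat once overlap vanishes, which is what helps on tiny classroom faces.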
| SEB | GAM | CoordRep | NWD-EIoU | P | R | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|---|---|---|---|
|  |  |  |  | 0.969 | 0.949 | 0.556 | 0.980 |
| √ |  |  |  | 0.980 | 0.953 | 0.571 | 0.985 |
|  | √ |  |  | 0.984 | 0.978 | 0.576 | 0.991 |
|  |  | √ |  | 0.970 | 0.968 | 0.605 | 0.987 |
|  |  |  | √ | 0.985 | 0.949 | 0.581 | 0.984 |
| √ | √ | √ | √ | 0.994 | 0.986 | 0.598 | 0.994 |

Tab. 5 Ablation experiment results of improvement modules in face detection module
| Loss function | Acc | R | F1 |
|---|---|---|---|
| CrossEntropyLoss | 0.828 | 0.824 | 0.814 |
| LabelSmoothLoss | 0.826 | 0.813 | 0.810 |
| SeesawLoss | 0.829 | 0.845 | 0.830 |
| Focal Loss | 0.832 | 0.832 | 0.820 |
| ASL (clip=0.5) | 0.849 | 0.814 | 0.808 |
| ASL | 0.853 | 0.843 | 0.841 |

Tab. 6 Comparison results of different loss functions in affective computing module
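ASL (asymmetric loss [33]), the best entry in Tab. 6, applies a larger focusing exponent and a probability shift (`clip`) to negatives only, so easy negative labels contribute almost nothing to the gradient. A single-label sketch; the defaults γ+ = 0, γ− = 4, clip = 0.05 follow commonly cited ASL settings and are assumptions here (the table itself also reports a clip = 0.5 variant):

```python
import math

def asl_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Asymmetric loss for one label: p = predicted probability, y in {0, 1}."""
    if y == 1:
        # positive branch: standard focal-style term, usually gamma_pos = 0
        return -((1.0 - p) ** gamma_pos) * math.log(p)
    # negative branch: shift the probability down, then focus hard negatives
    pm = max(p - clip, 0.0)
    return -(pm ** gamma_neg) * math.log(1.0 - pm)
```

With these settings a negative prediction below the clip threshold incurs exactly zero loss, while confident false positives are penalized steeply, which suits the class imbalance of classroom emotion labels.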
| DCNv2 | Feature fusion | ASL | Acc | R | F1 |
|---|---|---|---|---|---|
|  |  |  | 0.832 | 0.832 | 0.820 |
|  |  | √ | 0.853 | 0.843 | 0.841 |
| √ |  |  | 0.845 | 0.842 | 0.823 |
|  | √ |  | 0.861 | 0.857 | 0.846 |
| √ | √ | √ | 0.923 | 0.912 | 0.907 |

Tab. 7 Ablation experiment results of DCNv2, feature fusion, and loss function
| Algorithm | One-stage | Anchor | Backbone | mAP@0.5:0.95 | mAP@0.5 | mAP@0.75 | FLOPs/GFLOPs | Params/10⁶ |
|---|---|---|---|---|---|---|---|---|
| DETR [7] | √ |  | R-50 | 0.437 | 0.868 | 0.386 | 5.170 | 41.280 |
| FSAF [8] | √ |  | R-50 | 0.725 | 0.988 | 0.866 | 9.950 | 36.010 |
| YOLOX [9] | √ |  | YOLOX-S | 0.301 | 0.679 | 0.209 | 1.630 | 8.940 |
| YOLOv6 [10] | √ |  | YOLOv6-s | 0.689 | 0.983 | 0.815 | 2.681 | 17.187 |
| RetinaNet [12] | √ | √ | R-50-FPN | 0.718 | 0.968 | 0.866 | 10.050 | 36.100 |
| Grid RCNN [13] |  | √ | R-50 | 0.698 | 0.979 | 0.855 | 136.830 | 64.240 |
| Cascade RCNN [14] |  | √ | R-50-FPN | 0.743 | 0.979 | 0.907 | 51.150 | 68.930 |
| Faster RCNN [15] |  | √ | R-50-FPN | 0.718 | 0.980 | 0.885 | 23.350 | 23.350 |
| TOOD [16] | √ |  | R-50 | 0.732 | 0.989 | 0.890 | 8.860 | 31.790 |
| FCOS [17] | √ |  | R-50 | 0.705 | 0.979 | 0.847 | 9.660 | 31.840 |
| Deformable DETR [44] | √ |  | R-50 | 0.529 | 0.928 | 0.537 | 11.010 | 39.820 |
| SC-ACNet | √ | √ | Proposed network | 0.748 | 0.995 | 0.902 | 6.338 | 36.503 |

Tab. 8 Comparison results of different object detection algorithms in face detection module
| Model | SC-ACD Acc | SC-ACD R | SC-ACD F1 | KDEF Acc | KDEF R | KDEF F1 | RaFD Acc | RaFD R | RaFD F1 | FLOPs/GFLOPs | Params/10⁶ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ConvNext [45] | 0.525 | 0.429 | 0.409 | 0.652 | 0.653 | 0.652 | 0.155 | 0.278 | 0.187 | 5.84 | 27.83 |
| MobileNet v2 [46] | 0.863 | 0.814 | 0.818 | 0.848 | 0.848 | 0.847 | 0.741 | 0.714 | 0.683 | 0.41 | 2.23 |
| EfficientNet [47] | 0.847 | 0.814 | 0.818 | 0.956 | 0.955 | 0.955 | 0.989 | 0.992 | 0.990 | 0.52 | 4.02 |
| ShuffleNet v2 [48] | 0.863 | 0.857 | 0.857 | 0.924 | 0.919 | 0.918 | 0.941 | 0.945 | 0.911 | 0.19 | 1.26 |
| DenseNet [49] | 0.133 | 0.283 | 0.181 | 0.632 | 0.636 | 0.622 | 0.131 | 0.136 | 0.109 | 3.74 | 6.96 |
| CSPNet [50] | 0.825 | 0.845 | 0.819 | 0.666 | 0.666 | 0.666 | 0.975 | 0.975 | 0.974 | 6.57 | 27.64 |
| VAN [51] | 0.850 | 0.825 | 0.819 | 0.954 | 0.954 | 0.953 | 0.991 | 0.989 | 0.990 | 1.13 | 3.85 |
| PoolFormer [52] | 0.888 | 0.858 | 0.860 | 0.962 | 0.961 | 0.961 | 0.974 | 0.972 | 0.972 | 2.38 | 11.41 |
| MViTv2 [53] | 0.775 | 0.736 | 0.730 | 0.968 | 0.968 | 0.977 | 0.995 | 0.996 | 0.996 | 6.41 | 23.41 |
| Swin Transformer v2 [54] | 0.875 | 0.843 | 0.843 | 0.975 | 0.978 | 0.976 | 0.964 | 0.972 | 0.967 | 5.96 | 27.58 |
| ConvMixer [55] | 0.825 | 0.775 | 0.773 | 0.728 | 0.701 | 0.689 | 0.954 | 0.961 | 0.956 | 28.83 | 20.35 |
| Twins [56] | 0.775 | 0.712 | 0.706 | 0.576 | 0.574 | 0.569 | 0.013 | 0.125 | 0.024 | 5.06 | 23.60 |
| SC-ACNet | 0.923 | 0.913 | 0.908 | 0.972 | 0.972 | 0.971 | 0.994 | 0.997 | 0.996 | 2.03 | 4.94 |

Tab. 9 Comparison results of different algorithms in affective computing module
1 | WANG Y, SONG W, TAO W, et al. A systematic review on affective computing: emotion models, databases, and recent advances [J]. Information Fusion, 2022, 83: 19-52. |
2 | MEHRABIAN A. Communication without words [M]// MORTENSEN C D. Communication Theory. 2nd ed. New York: Routledge, 2008: 193-200. |
3 | ZHOU J, YE J M, LI C. Multimodal learning affective computing: motivations, frameworks, and suggestions [J]. e-Education Research, 2021, 42(7): 26-32. (in Chinese) |
4 | WEN J, JIANG D, TU G, et al. Dynamic interactive multiview memory network for emotion recognition in conversation [J]. Information Fusion, 2023, 91: 123-133. |
5 | SUN B, WU Y, ZHAO K, et al. Student class behavior dataset: a video dataset for recognizing, detecting, and captioning students’ behaviors in classroom scenes [J]. Neural Computing and Applications, 2021, 33(14): 8335-8354. |
6 | MASUD U, SAEED T, MALAIKAH H M, et al. Smart assistive system for visually impaired people obstruction avoidance through object detection and classification [J]. IEEE Access, 2022, 10: 13428-13441. |
7 | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers [C]// Proceedings of the 16th European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 213-229. |
8 | ZHU C, HE Y, SAVVIDES M. Feature selective anchor-free module for single-shot object detection [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 840-849. |
9 | GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021 [EB/OL]. [2023-11-15]. |
10 | LI C, LI L, JIANG H, et al. YOLOv6: a single-stage object detection framework for industrial applications [EB/OL]. (2022-09-07) [2023-11-18]. |
11 | WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 7464-7475. |
12 | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. |
13 | LU X, LI B, YUE Y, et al. Grid R-CNN [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 7355-7364. |
14 | CAI Z, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6154-6162. |
15 | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99. |
16 | FENG C, ZHONG Y, GAO Y, et al. TOOD: task-aligned one-stage object detection [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3490-3499. |
17 | TIAN Z, SHEN C, CHEN H, et al. FCOS: fully convolutional one-stage object detection [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9626-9635. |
18 | ZHONG P, WANG D, MIAO C. EEG-based emotion recognition using regularized graph neural networks [J]. IEEE Transactions on Affective Computing, 2022, 13(3): 1290-1301. |
19 | YE F, PU S, ZHONG Q, et al. Dynamic GCN: context-enriched topology learning for skeleton-based action recognition [C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 55-63. |
20 | GUPTA S, KUMAR P, TEKCHANDANI R K. Facial emotion recognition based real-time learner engagement detection system in online learning context using deep learning models [J]. Multimedia Tools and Applications, 2023, 82(8): 11365-11394. |
21 | HOU C, AI J, LIN Y, et al. Evaluation of online teaching quality based on facial expression recognition [J]. Future Internet, 2022, 14(6): No.177. |
22 | DONG Z, JI X, LAI C S, et al. Memristor-based hierarchical attention network for multimodal affective computing in mental health monitoring [J]. IEEE Consumer Electronics Magazine, 2023, 12(4): 94-106. |
23 | CALVO M G, LUNDQVIST D. Facial expressions of emotion (KDEF): identification under different display-duration conditions[J]. Behavior Research Methods, 2008, 40(1): 109-115. |
24 | LANGNER O, DOTSCH R, BIJLSTRA G, et al. Presentation and validation of the Radboud faces database [J]. Cognition and Emotion, 2010, 24(8): 1377-1388. |
25 | QIAO S, CHEN L C, YUILLE A. DetectoRS: detecting objects with recursive feature pyramid and switchable atrous convolution [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10208-10219. |
26 | LIU Y, SHAO Z, HOFFMANN N. Global attention mechanism: retain information to enhance channel-spatial interactions [EB/OL]. [2023-10-13]. |
27 | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-19. |
28 | LIU R, LEHMAN J, MOLINO P, et al. An intriguing failing of convolutional neural networks and the CoordConv solution [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9628-9639. |
29 | YANG Z, WANG X, LI J. EIoU: an improved vehicle detection algorithm based on VehicleNet neural network [J]. Journal of Physics: Conference Series, 2021, 1924: No.012001. |
30 | XU C, WANG J, YANG W, et al. Detecting tiny objects in aerial images: a normalized Wasserstein distance and a new benchmark[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 79-93. |
31 | MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer [EB/OL]. (2022-03-04) [2023-08-02]. |
32 | ZHU X, HU H, LIN S, et al. Deformable ConvNets v2: more deformable, better results [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9300-9308. |
33 | RIDNIK T, BEN-BARUCH E, ZAMIR N, et al. Asymmetric loss for multi-label classification [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 82-91. |
34 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. |
35 | LIU H, LIU F, FAN X, et al. Polarized self-attention: towards high-quality pixel-wise mapping [J]. Neurocomputing, 2022, 506: 158-167. |
36 | HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717. |
37 | FAN Z, LIU Z, WANG Y, et al. Sequential recommendation via stochastic self-attention [C]// Proceedings of the ACM Web Conference 2022. New York: ACM, 2022: 2036-2047. |
38 | YANG L, ZHANG R Y, LI L, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 11863-11874. |
39 | MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module [C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3138-3147. |
40 | GEVORGYAN Z. SIoU loss: more powerful learning for bounding box regression [EB/OL]. (2022-05-25) [2023-03-29]. |
41 | HE J, ERFANI S, MA X, et al. Alpha-IoU: a family of power intersection over union losses for bounding box regression [C]// Proceedings of the 35th Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 20230-20242. |
42 | ZHANG Y F, REN W, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression [J]. Neurocomputing, 2022, 506: 146-157. |
43 | TONG Z, CHEN Y, XU Z, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism [EB/OL]. [2023-10-14]. |
44 | ZHU X, SU W, LU L, et al. Deformable DETR: deformable Transformers for end-to-end object detection [EB/OL]. (2021-03-18) [2023-11-11]. |
45 | LIU Z, MAO H, WU C Y, et al. A ConvNet for the 2020s [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11966-11976. |
46 | SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520. |
47 | TAN M, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks [C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 6105-6114. |
48 | MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design [C]// Proceedings of the 15th European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 122-138. |
49 | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. |
50 | WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2020: 1571-1580. |
51 | GUO M H, LU C Z, LIU Z N, et al. Visual attention network [J]. Computational Visual Media, 2023, 9(4): 733-752. |
52 | YU W, LUO M, ZHOU P, et al. MetaFormer is actually what you need for vision [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10809-10819. |
53 | LI Y, WU C Y, FAN H, et al. MViTv2: improved multiscale vision Transformers for classification and detection [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 4794-4804. |
54 | LIU Z, HU H, LIN Y, et al. Swin Transformer V2: scaling up capacity and resolution [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11999-12009. |
55 | NG D, CHEN Y, TIAN B, et al. ConvMixer: feature interactive convolution with curriculum learning for small footprint and noisy far-field keyword spotting [C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 3603-3607. |
56 | CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision Transformers [C]// Proceedings of the 35th Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 9355-9366. |
57 | ZHANG C, ZHANG C, ZHENG S, et al. A complete survey on generative AI (AIGC): is ChatGPT from GPT-4 to GPT-5 all you need? [EB/OL]. [2023-03-21]. |
58 | KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything [EB/OL]. [2023-06-11]. |
59 | AMIN M M, CAMBRIA E, SCHULLER B W. Will affective computing emerge from foundation models and general artificial intelligence? A first evaluation of ChatGPT [J]. IEEE Intelligent Systems, 2023, 38(2): 15-23. |