Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (4): 1264-1274.DOI: 10.11772/j.issn.1001-9081.2025050543
• Multimedia computing and computer simulation •
Yongbing ZHANG1, Lirong YAN1, Xiaofen TANG1,2
Received: 2025-05-19
Revised: 2025-07-25
Accepted: 2025-08-01
Online: 2025-08-08
Published: 2026-04-10
Contact: Xiaofen TANG
About author: ZHANG Yongbing, born in 1999 in Tianshui, Gansu, M. S. candidate, CCF member. His research interests include domain generalized object detection.
Yongbing ZHANG, Lirong YAN, Xiaofen TANG. Progressive dual-stage modality interaction for single-domain generalized object detection[J]. Journal of Computer Applications, 2026, 46(4): 1264-1274.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025050543
All values are mAP/%.

| Method | Daytime Clear | Night Sunny | Dusk Rainy | Night Rainy | Daytime Foggy | mPT |
|---|---|---|---|---|---|---|
| Faster R-CNN[2] | 48.1 | 34.4 | 26.0 | 12.4 | 32.0 | 26.2 |
| SW[42] | 50.6 | 33.4 | 26.3 | 13.7 | 30.8 | 26.1 |
| IBN-Net[43] | 49.7 | 32.1 | 26.1 | 14.3 | 29.6 | 25.5 |
| IterNorm[44] | 43.9 | 29.6 | 22.8 | 12.6 | 28.4 | 23.4 |
| ISW[45] | 51.3 | 33.2 | 25.9 | 14.1 | 31.8 | 26.3 |
| SHADE[23] | — | 33.9 | 29.5 | 16.8 | 33.4 | 28.4 |
| CDSD[16] | 56.1 | 36.6 | 28.2 | 16.6 | 33.5 | 28.7 |
| SRCD[21] | — | — | 28.8 | 17.0 | 35.9 | 29.6 |
| C-Gap*[24] | 52.0 | 36.3 | — | — | — | — |
| Proposed method | — | 38.4 | 33.3 | 18.5 | 39.1 | 32.3 |
Tab. 1 Performance comparison of single-domain generalized object detection on Diverse Weather dataset
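The mPT column in Tab. 1 is consistent with a simple mean of the per-target-domain mAP values, with the source domain Daytime Clear excluded: for example, Faster R-CNN's mPT of 26.2 equals the mean of 34.4, 26.0, 12.4 and 32.0. A minimal sketch of that check, assuming this reading of mPT (the averaging rule is inferred from the table's numbers, not quoted from the paper):

```python
def mpt(target_maps):
    """Mean of per-target-domain mAP values, rounded to one decimal
    as reported in Tab. 1. The averaging rule (mean over the four
    unseen target domains) is an inference from the table."""
    return round(sum(target_maps) / len(target_maps), 1)

# Faster R-CNN row: Night Sunny, Dusk Rainy, Night Rainy, Daytime Foggy
print(mpt([34.4, 26.0, 12.4, 32.0]))  # 26.2, matching the reported mPT
```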
Night Sunny (per-class AP0.5/% and mAP/%):

| Method | bus | bike | car | mot. | pers. | rid. | tru. | mAP |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[2] | 34.7 | 32.0 | 56.6 | 13.6 | 37.4 | 27.6 | 38.6 | 34.4 |
| SW[42] | — | 29.2 | 49.8 | 16.6 | 31.5 | 28.0 | 40.2 | 33.4 |
| IBN-Net[43] | 37.8 | 27.3 | 49.6 | 15.1 | 29.2 | 27.1 | 38.9 | 32.1 |
| IterNorm[44] | 38.5 | 23.5 | 38.9 | 15.8 | 26.6 | 25.9 | 38.1 | 29.6 |
| ISW[45] | 38.5 | 28.5 | 49.6 | 15.4 | 31.9 | 27.5 | 41.3 | 33.2 |
| CDSD[16] | 40.6 | 35.1 | 50.7 | 19.7 | 34.7 | 32.1 | — | 36.6 |
| SRCD[21] | 13.1 | 32.5 | 52.3 | — | 34.8 | — | 42.9 | — |
| C-Gap*[24] | 37.6 | — | — | 14.7 | — | 28.0 | 42.0 | 36.3 |
| Proposed method | 35.6 | — | 58.7 | 21.4 | 39.8 | 30.7 | 43.7 | 38.4 |

Dusk Rainy (per-class AP0.5/% and mAP/%):

| Method | bus | bike | car | mot. | pers. | rid. | tru. | mAP |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[2] | 28.5 | 20.3 | 58.2 | 6.5 | 23.4 | 11.3 | 33.9 | 26.0 |
| SW[42] | 35.2 | 16.7 | 50.1 | 10.4 | 20.1 | 13.0 | 38.8 | 26.3 |
| IBN-Net[43] | 37.0 | 14.8 | 50.3 | 11.4 | 17.3 | 13.3 | 38.4 | 26.1 |
| IterNorm[44] | 32.9 | 14.1 | 38.9 | 11.0 | 15.5 | 11.6 | 35.7 | 22.8 |
| ISW[45] | 34.7 | 16.0 | 50.0 | 11.1 | 17.8 | 12.6 | 38.8 | 25.9 |
| CDSD[16] | 37.1 | 19.6 | 50.9 | — | 19.7 | 16.3 | — | 28.2 |
| SRCD[21] | — | 21.4 | 50.6 | 11.9 | 20.1 | — | 40.5 | 28.8 |
| C-Gap*[24] | 34.0 | — | — | 12.7 | — | — | 39.9 | — |
| Proposed method | 39.7 | 25.3 | 60.7 | 17.5 | 29.9 | 18.9 | 41.3 | 33.3 |
Tab. 2 Comparison of detection performance of each class on target domains Night Sunny and Dusk Rainy
Night Rainy (per-class AP0.5/% and mAP/%):

| Method | bus | bike | car | mot. | pers. | rid. | tru. | mAP |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[2] | 16.8 | 6.9 | 26.3 | 0.6 | 11.6 | 9.4 | 15.4 | 12.4 |
| SW[42] | 22.3 | 7.8 | 27.6 | 0.2 | 10.3 | 10.0 | 17.7 | 13.7 |
| IBN-Net[43] | — | 10.0 | 28.4 | 0.9 | 8.3 | 9.8 | 18.1 | 14.3 |
| IterNorm[44] | 21.4 | 6.7 | 22.0 | 0.9 | 9.1 | 10.6 | 17.6 | 12.6 |
| ISW[45] | 22.5 | 11.4 | 26.9 | 0.4 | 9.9 | 9.8 | 17.5 | 14.1 |
| CDSD[16] | 24.4 | 11.6 | 29.5 | 10.5 | 11.4 | — | 19.2 | 16.6 |
| SRCD[21] | — | — | 26.5 | 0.8 | 10.2 | — | 24.0 | 17.0 |
| C-Gap*[24] | 23.5 | 10.8 | 32.0 | 9.0 | 12.9 | 20.4 | 21.8 | — |
| Proposed method | 23.9 | 15.1 | 33.7 | 9.9 | 12.9 | 12.4 | — | 18.5 |

Daytime Foggy (per-class AP0.5/% and mAP/%):

| Method | bus | bike | car | mot. | pers. | rid. | tru. | mAP |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[2] | 28.1 | 29.7 | 49.7 | 26.3 | 33.2 | 35.5 | 21.5 | 32.0 |
| SW[42] | 30.6 | 26.2 | 44.6 | 25.1 | 30.7 | 34.6 | 23.6 | 30.8 |
| IBN-Net[43] | 29.9 | 26.1 | 44.5 | 24.4 | 26.2 | 33.5 | 22.4 | 29.6 |
| IterNorm[44] | 29.7 | 21.8 | 42.4 | 24.4 | 26.0 | 33.3 | 21.6 | 28.4 |
| ISW[45] | 29.5 | 26.4 | 49.2 | 27.9 | 30.7 | 34.8 | 24.0 | 31.8 |
| CDSD[16] | 32.9 | 28.0 | 48.8 | 29.8 | 32.5 | 38.2 | 24.1 | 33.5 |
| SRCD[21] | 36.4 | 30.1 | 52.4 | 31.3 | 33.4 | 40.1 | — | 35.9 |
| C-Gap*[24] | — | — | — | — | — | — | — | — |
| Proposed method | 35.3 | 33.7 | 58.8 | 34.8 | 39.7 | 43.5 | 27.9 | 39.1 |
Tab. 3 Comparison of detection performance of each class on target domains Night Rainy and Daytime Foggy
All values are mAP/%.

| Method | Cityscapes | BDD100K | KITTI | mPT |
|---|---|---|---|---|
| Faster R-CNN[2] | 34.3 | 29.8 | 47.0 | 37.0 |
| SW[42] | 34.5 | 30.0 | 47.2 | 37.2 |
| IBN-Net[43] | 33.2 | 25.7 | 48.1 | 35.7 |
| IterNorm[44] | 34.3 | 30.3 | 46.9 | 37.2 |
| ISW[45] | 40.4 | 28.5 | 55.0 | 41.3 |
| SHADE[23] | 40.9 | 30.3 | 55.6 | 42.3 |
| CDSD[16] | 35.2 | 27.4 | 47.8 | 36.8 |
| SRCD[21] | — | — | — | — |
| Proposed method | 48.4 | 34.9 | 63.8 | 49.0 |
Tab. 4 Performance comparison of single-domain generalized object detection on Virtual-To-Reality dataset
| Method | UAVDT Nighttime AP0.5 | UAVDT Nighttime AP0.75 | UAVDT Nighttime AP | UAVDT Foggy AP0.5 | UAVDT Foggy AP0.75 | UAVDT Foggy AP | Visdrone Nighttime AP0.5 | Visdrone Nighttime AP0.75 | Visdrone Nighttime AP | mPT AP0.5 | mPT AP0.75 | mPT AP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN[2] | 58.8 | 23.8 | 28.3 | 24.6 | 9.1 | 12.4 | 34.3 | 14.2 | 16.7 | 39.2 | 15.7 | 19.1 |
| SW[42] | 55.7 | 21.1 | 26.0 | 23.9 | 8.6 | 11.4 | 32.8 | 13.6 | 15.0 | 37.5 | 14.4 | 17.5 |
| IBN-Net[43] | 63.3 | 25.5 | 31.3 | 31.3 | 8.1 | 12.7 | 40.3 | 17.9 | 19.5 | 45.0 | 17.2 | 21.2 |
| IterNorm[44] | 56.5 | 22.1 | 27.7 | 29.2 | 9.1 | 12.7 | 28.0 | 13.2 | 14.6 | 37.9 | 14.8 | 18.3 |
| JiGen[10] | 58.6 | 24.1 | 29.2 | 27.9 | 9.1 | 12.3 | 34.5 | 14.5 | 17.6 | 40.3 | 15.9 | 19.7 |
| RSC[46] | 50.6 | 12.7 | 21.0 | 21.4 | 9.1 | 9.5 | 27.2 | 11.3 | 13.6 | 33.1 | 11.0 | 14.7 |
| StableNet[47] | 61.4 | 27.2 | 31.0 | 31.4 | 10.2 | 14.4 | 32.7 | 14.7 | 16.4 | 41.8 | 17.4 | 20.6 |
| FACT[48] | 58.8 | 25.7 | 29.5 | 29.0 | 9.1 | 13.0 | 35.0 | 14.7 | 17.6 | 40.9 | 16.5 | 20.0 |
| DIDN[17] | 63.5 | 29.2 | 32.4 | 35.4 | 10.8 | — | 34.8 | 15.3 | 18.2 | 44.6 | 18.4 | 22.0 |
| CDSD[16] | 61.5 | 26.8 | 30.9 | 29.7 | 9.1 | 14.5 | 34.4 | 15.0 | 17.9 | 41.9 | 16.9 | 21.1 |
| MAD[49] | 64.4 | 27.4 | 33.6 | 30.2 | 9.1 | 14.8 | 40.3 | 19.3 | 21.0 | 45.0 | 18.6 | 23.1 |
| FDD[38] | — | — | — | 32.4 | — | 16.8 | — | — | — | — | — | — |
| Proposed method | 66.2 | 34.7 | 37.6 | — | 11.0 | 16.8 | 55.4 | 30.8 | 31.5 | 53.0 | 23.9 | 27.7 |
Tab. 5 Performance comparison of single-domain generalized object detection on UAV-OD dataset
| Group | CMBIF | MIMI | ADP | Night Sunny mAP/% | Dusk Rainy mAP/% | Night Rainy mAP/% | Daytime Foggy mAP/% | mPT/% |
|---|---|---|---|---|---|---|---|---|
| 1 | | | | 36.3 | 30.1 | 17.2 | 37.7 | 30.3 |
| 2 | √ | | | 37.2 | 30.7 | 17.5 | 37.9 | 30.8 |
| 3 | | √ | | 37.1 | 31.7 | 17.2 | 39.1 | 31.3 |
| 4 | √ | √ | | — | 32.3 | 17.9 | 38.0 | 31.5 |
| 5 | √ | | √ | 37.6 | — | — | — | — |
| 6 | √ | √ | √ | 38.4 | 33.3 | 18.5 | 39.1 | 32.3 |
Tab. 6 Ablation experimental results of different components of model
| Prompt design | Prompt | M | Night Sunny mAP/% | Dusk Rainy mAP/% | Night Rainy mAP/% | Daytime Foggy mAP/% | mPT/% |
|---|---|---|---|---|---|---|---|
| Fixed domain prompt | A photo of a | 0 | 37.4 | 30.9 | 17.7 | 37.7 | 30.9 |
| Domain-agnostic prompt | A photo of a | 0 | 37.9 | 32.4 | 17.5 | 38.3 | 31.5 |
| Learnable domain-agnostic prompt | | 6 | 37.2 | 31.4 | 16.9 | 38.0 | 30.9 |
| Adaptive domain prompt | | 6 | 38.3 | 31.8 | — | 38.6 | 31.7 |
| Concat 1 (domain-agnostic + adaptive domain prompts) | | 4 | 38.3 | 32.3 | 17.7 | 38.6 | 31.7 |
| Concat 2 | | 8 | 38.6 | 32.5 | — | — | — |
| Concat 3 | | 10 | 38.3 | — | 17.6 | 38.6 | 31.8 |
| Concat 4 | | 6 | — | 33.3 | 18.5 | 39.1 | 32.3 |
Tab. 7 Impact of different prompt designs
All values are mAP/% except the mPT column.

| Method | Night Sunny | Dusk Rainy | Night Rainy | Daytime Foggy | mPT |
|---|---|---|---|---|---|
| Baseline (C-Gap) | 36.3 | 30.1 | 17.2 | 37.7 | 30.3 |
| +IMMI (stage 1) | — | 31.6 | 16.8 | 37.4 | 30.8 |
| +IMMI (stage 2) | — | 32.3 | 17.2 | 38.5 | 31.4 |
| +IMMI (stage 3) | 37.3 | — | 18.9 | — | — |
| +IMMI (stages 1, 2, 3) | 38.4 | 33.3 | — | 39.1 | 32.3 |
Tab. 8 Impact of applying IMMI to different stages of ResNet-101
All values are mAP/% except the mPT column.

| Method | Night Sunny | Dusk Rainy | Night Rainy | Daytime Foggy | mPT |
|---|---|---|---|---|---|
| Baseline (C-Gap) | 36.3 | 30.1 | 17.2 | 37.7 | 30.3 |
| Cross Attention | 38.7 | 31.9 | 17.4 | 38.2 | 30.8 |
| IGLI-only | — | 32.3 | 17.8 | — | — |
| TGLI-only | 37.8 | 31.9 | 17.1 | 38.0 | 31.2 |
| Additive fusion | 37.8 | 32.4 | 18.0 | 38.7 | 31.7 |
| Concatenation fusion | 38.3 | — | 18.2 | 38.3 | — |
| Proposed method | 38.4 | 33.3 | 18.5 | 39.1 | 32.3 |
Tab. 9 Impact of cross-modal interaction
| Method | Params/10⁶ | GFLOPs | mPT/% |
|---|---|---|---|
| C-Gap | 146.2 | 387.5 | 30.3 |
| Proposed method | 151.7 | 397.7 | 32.3 |
| Pruned | 109.3 | 278.4 | 31.9 |
Tab. 10 Comparison of model parameters and computational complexity
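From Tab. 10, the added modules cost about 5.5×10⁶ parameters and 10.2 GFLOPs over the C-Gap baseline, and pruning recovers most of that at a 0.4-percentage-point mPT cost. A small sketch of the relative-change arithmetic behind those comparisons (plain percentage formulas applied to the table's numbers, not figures quoted from the paper):

```python
def rel_change(new, old):
    """Relative change in percent, rounded to one decimal."""
    return round(100 * (new - old) / old, 1)

# Values taken directly from Tab. 10 (Params in units of 10^6; GFLOPs).
params = {"C-Gap": 146.2, "proposed": 151.7, "pruned": 109.3}
gflops = {"C-Gap": 387.5, "proposed": 397.7, "pruned": 278.4}

print(rel_change(params["proposed"], params["C-Gap"]))  # 3.8: parameter overhead vs C-Gap
print(rel_change(gflops["pruned"], gflops["proposed"]))  # -30.0: GFLOPs saved by pruning
```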
| [1] | LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. |
| [2] | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems — Volume 1. Cambridge: MIT Press, 2015: 91-99. |
| [3] | WANG W, LI H, WANG C, et al. Deep label propagation with nuclear norm maximization for visual domain adaptation[J]. IEEE Transactions on Image Processing, 2025, 34: 1246-1258. |
| [4] | CHEN Y, LI W, SAKARIDIS C, et al. Domain adaptive Faster R-CNN for object detection in the wild[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 3339-3348. |
| [5] | CHEN C, ZHENG Z, DING X, et al. Harmonizing transferability and discriminability for adapting object detectors[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8866-8875. |
| [6] | HSU C C, TSAI Y H, LIN Y Y, et al. Every pixel matters: center-aware feature alignment for domain adaptive object detector[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12354. Cham: Springer, 2020: 733-748. |
| [7] | ZHENG Y, HUANG D, LIU S, et al. Cross-domain object detection through coarse-to-fine feature adaptation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13763-13772. |
| [8] | SANG Y, GONG T, ZHAO C, et al. Domain-adaptive nighttime object detection method with photometric alignment[J]. Journal of Computer Applications, 2026, 46(1): 242-251. |
| [9] | LI H, PAN S J, WANG S, et al. Domain generalization with adversarial feature learning[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5400-5409. |
| [10] | CARLUCCI F M, D’INNOCENTE A, BUCCI S, et al. Domain generalization by solving jigsaw puzzles[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 2224-2233. |
| [11] | ZHOU K, YANG Y, HOSPEDALES T, et al. Deep domain-adversarial image generation for domain generalization[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 13025-13032. |
| [12] | YUAN J, MA X, CHEN D, et al. Domain-specific bias filtering for single labeled domain generalization[J]. International Journal of Computer Vision, 2023, 131(2): 552-571. |
| [13] | ZHENG G, HUAI M, ZHANG A, et al. AdvST: revisiting data augmentations for single domain generalization[C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 21832-21840. |
| [14] | SU Z, YAO K, YANG X, et al. Rethinking data augmentation for single-source domain generalization in medical image segmentation[C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 2366-2374. |
| [15] | SHI C J, ZHENG Y F, REN B J, et al. Single-domain generalized breast tumor detection in X-ray images[J]. Journal of Image and Graphics, 2024, 29(3): 725-740. |
| [16] | WU A, DENG C. Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 837-846. |
| [17] | LIN C, YUAN Z, ZHAO S, et al. Domain-invariant disentangled network for generalizable object detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 8751-8760. |
| [18] | LIU Y Y, WANG C F, WANG W B, et al. Multi-domain dynamic mean teacher for object detection in complex weather[J]. Journal of Computer-Aided Design and Computer Graphics, 2024, 36(3): 388-398. |
| [19] | QI L, DONG P, XIONG T, et al. DoubleAUG: single-domain generalized object detector in urban via color perturbation and dual-style memory[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20(5): No.126. |
| [20] | FAN Q, SEGU M, TAI Y W, et al. Towards robust object detection invariant to real-world domain shifts[EB/OL]. [2024-04-22]. |
| [21] | RAO Z, GUO J, TANG L, et al. SRCD: semantic reasoning with compound domains for single-domain generalized object detection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(7): 12497-12506. |
| [22] | LEE W, HONG D, LIM H, et al. Object-aware domain generalization for object detection[C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 2947-2955. |
| [23] | ZHAO Y, ZHONG Z, ZHAO N, et al. Style-hallucinated dual consistency learning: a unified framework for visual domain generalization[J]. International Journal of Computer Vision, 2024, 132(3): 837-853. |
| [24] | VIDIT V, ENGILBERGE M, SALZMANN M. CLIP the gap: a single domain generalization approach for object detection[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 3219-3229. |
| [25] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763. |
| [26] | DENG L, WU A, WANG Y, et al. Prompt-driven dynamic object-centric learning for single domain generalization[C]// Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 17606-17615. |
| [27] | LI H, WANG W, WANG C, et al. Phrase grounding-based style transfer for single-domain generalized object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2026, 36(1): 106-118. |
| [28] | ZHOU K, YANG J, LOY C C, et al. Learning to prompt for vision-language models[J]. International Journal of Computer Vision, 2022, 130(9): 2337-2348. |
| [29] | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
| [30] | KONG X, DONG C, ZHANG L. Towards effective multiple-in-one image restoration: a sequential and prompt learning strategy[EB/OL]. [2024-04-22]. |
| [31] | HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. |
| [32] | YU F, CHEN H, WANG X, et al. BDD100K: a diverse driving dataset for heterogeneous multitask learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2633-2642. |
| [33] | SAKARIDIS C, DAI D, VAN GOOL L. Semantic foggy scene understanding with synthetic data[J]. International Journal of Computer Vision, 2018, 126(9): 973-992. |
| [34] | HASSABALLAH M, KENK M A, MUHAMMAD K, et al. Vehicle detection and tracking in adverse weather using a deep learning framework[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(7): 4230-4242. |
| [35] | JOHNSON-ROBERSON M, BARTO C, MEHTA R, et al. Driving in the matrix: can virtual worlds replace human-generated annotations for real world tasks?[EB/OL]. [2024-04-22]. |
| [36] | CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3213-3223. |
| [37] | GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? the KITTI vision benchmark suite[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 3354-3361. |
| [38] | WANG K, FU X, GE C, et al. Towards generalized UAV object detection: a novel perspective from frequency domain disentanglement[J]. International Journal of Computer Vision, 2024, 132(11): 5410-5438. |
| [39] | DU D, ZHU P, WEN L, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop. Piscataway: IEEE, 2019: 213-226. |
| [40] | DU D, QI Y, YU H, et al. The unmanned aerial vehicle benchmark: object detection and tracking[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11214. Cham: Springer, 2018: 375-391. |
| [41] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010. |
| [42] | PAN X, ZHAN X, SHI J, et al. Switchable whitening for deep representation learning[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1863-1871. |
| [43] | PAN X, LUO P, SHI J, et al. Two at once: enhancing learning and generalization capacities via IBN-Net[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11208. Cham: Springer, 2018: 484-500. |
| [44] | HUANG L, ZHOU Y, ZHU F, et al. Iterative normalization: beyond standardization towards efficient whitening[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4869-4878. |
| [45] | CHOI S, JUNG S, YUN H, et al. RobustNet: improving domain generalization in urban-scene segmentation via instance selective whitening[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 11575-11585. |
| [46] | HUANG Z, WANG H, XING E P, et al. Self-challenging improves cross-domain generalization[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12347. Cham: Springer, 2020: 124-140. |
| [47] | ZHANG X, CUI P, XU R, et al. Deep stable learning for out-of-distribution generalization[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 5368-5378. |
| [48] | XU Q, ZHANG R, ZHANG Y, et al. A Fourier-based framework for domain generalization[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 14378-14387. |
| [49] | XU M, QIN L, CHEN W, et al. Multi-view adversarial discriminator: mine the non-causal factors for object detection in unseen domains[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 8103-8112. |