1 |
LU J S, YANG J W, BATRA D, et al. Hierarchical question-image co-attention for visual question answering[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Piscataway: IEEE, 2016: 289-297.
|
2 |
BEN-YOUNES H, CADENE R, CORD M, et al. MUTAN: multimodal tucker fusion for visual question answering [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2631-2639. 10.1109/iccv.2017.285
|
3 |
TENEY D, ANDERSON P, HE X D, et al. Tips and tricks for visual question answering: learnings from the 2017 challenge [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4223-4232. 10.1109/cvpr.2018.00444
|
4 |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
|
5 |
YU Z, YU J, FAN J P, et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017:1839-1848. 10.1109/iccv.2017.202
|
6 |
FUKUI A, PARK D H, YANG D, et al. Multimodal compact bilinear pooling for visual question answering and visual grounding [EB/OL].[2016-09-24]. . 10.18653/v1/d16-1044
|
7 |
ZHAO H, KONG D Y. Chinese description of image content based on image feature attention and adaptive attention [J]. Journal of Computer Applications, 2021, 41(9): 2496-2503. 10.11772/j.issn.1001-9081.2020111829
|
8 |
CHEN L J, ZHANG Y, ZHANG Y M, et al. Deep multi-attention and multi-scale neural network for image caption [J]. Journal of Computer Applications, 2019, 39(2): 354-359. 10.11772/j.issn.1001-9081.2018071464
|
9 |
CHEN C, HAN D, WANG J, et al. Multimodal encoder-decoder attention networks for visual question answering [J]. IEEE Access, 2020, 8: 35662-35671. 10.1109/access.2020.2975093
|
10 |
YANG Z, HE X, GAO J, et al. Stacked attention networks for image question answering [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 21-29. 10.1109/cvpr.2016.10
|
11 |
NGUYEN D-K, OKATANI T. Improved fusion of visual and language representations by dense symmetric co-attention for visual question answering [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6087-6096. 10.1109/cvpr.2018.00637
|
12 |
GAO P, JIANG Z K, YOU H X, et al. Dynamic fusion with intra- and inter-modality attention flow for visual question answering[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6639-6648. 10.1109/cvpr.2019.00680
|
13 |
YU Z, YU J, CUI Y H, et al. Deep modular co-attention networks for visual question answering[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6281-6290. 10.1109/cvpr.2019.00644
|
14 |
AGRAWAL A, LU J, ANTOL S, et al. VQA: visual question answering [J]. International Journal of Computer Vision, 2017, 123(1): 4-31. 10.1007/s11263-016-0966-6
|
15 |
NOH H, SEO P H, HAN B. Image question answering using convolutional neural network with dynamic parameter prediction[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 30-38. 10.1109/cvpr.2016.11
|
16 |
ILIEVSKI I, YAN S, FENG J. A focused dynamic attention model for visual question answering [EB/OL]. [2016-04-06]. .
|
17 |
XU H, SAENKO K. Ask, attend and answer: exploring question-guided spatial attention for visual question answering [C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 451-466. 10.1007/978-3-319-46478-7_28
|
18 |
YANG Z, HE X, GAO J, et al. Stacked attention networks for image question answering [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 21-29. 10.1109/cvpr.2016.10
|
19 |
KIM J-H, ON K-W, LIM W, et al. Hadamard product for low-rank bilinear pooling [EB/OL]. [2017-03-26]. .
|
20 |
NAM H, HA J, KIM J. Dual attention networks for multimodal reasoning and matching [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2156-2164. 10.1109/cvpr.2017.232
|
21 |
YU D, FU J, MEI T, et al. Multi-level attention networks for visual question answering[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4187-4195. 10.1109/cvpr.2017.446
|
22 |
YANG C, JIANG M Q, JIANG B, et al. Co-attention network with question type for visual question answering [J]. IEEE Access, 2019, 7: 40771-40781. 10.1109/access.2019.2908035
|
23 |
LIU F, LIU J, LU H Q, et al. Language and visual relations encoding for visual question answering [C]// Proceedings of the 2019 IEEE International Conference on Image Processing. Piscataway: IEEE, 2019:3307-3311. 10.1109/icip.2019.8803670
|
24 |
QIAO Y Y, YU Z, LIU J. VC-VQA: Visual calibration mechanism for visual question answering [C]// Proceedings of the 2020 IEEE International Conference on Image Processing. Piscataway: IEEE, 2020: 1481-1485. 10.1109/icip40778.2020.9190828
|
25 |
YU J, ZHANG W F, LU Y H, et al. Reasoning on the relation: enhancing visual representation for visual question answering and cross-modal retrieval [J]. IEEE Transactions on Multimedia, 2020, 22(12): 3196-3209. 10.1109/tmm.2020.2972830
|