[1] MORENCY L P, MIHALCEA R, DOSHI P. Towards multimodal sentiment analysis: harvesting opinions from the web[C]// Proceedings of the 13th International Conference on Multimodal Interfaces. New York: ACM, 2011: 169-176.
[2] CHOY K L, FAN K K H, LO V. Development of an intelligent customer-supplier relationship management system: the application of case-based reasoning[J]. Industrial Management and Data Systems, 2003, 103(4): 263-274.
[3] MA L, LU Z, SHANG L, et al. Multimodal convolutional neural networks for matching image and sentence[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 2623-2631.
[4] MAO J, XU W, YANG Y, et al. Explain images with multimodal recurrent neural networks[EB/OL]. (2014-10-04) [2023-03-12]. https://arxiv.org/abs/1410.1090.
[5] LI G, DUAN N, FANG Y, et al. Unicoder-VL: a universal encoder for vision and language by cross-modal pre-training[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 11336-11344.
[6] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2023-03-03]. https://arxiv.org/abs/2010.11929.
[7] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 10347-10357.
[8] YU J, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 5408-5414.
[9] HOU R, CHANG H, MA B, et al. Cross attention network for few-shot classification[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 4003-4014.
[10] LU J, BATRA D, PARIKH D, et al. ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 13-23.
[11] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763.
[12] KIM W, SON B, KIM I. ViLT: vision-and-language transformer without convolution or region supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 5583-5594.
[13] DAO T, FU D Y, ERMON S, et al. FlashAttention: fast and memory-efficient exact attention with IO-awareness[C/OL]// Proceedings of the 36th International Conference on Neural Information Processing Systems. (2022) [2023-11-12]. https://arxiv.org/abs/2205.14135.
[14] HE P, LIU X, GAO J, et al. DeBERTa: decoding-enhanced BERT with disentangled attention[EB/OL]. (2021-10-06) [2023-11-12]. https://arxiv.org/abs/2006.03654.
[15] OQUAB M, DARCET T, MOUTAKANNI T, et al. DINOv2: learning robust visual features without supervision[EB/OL]. (2023-04-14) [2023-11-12]. https://arxiv.org/abs/2304.07193.
[16] NIU T, ZHU S, PANG L, et al. Sentiment analysis on multi-view social data[C]// Proceedings of the 2016 International Conference on MultiMedia Modeling, LNCS 9517. Cham: Springer, 2016: 15-27.
[17] LI W X, MEI H Y, LI Y T. Survey of multimodal sentiment analysis based on deep learning[J]. Journal of Liaoning Institute of Technology (Natural Science Edition), 2022, 42(5): 293-298.
[18] GUO X, WUSHOUER M, TUERHONG G. Survey of sentiment analysis algorithms based on multimodal fusion[J]. Computer Engineering and Applications, 2024, 60(2): 1-18.
[19] LI J, LI D, XIONG C, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[C]// Proceedings of the 39th International Conference on Machine Learning. New York: JMLR.org, 2022: 12888-12900.
[20] SINGH A, HU R, GOSWAMI V, et al. FLAVA: a foundational language and vision alignment model[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 15617-15629.