Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3099-3106. DOI: 10.11772/j.issn.1001-9081.2022101510
Special Issue: Artificial Intelligence
• Artificial Intelligence •
Dynamic evaluation method for benefit of modality augmentation
Yizhen BI, Huan MA, Changqing ZHANG
Received: 2022-10-11
Revised: 2023-01-24
Accepted: 2023-02-02
Online: 2023-04-12
Published: 2023-10-10
Contact: Changqing ZHANG
About author: BI Yizhen, born in 1998 in Weifang, Shandong, M. S. candidate. His research interests include multimodal learning and machine learning.
CLC Number:
Yizhen BI, Huan MA, Changqing ZHANG. Dynamic evaluation method for benefit of modality augmentation[J]. Journal of Computer Applications, 2023, 43(10): 3099-3106.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022101510
Tab. 1 Description of datasets

| Dataset | Modality 1 dimension | Modality 2 dimension | Number of classes |
| --- | --- | --- | --- |
| hand | 216 | 76 | 10 |
| CMU-MOSEI | 50×300 | 50×35 | 7 |
| Dermatology | 11 | 23 | 6 |
| TCGA | 64×64×3 | 80 | 3 |
Tab. 2 Accuracy comparison between unimodal and multimodal (unit: %)

| Dataset | Modality 1 | Modality 2 | After fusion |
| --- | --- | --- | --- |
| hand | 97.41±0.31 | 74.91±1.85 | 98.41±0.11 |
| CMU-MOSEI | 50.25±0.14 | 41.88±0.27 | 50.37±0.13 |
| Dermatology | 79.33±1.69 | 94.33±0.94 | 95.33±1.69 |
| TCGA | 47.73±2.68 | 61.87±0.62 | 62.74±1.08 |
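For context on how unimodal and fused accuracies of the kind shown in Tab. 2 can be compared, below is a minimal late-fusion sketch in PyTorch. The two-layer MLP branches, hidden size, and random inputs are illustrative assumptions rather than the architecture used in the paper; the sketch only shows the evaluation pattern of scoring each modality's prediction and their fused prediction with the same accuracy metric.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Two unimodal branches whose class probabilities are combined by late fusion."""

    def __init__(self, dim1: int, dim2: int, num_classes: int):
        super().__init__()
        # Hypothetical MLP branches; the backbones used in the paper may differ.
        self.branch1 = nn.Sequential(nn.Linear(dim1, 128), nn.ReLU(), nn.Linear(128, num_classes))
        self.branch2 = nn.Sequential(nn.Linear(dim2, 128), nn.ReLU(), nn.Linear(128, num_classes))

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        p1 = self.branch1(x1).softmax(dim=-1)  # unimodal prediction, modality 1
        p2 = self.branch2(x2).softmax(dim=-1)  # unimodal prediction, modality 2
        p_fused = 0.5 * (p1 + p2)              # equal-weight (average) fusion
        return p1, p2, p_fused

# Example with the "hand" dataset shapes from Tab. 1 (216-d and 76-d features, 10 classes).
model = LateFusionClassifier(216, 76, 10)
x1, x2 = torch.randn(8, 216), torch.randn(8, 76)
p1, p2, p_fused = model(x1, x2)

def accuracy(p: torch.Tensor, y: torch.Tensor) -> float:
    """Top-1 accuracy, the metric reported in Tab. 2 and Tab. 4."""
    return (p.argmax(dim=-1) == y).float().mean().item()
```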
Tab. 3 Description of modality missing

| Dataset | Modality 1 | Modality 2 | Test set size |
| --- | --- | --- | --- |
| hand | × | √ | 400 |
| CMU-MOSEI | × | √ | 4 643 |
| Dermatology | √ | × | 100 |
| TCGA | √ | × | 231 |
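Tab. 3 specifies, for each dataset, the modality-missing pattern in the test split and the size of that split. As a hedged illustration only (not the paper's protocol), the sketch below simulates an absent modality by zero-filling it before evaluation; the function name and the zero-filling strategy are assumptions, and other placeholders such as imputation could be used instead.

```python
import torch

def mask_modality(x1: torch.Tensor, x2: torch.Tensor, missing: int):
    """Simulate an absent modality in the test set by replacing it with zeros.

    Zero-filling is only one possible placeholder; imputation or a learned
    substitute could be used instead.
    """
    if missing == 1:
        x1 = torch.zeros_like(x1)
    elif missing == 2:
        x2 = torch.zeros_like(x2)
    return x1, x2

# e.g. a Dermatology-sized test batch (11-d and 23-d features, 100 samples as in Tab. 3)
x1, x2 = torch.randn(100, 11), torch.randn(100, 23)
x1_test, x2_test = mask_modality(x1, x2, missing=1)
```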
Tab. 4 Accuracy comparison between weighted fusion and average fusion (unit: %)

| Dataset | Average fusion | Weighted fusion |
| --- | --- | --- |
| hand | 96.91±0.82 | 98.41±0.11 |
| CMU-MOSEI | 48.69±1.03 | 50.37±0.13 |
| Dermatology | 92.66±0.47 | 95.33±1.69 |
| TCGA | 59.71±2.31 | 62.74±1.08 |
Tab. 5 Training result of α

| Dataset | Adaptive weight of modality 1 | Adaptive weight of modality 2 |
| --- | --- | --- |
| hand | 0.6245 | 0.3755 |
| CMU-MOSEI | 0.8970 | 0.1030 |
| Dermatology | 0.4318 | 0.5682 |
| TCGA | 0.3900 | 0.6100 |
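Tab. 4 and Tab. 5 describe a weighted fusion in which each modality's prediction is scaled by an adaptive weight α learned during training, with the two weights summing to 1 for each dataset. One common way to parameterize such weights is a learnable logit vector passed through a softmax; the sketch below shows that parameterization as a general illustration under this assumption, not as the paper's exact formulation.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    """Decision-level fusion p = α1*p1 + α2*p2 with learnable weights that sum to 1."""

    def __init__(self, num_modalities: int = 2):
        super().__init__()
        # Unconstrained logits; softmax keeps the weights positive and normalized,
        # consistent with the α values in Tab. 5 summing to 1 per dataset.
        self.logits = nn.Parameter(torch.zeros(num_modalities))

    @property
    def alpha(self) -> torch.Tensor:
        return torch.softmax(self.logits, dim=0)

    def forward(self, probs: list) -> torch.Tensor:
        # Weighted sum of the per-modality class probabilities.
        return sum(a * p for a, p in zip(self.alpha, probs))

# Fuse two unimodal probability outputs; α would be trained jointly with the branches.
fusion = AdaptiveWeightedFusion(num_modalities=2)
p1 = torch.rand(8, 10).softmax(dim=-1)
p2 = torch.rand(8, 10).softmax(dim=-1)
p_fused = fusion([p1, p2])
print(fusion.alpha)  # after training, values of the kind reported in Tab. 5
```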