Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2461-2470. DOI: 10.11772/j.issn.1001-9081.2021061017
• Data Science and Technology •
Xiaoyu WANG, Zhanqing WANG, Wei XIONG
Received: 2021-06-15
Revised: 2021-09-15
Accepted: 2021-10-12
Online: 2021-12-27
Published: 2022-08-10
Contact: Zhanqing WANG
About author: WANG Xiaoyu, born in 1997 in Ruzhou, Henan, M. S. candidate. Her research interests include machine learning and cross-modal retrieval.
Abstract:
Most deep supervised cross-modal hashing methods learn hash codes in a symmetric way, which keeps them from exploiting the supervised information in large-scale datasets effectively; moreover, the relaxation-based strategies commonly used to handle the discrete constraint on hash codes introduce large quantization error and yield suboptimal codes. To address these problems, a Deep Asymmetric Discrete Cross-modal Hashing (DADCH) method was proposed. First, an asymmetric learning framework combining deep neural networks with dictionary learning was constructed to learn the hash codes of query instances and database instances, mining the supervised information in the data more effectively and shortening training time. Then, a discrete optimization algorithm was adopted to optimize the hash code matrix column by column, reducing the quantization error of binarization. Meanwhile, to fully exploit the semantic information in the data, a label layer was added to the neural network for label prediction, and semantic information embedding was used to embed the discriminative information of different categories into the hash codes through a linear mapping, enhancing the discriminability of the codes. Experimental results show that on the IAPR-TC12, MIRFLICKR-25K and NUS-WIDE datasets, with 64-bit hash codes, the mean Average Precision (mAP) of the proposed method on image-to-text retrieval is about 11.6, 5.2 and 14.7 percentage points higher, respectively, than that of Self-Supervised Adversarial Hashing (SSAH), a state-of-the-art deep cross-modal retrieval method proposed in recent years.
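The column-by-column discrete optimization the abstract describes can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification (names, shapes, and the exact objective are ours, patterned on standard asymmetric hashing formulations, not the paper's full algorithm): holding real-valued query embeddings `U` fixed, each column of the binary database code matrix `B` has a closed-form sign update that never increases the asymmetric loss.

```python
import numpy as np

def discrete_column_update(B, U, S, r, n_sweeps=3):
    """Column-wise discrete update of database codes B in {-1,+1}^(n x r).

    Minimizes ||U @ B.T - r * S||_F^2 with the query embeddings U
    (m x r, real-valued) held fixed; S (m x n) is the +1/-1 similarity
    matrix. Each column update is an exact closed-form sign step, so
    the loss is non-increasing. Hypothetical simplification of the
    paper's discrete optimization step.
    """
    n, rbits = B.shape
    Q = r * S.T @ U                                  # n x r target term
    for _ in range(n_sweeps):
        for k in range(rbits):
            rest = [j for j in range(rbits) if j != k]
            # linear coefficient of column k in the loss
            g = Q[:, k] - B[:, rest] @ (U[:, rest].T @ U[:, k])
            B[:, k] = np.where(g >= 0, 1, -1)        # sign, ties -> +1
    return B

def asym_loss(B, U, S, r):
    return float(np.linalg.norm(U @ B.T - r * S) ** 2)
```

Note the asymmetry: the update needs only `U` and `S`, so database codes are learned directly, without pushing every database item through the network, which is what lets such methods use the supervision in large datasets efficiently.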
Xiaoyu WANG, Zhanqing WANG, Wei XIONG. Deep asymmetric discrete cross-modal hashing method[J]. Journal of Computer Applications, 2022, 42(8): 2461-2470.
Layer | Configuration
---|---
conv1 | k. 64×11×11; s. 4×4, pad 0, LRN, ×2 pool
conv2 | k. 256×5×5; s. 1×1, pad 2, LRN, ×2 pool
conv3 | k. 256×3×3; s. 1×1, pad 1
conv4 | k. 256×3×3; s. 1×1, pad 1
conv5 | k. 256×3×3; s. 1×1, pad 1, max pooling
fc6 | 4 096
fc7 | 512
hash/label layer | r + c
Tab. 1 Parameter configuration of image network
Layer | Configuration
---|---
fc1 | 4 096
fc2 | 512
hash/label layer | r + c
Tab. 2 Parameter configuration of text network
Task | Method | IAPR-TC12 16 bit | 32 bit | 64 bit | MIRFLICKR-25K 16 bit | 32 bit | 64 bit | NUS-WIDE 16 bit | 32 bit | 64 bit
---|---|---|---|---|---|---|---|---|---|---
I→T | CVH | 0.342 | 0.336 | 0.330 | 0.557 | 0.554 | 0.554 | 0.374 | 0.366 | 0.361
| STMH | 0.377 | 0.400 | 0.413 | 0.613 | 0.621 | 0.627 | 0.471 | 0.486 | 0.494
| SCM | 0.369 | 0.366 | 0.380 | 0.671 | 0.682 | 0.685 | 0.540 | 0.548 | 0.555
| SePH | 0.444 | 0.456 | 0.463 | 0.712 | 0.719 | 0.723 | 0.603 | 0.613 | 0.621
| DCMH | 0.453 | 0.473 | 0.484 | 0.741 | 0.746 | 0.749 | 0.590 | 0.603 | 0.609
| ADAH | 0.529 | 0.528 | 0.544 | 0.756 | 0.711 | 0.772 | 0.640 | 0.629 | 0.652
| SSAH | 0.500 | 0.533 | 0.553 | 0.782 | 0.790 | 0.800 | 0.642 | 0.636 | 0.639
| DADCH | 0.602 | 0.643 | 0.669 | 0.832 | 0.843 | 0.852 | 0.765 | 0.778 | 0.786
T→I | CVH | 0.349 | 0.343 | 0.337 | 0.574 | 0.571 | 0.571 | 0.361 | 0.349 | 0.339
| STMH | 0.368 | 0.389 | 0.404 | 0.607 | 0.615 | 0.621 | 0.447 | 0.467 | 0.478
| SCM | 0.345 | 0.341 | 0.347 | 0.693 | 0.701 | 0.706 | 0.534 | 0.541 | 0.548
| SePH | 0.442 | 0.456 | 0.464 | 0.721 | 0.726 | 0.731 | 0.598 | 0.602 | 0.610
| DCMH | 0.518 | 0.537 | 0.546 | 0.782 | 0.790 | 0.793 | 0.638 | 0.651 | 0.657
| ADAH | 0.535 | 0.556 | 0.564 | 0.792 | 0.806 | 0.807 | 0.678 | 0.697 | 0.703
| SSAH | 0.516 | 0.551 | 0.570 | 0.791 | 0.795 | 0.803 | 0.669 | 0.662 | 0.666
| DADCH | 0.611 | 0.649 | 0.671 | 0.835 | 0.847 | 0.857 | 0.767 | 0.786 | 0.795
Tab. 3 mAP comparison of different methods
Task | Dataset | DADCH-Ⅰ | DADCH-Ⅱ | DADCH-Ⅲ | DADCH
---|---|---|---|---|---
I→T | MIRFLICKR-25K | 0.789 | 0.827 | 0.831 | 0.843
| NUS-WIDE | 0.717 | 0.742 | 0.745 | 0.778
T→I | MIRFLICKR-25K | 0.796 | 0.830 | 0.828 | 0.847
| NUS-WIDE | 0.723 | 0.746 | 0.753 | 0.786
Tab. 4 mAP comparison of DADCH variants