Journal of Computer Applications, 2023, Vol. 43, Issue 11: 3403-3410. DOI: 10.11772/j.issn.1001-9081.2022111707
Special Issue: Artificial Intelligence
Hong YANG, He ZHANG, Shaoning JIN
Received: 2022-11-18
Revised: 2022-12-25
Accepted: 2022-12-28
Online: 2023-11-14
Published: 2023-11-10
Contact: Hong YANG (yanghong@dlmu.edu.cn)
About author: YANG Hong, born in 1977 in Huludao, Liaoning, Ph.D., associate professor. Her research interests include data mining and behavior recognition.
Hong YANG, He ZHANG, Shaoning JIN. Human pose transfer model combining convolution and multi-head attention[J]. Journal of Computer Applications, 2023, 43(11): 3403-3410.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022111707
| Block | SSIM↑ | PSNR/dB↑ | FID↓ | LPIPS↓ |
|---|---|---|---|---|
| CoT | 0.7721 | 18.6443 | 13.3399 | 0.2133 |
| Transformer | 0.7764 | 19.0078 | 11.3327 | 0.1972 |
| Scheme (a) | 0.7787 | 19.0537 | 11.4487 | 0.1964 |
| Scheme (b) | 0.7790 | 19.0574 | 11.3609 | 0.1954 |
| Scheme (c) | | | | |
| Scheme (d) | 0.7798 | 19.0893 | 11.1359 | 0.1936 |
Tab. 1 Quantitative evaluation of different blocks
| Number of attention heads | SSIM↑ | PSNR/dB↑ | FID↓ | LPIPS↓ |
|---|---|---|---|---|
| 1 | 0.7762 | 18.9221 | 11.8695 | 0.2020 |
| 2 | 0.7781 | 18.9999 | 11.4700 | 0.1961 |
| 4 | 0.7786 | 19.0350 | 11.3865 | 0.1950 |
| 8 | 0.7798 | 19.0893 | | 0.1936 |
| 16 | | | 11.0184 | |
Tab. 2 Quantitative evaluation of number of attention heads
| Model | DeepFashion SSIM↑ | DeepFashion PSNR/dB↑ | DeepFashion FID↓ | DeepFashion LPIPS↓ | Market-1501 SSIM↑ | Market-1501 PSNR/dB↑ | Market-1501 FID↓ | Market-1501 LPIPS↓ |
|---|---|---|---|---|---|---|---|---|
| PG2 | 0.7730 | 17.5324 | 49.5674 | 0.2928 | 0.2704 | 14.1749 | 86.0288 | 0.3619 |
| PATN | 0.7717 | 18.2543 | 20.7500 | 0.2536 | 0.2818 | 14.2622 | 22.6814 | 0.3194 |
| ADGAN | 0.7719 | 18.3768 | 14.4833 | 0.2256 | — | — | — | — |
| DIST | 0.7677 | 18.5737 | 10.8429 | 0.2258 | 0.2808 | 14.3368 | | 0.2815 |
| PISE | 0.7682 | 18.5208 | 11.5144 | 0.2080 | — | — | — | — |
| SPIG | 0.7758 | 18.5867 | 12.7027 | 0.2102 | 0.3139 | 14.4894 | 23.0573 | 0.2777 |
| DPTN | | 19.1492 | 11.4664 | | 0.2854 | | 18.9946 | 0.2711 |
| Proposed model | 0.7798 | | | 0.1936 | | 14.5727 | 24.6907 | |
Tab. 3 Comparison of results of different models