Unsupervised face attribute editing method based on dynamic convolutional autoencoder

doi:10.11772/j.issn.1001-9081.2025040398

Abstract

Unsupervised face attribute editing methods that operate in the latent space of Generative Adversarial Networks (GANs) are efficient and require no labeled data, but they still struggle with decoupling and controllability. For instance, modifying one face attribute may inadvertently alter others, which degrades editing quality, and the degree of attribute modification remains difficult to control precisely. To address these issues, a dynamic convolutional Autoencoder-based Unsupervised Face Attribute Editing (AUFAE) method is proposed, which achieves precise face attribute editing by learning effective semantic vectors in the latent space. Specifically, a Dynamic Convolutional AutoEncoder Network (DCAE-Net) is designed as the backbone. Its encoder uses Dynamic Convolution (DyConv) to adaptively extract local features of the latent space, thereby learning semantic vectors with local characteristics. Its decoder incorporates a Channel Attention (CA) mechanism to establish nonlinear dependencies between channels, which lets the model focus autonomously on the feature channels relevant to different semantics and promotes independent learning of the semantic vectors. To improve the decoupling and controllability of the semantic vectors, a loss function based on attribute boundary vectors is introduced to train DCAE-Net. In addition, a soft orthogonality loss is applied to keep the semantic vectors mutually independent, which further improves decoupling. Experimental results on three pre-trained GAN generators show that, compared with three mainstream face attribute editing methods, AUFAE reduces the Fréchet Inception Distance (FID) by 37.43%-50.21%, reduces the Learned Perceptual Image Patch Similarity (LPIPS) by 23.61%-42.85%, and increases the Structural Similarity Index Measure (SSIM) by 7.04%-13.42%. Visually, AUFAE also shows no attribute coupling during face attribute editing. These results indicate that AUFAE effectively alleviates attribute coupling in face editing and achieves more accurate face attribute editing.
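The soft orthogonality idea mentioned in the abstract can be illustrated with a minimal sketch. The exact loss used to train DCAE-Net is not given on this page, so the form below (squared Frobenius norm of the Gram matrix minus the identity, over unit-normalized vectors) is an assumption based on a common formulation. The linear editing step `edit_latent` and all function names are likewise hypothetical and serve only to show how a semantic vector and an editing strength might be combined.

```python
import numpy as np


def soft_orthogonality_loss(vectors: np.ndarray) -> float:
    """Penalty that is zero when the rows of `vectors` are mutually orthogonal.

    Assumed form (not confirmed by the paper): ||V V^T - I||_F^2,
    where V holds one unit-normalized semantic vector per row.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # avoid division by zero
    gram = unit @ unit.T                          # pairwise cosine similarities
    identity = np.eye(unit.shape[0])
    return float(np.sum((gram - identity) ** 2))


def edit_latent(z: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Hypothetical linear edit: move latent code z along a unit semantic vector."""
    unit = direction / np.linalg.norm(direction)
    return z + strength * unit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    semantic_vectors = rng.normal(size=(5, 512))  # 5 illustrative directions
    print("orthogonality penalty:", soft_orthogonality_loss(semantic_vectors))

    z = rng.normal(size=512)
    z_edited = edit_latent(z, semantic_vectors[0], strength=3.0)
    print("edit distance:", np.linalg.norm(z_edited - z))
```

Under this assumed form, the penalty grows with the squared cosine similarity between every pair of semantic vectors, so minimizing it pushes the vectors apart without forcing them to be exactly orthogonal.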

Key words: Generative Adversarial Network (GAN), semantic vector, face attribute editing, attribute boundary vector, Dynamic Convolution (DyConv)

CLC Number:

TP391

Xuan CUI, Bo LIU. Unsupervised face attribute editing method based on dynamic convolutional autoencoder[J]. Journal of Computer Applications, 2026, 46(4): 1300-1308.

Figures/Tables 12

Fig. 1 Framework of AUFAE

Fig. 2 Network structure of DCAE-Net

Fig. 3 Schematic diagram of attribute boundary vectors

Fig. 4 Changes of loss functions on three pre-trained GAN models (ProGAN, StyleGAN, and StyleGAN2)

Tab. 1 Experimental environment configuration

Environment	Item	Details
Hardware	GPU	NVIDIA A100 80 GB PCIe GPU
Hardware	CPU	Intel Xeon Platinum 8358
Software	Operating system	Ubuntu 18.04.1
Software	Development environment	PyTorch 1.10

Tab. 2 Comparison of facial attribute editing effects by different methods

Generative model	Attribute	InterFaceGAN			AdaTrans			SDFlow			AUFAE
Generative model	Attribute	FID	LPIPS	SSIM	FID	LPIPS	SSIM	FID	LPIPS	SSIM	FID	LPIPS	SSIM
ProGAN	Age	79.10	0.32	0.69	70.65	0.29	0.72	62.95	0.26	0.74	34.54	0.18	0.81
	Hair	50.31	0.25	0.77	76.77	0.30	0.71	60.76	0.24	0.77	38.06	0.20	0.79
	Gender	89.52	0.36	0.65	76.47	0.30	0.71	74.35	0.31	0.69	47.44	0.25	0.73
	Pose	65.06	0.39	0.61	—	—	—	—	—	—	45.90	0.28	0.70
	Smile	45.82	0.21	0.80	75.86	0.32	0.69	60.66	0.23	0.78	33.76	0.18	0.81
	Average	65.96	0.31	0.70	74.94	0.30	0.71	64.68	0.26	0.75	39.94	0.22	0.77
StyleGAN	Age	68.93	0.33	0.68	62.34	0.26	0.74	59.39	0.24	0.72	41.57	0.21	0.80
	Hair	68.63	0.32	0.71	72.55	0.35	0.66	66.50	0.25	0.71	35.78	0.19	0.81
	Gender	82.38	0.38	0.63	78.59	0.33	0.68	68.08	0.26	0.69	46.36	0.24	0.76
	Pose	75.65	0.41	0.61	—	—	—	—	—	—	43.25	0.25	0.76
	Smile	30.65	0.17	0.84	68.21	0.33	0.68	57.42	0.22	0.73	39.59	0.20	0.81
	Average	65.25	0.32	0.69	70.42	0.32	0.69	62.85	0.24	0.71	41.31	0.22	0.79
StyleGAN2	Age	69.39	0.37	0.76	74.03	0.26	0.79	52.41	0.18	0.81	28.98	0.15	0.85
	Hair	61.16	0.34	0.78	81.85	0.25	0.78	45.57	0.28	0.78	37.70	0.20	0.80
	Gender	62.77	0.33	0.79	79.92	0.26	0.79	59.00	0.22	0.77	27.23	0.14	0.86
	Pose	74.95	0.38	0.76	—	—	—	—	—	—	30.18	0.16	0.84
	Smile	69.95	0.36	0.77	44.18	0.25	0.80	50.21	0.26	0.78	30.29	0.16	0.84
	Average	67.64	0.37	0.77	70.00	0.26	0.79	51.80	0.24	0.79	30.88	0.16	0.84
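As a reading aid for Tab. 2, the short sketch below recomputes one of the average rows from the per-attribute values and expresses the gap between two methods as a relative change. It uses only the ProGAN FID columns for InterFaceGAN and AUFAE copied from the table. The helper name `relative_change_pct` is hypothetical, and the aggregation behind the percentage ranges quoted in the abstract is not stated on this page, so this calculation is not expected to reproduce those ranges.

```python
from statistics import mean

# Per-attribute FID values on ProGAN, copied from Tab. 2
# (order: Age, Hair, Gender, Pose, Smile).
FID_INTERFACEGAN = [79.10, 50.31, 89.52, 65.06, 45.82]
FID_AUFAE = [34.54, 38.06, 47.44, 45.90, 33.76]


def relative_change_pct(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


avg_baseline = mean(FID_INTERFACEGAN)  # matches the table's average row after rounding
avg_aufae = mean(FID_AUFAE)
change = relative_change_pct(avg_baseline, avg_aufae)

print(f"InterFaceGAN average FID: {avg_baseline:.2f}")
print(f"AUFAE average FID:        {avg_aufae:.2f}")
print(f"Relative change:          {change:.2f}%")
```

A negative relative change means a lower FID, which is the desired direction for FID and LPIPS, while for SSIM a positive change is the improvement.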

Fig. 5 Visualization comparison of age attribute editing on different generative models

Fig. 6 Visualization comparison of gender attribute editing on different generative models

Fig. 7 Visualization comparison of hair attribute editing on different generative models

Fig. 8 Visualization comparison of smile attribute editing on different generative models

Tab. 3 Quantitative results of ablation experiments

Method No.	Configuration	FID	LPIPS	SSIM
1	DCAE-Net without $L_{orth}$	47.16	0.24	0.78
2	MLP	45.01	0.24	0.77
3	DCAE-Net without $L_{b}$	53.24	0.28	0.71
4	DCAE-Net without $L_{recon}$	42.34	0.23	0.77
5	DCAE-Net	41.31	0.22	0.79

Fig. 9 Visualization results of ablation experiments
