Research on Latent Space-based Anime Face Style Transfer and Editing Techniques

doi:10.16182/j.issn1004731x.joss.24-FZ0797

Abstract:

To address image distortion and limited style diversity in existing anime style transfer networks for image simulation, we propose TGFE-TrebleStyleGAN (text-guided facial editing with TrebleStyleGAN), a framework for anime face style transfer and editing. The framework uses vector guidance in the latent space to generate face images, and TrebleStyleGAN incorporates a detail control module and a feature control module to constrain the appearance of the generated images. The images produced by the transfer network serve both as style control signals and as constraints on the editing regions obtained by fine-grained segmentation. Text-to-image generation is introduced to capture the correlation between the style-transferred images and semantic information. Experiments on open-source datasets and a self-built anime face dataset with paired labels show that, compared with the baseline DualStyleGAN, the proposed model reduces FID by 2.819 and improves SSIM and NIMA by 0.028 and 0.074, respectively. By integrating style transfer and editing, the method preserves the detailed style of the original anime face while allowing flexible editing, reduces image distortion, and performs better in the feature consistency of the generated images and in anime face style similarity.

Key words: anime style transfer, generative adversarial network (GAN), latent space, anime facial editing, text-guided image generation
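The abstract describes editing through vector guidance in the latent space, with the stylized image constraining a segmented editing region. The internals of TGFE-TrebleStyleGAN are not given here, so the following is only a minimal sketch of the two generic operations involved, assuming a linear edit w' = w + α·n/‖n‖ along some direction n (for example, one found by closed-form factorization [18]) and a soft mask in [0, 1] produced by a segmentation model. `edit_latent` and `masked_composite` are hypothetical helpers, not the authors' code, and plain lists stand in for tensors.

```python
# Minimal, generic sketch of latent-space editing with a region constraint.
# Hypothetical helpers, NOT the TGFE-TrebleStyleGAN implementation.
# Assumptions: a latent code is a flat list of floats, the edit direction is
# given, and images/masks are 2-D lists of floats in [0, 1].
import math


def edit_latent(w, direction, alpha):
    """Return w + alpha * direction / ||direction|| (a linear latent edit)."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [wi + alpha * di / norm for wi, di in zip(w, direction)]


def masked_composite(stylized, edited, mask):
    """Blend per pixel: edited where mask == 1, stylized where mask == 0."""
    return [
        [m * e + (1 - m) * s for s, e, m in zip(s_row, e_row, m_row)]
        for s_row, e_row, m_row in zip(stylized, edited, mask)
    ]


if __name__ == "__main__":
    w_edit = edit_latent([0.0, 0.0, 0.0], [3.0, 4.0, 0.0], alpha=5.0)
    print(w_edit)  # [3.0, 4.0, 0.0]
    out = masked_composite([[0.2, 0.2]], [[0.9, 0.9]], [[1.0, 0.0]])
    print(out)  # [[0.9, 0.2]]
```

Normalizing the direction makes α a step length in latent units, so edit strength is comparable across directions; the mask keeps changes confined to the segmented region and leaves the stylized pixels elsewhere untouched.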

CLC number: TP391.9

Deng Haixin, Zhang Fengquan, Wang Nan, et al. Research on Latent Space-based Anime Face Style Transfer and Editing Techniques[J]. Journal of System Simulation, 2024, 36(12): 2834-2849.

Figures and tables: 19 (Figs. 1-17, Tables 1-2)

References (31)

1	Karras T, Laine S, Aila T. A Style-based Generator Architecture for Generative Adversarial Networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 4396-4405.
2	Rombach Robin, Blattmann Andreas, Lorenz Dominik, et al. High-resolution Image Synthesis with Latent Diffusion Models[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 10674-10685.
3	Jang Wonjong, Ju Gwangjin, Jung Yucheol, et al. StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation[J]. ACM Transactions on Graphics, 2021, 40(4): 116.
4	Gatys Leon A, Ecker Alexander S, Bethge Matthias. Image Style Transfer Using Convolutional Neural Networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 2414-2423.
5	Creswell A, White Tom, Dumoulin Vincent, et al. Generative Adversarial Networks: An Overview[J]. IEEE Signal Processing Magazine, 2018, 35(1): 53-65.
6	Kim Junho, Kim Minjae, Kang Hyeonwoo, et al. U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-instance Normalization for Image-to-image Translation[EB/OL]. (2020-04-08) [2024-07-01].
7	Cho Hansam, Lee Jonghyun, Chang Seunggyu, et al. One-shot Structure-aware Stylized Image Synthesis[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 8302-8311.
8	Liu Songhua, Lin Tianwei, He Dongliang, et al. AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 6629-6638.
9	Chong M J, Forsyth D. GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)[EB/OL]. (2021-06-11) [2024-06-17].
10	Yang Shuai, Jiang Liming, Liu Ziwei, et al. Pastiche Master: Exemplar-based High-resolution Portrait Style Transfer[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 7683-7692.
11	Zeng Wei, Ren Xiaozhe, Su Teng, et al. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation[EB/OL]. (2021-04-26) [2024-06-17].
12	Radford A, Kim J W, Hallacy C, et al. Learning Transferable Visual Models from Natural Language Supervision[C]//Proceedings of the 38th International Conference on Machine Learning. Chia Laguna Resort: PMLR, 2021: 8748-8763.
13	Ramesh A, Dhariwal P, Nichol A, et al. Hierarchical Text-conditional Image Generation with CLIP Latents[EB/OL]. (2022-04-13) [2024-06-18].
14	Yu Tao, Feng Runseng, Feng Ruoyu, et al. Inpaint Anything: Segment Anything Meets Image Inpainting[EB/OL]. (2023-04-13) [2024-07-05].
15	Kirillov A, Mintun E, Ravi N, et al. Segment Anything[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 3992-4003.
16	Suvorov Roman, Logacheva Elizaveta, Mashikhin Anton, et al. Resolution-robust Large Mask Inpainting with Fourier Convolutions[C]//2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE, 2022: 3172-3182.
17	Collins Edo, Bala R, Price B, et al. Editing in Style: Uncovering the Local Semantics of GANs[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 5770-5779.
18	Shen Yujun, Zhou Bolei. Closed-form Factorization of Latent Semantics in GANs[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 1532-1540.
19	Kim Gwanghyun, Kwon Taesung, Ye Jong Chul. DiffusionCLIP: Text-guided Diffusion Models for Robust Image Manipulation[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 2416-2425.
20	Zhang Lvmin, Rao Anyi, Agrawala M. Adding Conditional Control to Text-to-image Diffusion Models[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 3813-3824.
21	Ronneberger Olaf, Fischer Philipp, Brox Thomas. U-net: Convolutional Networks for Biomedical Image Segmentation[C]//Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham: Springer International Publishing, 2015: 234-241.
22	Huang Ziqi, Chan Kelvin C K, Jiang Yuming, et al. Collaborative Diffusion for Multi-modal Face Generation and Editing[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 6080-6090.
23	Pinkney J N M, Adler D. Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains[EB/OL]. (2020-11-21) [2024-06-18].
24	Johnson J, Alahi A, Li Feifei. Perceptual Losses for Real-time Style Transfer and Super-resolution[C]//Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016: 694-711.
25	Deng Jiankang, Guo Jia, Xue Niannan, et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 4685-4694.
26	Song Shuang, Liang Yuanbang, Wu Jing, et al. Feature Proliferation-the "Cancer" in StyleGAN and Its Treatments[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 2360-2370.
27	Kingma Diederik P, Welling Max. Auto-encoding Variational Bayes[EB/OL]. (2022-12-10) [2024-07-05].
28	Liu Shilong, Zeng Zhaoyang, Ren Tianhe, et al. Grounding DINO: Marrying DINO with Grounded Pre-training for Open-set Object Detection[EB/OL]. (2024-07-19) [2024-07-06].
29	Hu E J, Shen Yelong, Wallis P, et al. LoRA: Low-rank Adaptation of Large Language Models[EB/OL]. (2021-10-16) [2024-07-06].
30	Branwen Gwern, Arfafax, Presser Shawn, et al. Anime Crop Datasets: Faces, Figures, & Hands[EB/OL]. (2020-08-05) [2024-07-10].
31	Talebi H, Milanfar P. NIMA: Neural Image Assessment[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3998-4011.

Method	FID (↓)	SSIM (↑)	NIMA (↑)
Ours	158.564	0.821	4.682
DualStyleGAN	161.383	0.793	4.608
U-GAT-IT	206.971	0.699	4.215
GNR	188.846	0.568	4.318
Diffusion	167.615	0.761	4.752
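The gains over DualStyleGAN quoted in the abstract can be recomputed from this table. The sketch below hard-codes the table values and assumes the usual metric directions (lower FID is better; higher SSIM and NIMA are better). It also makes explicit that the Diffusion baseline has the highest NIMA, so the proposed method leads on FID and SSIM but not on every metric. `improvement` and `best` are illustrative helper names.

```python
# Values copied from the quantitative comparison table above.
# Assumed metric directions: FID lower is better; SSIM and NIMA higher is better.
RESULTS = {
    "Ours": {"FID": 158.564, "SSIM": 0.821, "NIMA": 4.682},
    "DualStyleGAN": {"FID": 161.383, "SSIM": 0.793, "NIMA": 4.608},
    "U-GAT-IT": {"FID": 206.971, "SSIM": 0.699, "NIMA": 4.215},
    "GNR": {"FID": 188.846, "SSIM": 0.568, "NIMA": 4.318},
    "Diffusion": {"FID": 167.615, "SSIM": 0.761, "NIMA": 4.752},
}
LOWER_IS_BETTER = {"FID"}


def improvement(ours, baseline, metric):
    """Positive value means `ours` beats `baseline` on `metric` (rounded to 3 dp)."""
    diff = RESULTS[ours][metric] - RESULTS[baseline][metric]
    return round(-diff if metric in LOWER_IS_BETTER else diff, 3)


def best(metric):
    """Name of the best-scoring method on `metric`."""
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(RESULTS, key=lambda name: RESULTS[name][metric])


for m in ("FID", "SSIM", "NIMA"):
    print(m, improvement("Ours", "DualStyleGAN", m), best(m))
# FID 2.819 Ours
# SSIM 0.028 Ours
# NIMA 0.074 Diffusion
```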

Method	Style similarity	Structural consistency	Editing plausibility
Ours	8.4	9.1	8.8
DualStyleGAN	8.3	8.8	6.7
StyleGAN2	6.3	7.7	6.8
Diffusion	5.6	5.1	7.3
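This second table shows where the advantage over DualStyleGAN is concentrated. The sketch below hard-codes the scores and computes per-criterion margins. The margins are 0.1 for style similarity, 0.3 for structural consistency and 2.1 for editing plausibility, so most of the gap comes from editing. The unweighted mean used for ranking is an assumption for illustration only, since no aggregate score is reported.

```python
# Scores copied from the evaluation table above.
# Criteria order: style similarity, structural consistency, editing plausibility.
CRITERIA = ("style", "structure", "editing")
SCORES = {
    "Ours": (8.4, 9.1, 8.8),
    "DualStyleGAN": (8.3, 8.8, 6.7),
    "StyleGAN2": (6.3, 7.7, 6.8),
    "Diffusion": (5.6, 5.1, 7.3),
}


def margins(a, b):
    """Per-criterion score difference a - b, rounded to one decimal place."""
    return {c: round(x - y, 1) for c, x, y in zip(CRITERIA, SCORES[a], SCORES[b])}


def mean_score(name):
    """Unweighted mean over the three criteria (equal weights are an assumption)."""
    return round(sum(SCORES[name]) / len(CRITERIA), 2)


print(margins("Ours", "DualStyleGAN"))
# {'style': 0.1, 'structure': 0.3, 'editing': 2.1}
print(sorted(SCORES, key=mean_score, reverse=True))
# ['Ours', 'DualStyleGAN', 'StyleGAN2', 'Diffusion']
```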
