Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (12): 2834-2849. DOI: 10.16182/j.issn1004731x.joss.24-FZ0797


Research on Latent Space-based Anime Face Style Transfer and Editing Techniques

Deng Haixin1, Zhang Fengquan1, Wang Nan1, Zhang Wancai2, Lei Jierui3

  1. Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. NARI Technology Co., Ltd., Nanjing 211106, China
  3. North China University of Technology, Beijing 100144, China
  • Received: 2024-07-20  Revised: 2024-09-29  Online: 2024-12-20  Published: 2024-12-20
  • Corresponding author: Zhang Fengquan
  • First author: Deng Haixin (2000-), female, master's student; research interests: intelligent computing for intangible cultural heritage, generative artificial intelligence.
  • Funding:
    Beijing Humanities and Social Sciences Fund (24YTB014); Humanities and Social Sciences Fund of the Ministry of Education (19YJC760150); University-level Project of Beijing University of Posts and Telecommunications (2023YB22)



Abstract:

To address image distortion and limited style diversity in existing anime style transfer networks for image simulation, we propose TGFE-TrebleStyleGAN (text-guided facial editing with TrebleStyleGAN), a framework for anime face style transfer and editing. The framework uses vector guidance in the latent space to generate face images, and TrebleStyleGAN incorporates a detail control module and a feature control module to constrain the appearance of the generated images. The images produced by the transfer network serve both as style control signals and as constraints on the edit regions obtained by fine-grained segmentation. Text-to-image generation is introduced to capture the correlation between style-transferred images and semantic information. Experiments on an open-source dataset and a self-built anime face dataset with paired attribute labels show that, compared with the baseline DualStyleGAN, the proposed model reduces FID by 2.819 and improves SSIM and NIMA by 0.028 and 0.074, respectively. Integrating style transfer with editing preserves the detailed style of the original anime face while allowing flexible edits, reduces image distortion, and performs better in the consistency of generated image features and the style similarity of anime face images.
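As background for the latent-space vector guidance described above, the sketch below illustrates the general idea of editing an image by shifting its latent code along an attribute direction, in the spirit of W-space editing in StyleGAN-family models. It is a minimal, hypothetical illustration only: `generator`, `w`, `direction`, and `strength` are placeholder assumptions, not the paper's actual API or method.

    import torch

    # Minimal sketch (not the paper's code): edit a generated face by
    # moving its latent code along a learned attribute direction, then
    # decoding the shifted code back into an image.

    def edit_in_latent_space(generator, w: torch.Tensor,
                             direction: torch.Tensor,
                             strength: float = 1.5) -> torch.Tensor:
        """Shift latent code w along an attribute direction and decode."""
        w_edited = w + strength * direction    # linear walk in latent space
        return generator.synthesis(w_edited)   # synthesize the edited image

A text-guided variant would derive `direction` from a text encoder (for example, by mapping CLIP-style text embeddings to latent offsets), which is how text prompts can steer such edits; this, too, is an assumption for illustration rather than the paper's exact mechanism.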

Key words: anime style transfer, GAN, latent space, anime facial editing, text-guided image generation
