Multi-head mutual-attention CycleGAN for unpaired image-to-image translation

Cited by: 21
Authors
Ji, Wei [1 ]
Guo, Jing [1 ]
Li, Yun [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Telecommun & Informat Engn, Nanjing, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci & Technol, Nanjing, Peoples R China
Keywords
learning (artificial intelligence); realistic images; virtual reality; language translation; image processing; unpaired image-to-image translation; source image domain; multi-head mutual-attention CycleGAN model; image size; multi-head mutual-attention mechanism; photorealistic images; translation quality; long-range dependency modelling; MMA-CycleGAN architecture
DOI
10.1049/iet-ipr.2019.1153
Chinese Library Classification (CLC)
TP18 [theory of artificial intelligence]
Discipline classification code
140502 [artificial intelligence]
Abstract
Image-to-image translation, i.e. translation from a source image domain to a target image domain, has made significant progress in recent years. The most popular method for unpaired image-to-image translation is CycleGAN. However, CycleGAN often fails to learn the key features of the target domain accurately and quickly, so the model converges slowly and its translation quality leaves room for improvement. In this study, a multi-head mutual-attention CycleGAN (MMA-CycleGAN) model is proposed for unpaired image-to-image translation. MMA-CycleGAN retains the cycle-consistency loss and adversarial loss of CycleGAN but introduces a mutual-attention (MA) mechanism, which allows attention-driven, long-range dependency modelling between the two image domains. To handle large images efficiently, the MA mechanism is further extended to a multi-head mutual-attention (MMA) mechanism. In addition, domain labels are adopted to simplify the MMA-CycleGAN architecture, so only one generator is required to perform the bidirectional translation task. Experiments on multiple datasets demonstrate that MMA-CycleGAN learns rapidly and produces photo-realistic images in a shorter time than CycleGAN.
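To make the core idea concrete, the following is a minimal PyTorch sketch of what a multi-head mutual-attention block might look like: queries are computed from one domain's feature map while keys and values come from the other domain's, so each head attends across the spatial positions of the opposite domain. This is an illustration under stated assumptions, not the authors' implementation; all module names, shapes, and hyper-parameters here are hypothetical.

```python
import torch
import torch.nn as nn


class MultiHeadMutualAttention(nn.Module):
    """Hypothetical sketch of multi-head mutual attention between
    two image domains: queries from domain A, keys/values from
    domain B, so attention models cross-domain long-range
    dependencies over spatial positions."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        assert channels % num_heads == 0, "channels must divide evenly across heads"
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        # 1x1 convolutions act as per-position linear projections.
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.out_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a, x_b: (B, C, H, W) feature maps from the two domains.
        b, c, h, w = x_a.shape
        # Project, split channels across heads, flatten spatial dims.
        q = self.q_proj(x_a).view(b, self.num_heads, self.head_dim, h * w)
        k = self.k_proj(x_b).view(b, self.num_heads, self.head_dim, h * w)
        v = self.v_proj(x_b).view(b, self.num_heads, self.head_dim, h * w)
        # Scaled dot-product attention per head over spatial positions.
        attn = torch.softmax(
            torch.einsum("bndq,bndk->bnqk", q, k) / self.head_dim ** 0.5,
            dim=-1,
        )
        out = torch.einsum("bnqk,bndk->bndq", attn, v)
        # Merge heads back into channels and add a residual connection.
        return self.out_proj(out.reshape(b, c, h, w)) + x_a


if __name__ == "__main__":
    block = MultiHeadMutualAttention(channels=64, num_heads=8)
    x_a = torch.randn(2, 64, 32, 32)  # features from domain A
    x_b = torch.randn(2, 64, 32, 32)  # features from domain B
    y = block(x_a, x_b)               # -> (2, 64, 32, 32)
```

Splitting the channels across heads keeps each head's query-key dot products in a lower-dimensional subspace, which is one plausible reading of why the multi-head variant scales better to large feature maps than single-head mutual attention.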
Pages: 2395-2402
Number of pages: 8