Multi-head mutual-attention CycleGAN for unpaired image-to-image translation

Cited by: 21
Authors
Ji, Wei [1 ]
Guo, Jing [1 ]
Li, Yun [2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Telecommun & Informat Engn, Nanjing, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci & Technol, Nanjing, Peoples R China
Keywords
learning (artificial intelligence); realistic images; virtual reality; language translation; image processing; unpaired image-to-image translation; source image domain; multi-head mutual-attention CycleGAN model; image size; multi-head mutual-attention mechanism; photo-realistic images; translation quality; long-range dependency modelling; MMA-CycleGAN architecture
DOI
10.1049/iet-ipr.2019.1153
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline code
140502 [Artificial Intelligence]
Abstract
Image-to-image translation, i.e. mapping a source image domain to a target image domain, has made significant progress in recent years. CycleGAN is the most popular method for unpaired image-to-image translation, but it often fails to learn the key features of the target domain accurately and quickly, so the model trains slowly and translation quality suffers. In this study, a multi-head mutual-attention CycleGAN (MMA-CycleGAN) model is proposed for unpaired image-to-image translation. MMA-CycleGAN retains the cycle-consistency loss and adversarial loss of CycleGAN but introduces a mutual-attention (MA) mechanism, which enables attention-driven, long-range dependency modelling between the two image domains. To handle large images efficiently, MA is further extended to a multi-head mutual-attention (MMA) mechanism. In addition, domain labels are adopted to simplify the MMA-CycleGAN architecture, so a single generator performs both translation directions. Experiments on multiple datasets demonstrate that MMA-CycleGAN learns rapidly and produces photo-realistic images in less time than CycleGAN.
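The mutual-attention idea described in the abstract is a form of cross-attention: queries computed from one domain's features attend to keys and values from the other domain, and the multi-head variant splits the channels into several parallel heads. The NumPy sketch below illustrates this mechanism only in outline; the shapes, projection matrices, and function names are hypothetical and do not reproduce the authors' actual MMA-CycleGAN implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_mutual_attention(x_a, x_b, w_q, w_k, w_v, n_heads):
    """Cross-domain attention sketch: queries from domain A attend to
    keys/values from domain B.  x_a, x_b: (n_tokens, d_model) arrays of
    flattened image features (hypothetical shapes, not the paper's)."""
    n, d = x_a.shape
    d_head = d // n_heads
    # Linear projections, then split the channel dimension into heads.
    q = (x_a @ w_q).reshape(n, n_heads, d_head).transpose(1, 0, 2)
    k = (x_b @ w_k).reshape(n, n_heads, d_head).transpose(1, 0, 2)
    v = (x_b @ w_v).reshape(n, n_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (h, n, n) score matrices.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    out = attn @ v                       # (h, n, d_head)
    # Concatenate heads back into a (n, d_model) output.
    return out.transpose(1, 0, 2).reshape(n, d)

rng = np.random.default_rng(0)
d_model, n_tokens, n_heads = 16, 8, 4
x_a = rng.standard_normal((n_tokens, d_model))  # domain-A features
x_b = rng.standard_normal((n_tokens, d_model))  # domain-B features
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
y = multi_head_mutual_attention(x_a, x_b, w_q, w_k, w_v, n_heads)
print(y.shape)  # (8, 16)
```

Because the attention weights for each domain-A token form a distribution over all domain-B tokens, every output position can draw on arbitrarily distant features of the other domain, which is the long-range dependency modelling the abstract refers to.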
Pages: 2395-2402 (8 pages)