
[Deep Learning Paper Review] Rethinking the Truly Unsupervised Image-to-Image Translation (TUNIT) (ICCV 2021)

๋ณต๋งŒ 2021. 9. 1. 17:50

This post summarizes "Rethinking the Truly Unsupervised Image-to-Image Translation", accepted to ICCV 2021. The paper was written by Naver CLOVA AI.

 

์ด์ „๊นŒ์ง€์˜ Unsupervised model (cycleGAN ๋“ฑ)์€ ์‚ฌ์‹ค Semi-supervised ๋ชจ๋ธ์ด๋ผ๊ณ  ํ•ด์•ผ ํ•œ๋‹ค๊ณ  ์–˜๊ธฐํ•˜๋ฉฐ, Data collection(labeling)์ด ํ•„์š”ํ•˜์ง€ ์•Š์€ Truly unsupervised model์ธ TUNIT์„ ์ œ์•ˆํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

 

๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/pdf/2006.06500

Official code: https://github.com/clovaai/tunit

 


 

1. Levels of Supervision in Generative Models

 

Previously, generative models were divided into two categories, supervised and unsupervised, according to the kind of data they require.

  • When images and labels exist as pairs, the setting is called supervised. Conditional GANs belong here; training is relatively easy, but such paired data is hard to obtain.
  • Depending on the task, paired image-label data can be hard or even impossible to obtain. In that case, data is simply collected per domain; the images do not have to be paired with labels. This setting is called unsupervised, and CycleGAN belongs here.

 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” Generative model์˜ Supervision์˜ ์ •๋„๋ฅผ ์„ธ ๊ฐ€์ง€๋กœ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค. ์ด์ „๊นŒ์ง€ Unsupervised๋ผ๊ณ  ํ–ˆ๋˜ ๊ฒƒ ์—ญ์‹œ Domain ๋ณ„๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ์œผ๊ณ , ๋ผ๋ฒจ๋ง์„ ํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์—์„œ Semi-supervised๋ผ๊ณ  ๋ถˆ๋Ÿฌ์•ผ ํ•˜๊ณ , ์ง„์ •ํ•œ Unsupervised ๋ชจ๋ธ์€ ์ด๋Ÿฌํ•œ ๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ์—†์ด๋„ Image-to-image translation์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค๊ณ  ์ฃผ์žฅํ•ฉ๋‹ˆ๋‹ค.

 


 

Truly Unsupervised Learning

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š”, True unsupervised learning์ด๋ž€ ๊ฐ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๋ผ๋ฒจ(Class)์ด ์—†๋Š”, ์—ฌ๋Ÿฌ Domain์˜ ์ด๋ฏธ์ง€๋“ค๋กœ ๊ตฌ์„ฑ๋œ ํ•˜๋‚˜์˜ ๋ฐ์ดํ„ฐ์…‹์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ด๋‚ผ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์ด๋ผ๊ณ  ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

 

 

The advantages of truly unsupervised learning are as follows.

  • ๋ณ„๋„์˜ Data annotation์„ ํ•˜์ง€ ์•Š์•„๋„ ๋ฉ๋‹ˆ๋‹ค.
  • ๋”ฐ๋ผ์„œ, ์ด๋Ÿฌํ•œ ๋ผ๋ฒจ๋ง ์ž‘์—…์—์„œ ์˜ค๋Š” Noise๋ฅผ ๋ฐฉ์ง€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Semi-supervised model์— ๋Œ€ํ•œ ๊ฐ•๋ ฅํ•œ Baseline์„ ์ œ๊ณตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 


 

TUNIT Architecture

TUNIT์€ ๋‘ ๊ฐ€์ง€์˜ ๊ตฌ์กฐ, ์„ธ ๊ฐ€์ง€์˜ Network๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

 

The guiding network classifies (clusters) the domain (label) of the input image and extracts its style code.

GAN์€ Input image๋ฅผ Target domain์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ์—ญํ• , ์ฆ‰ Mapping function์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

 

 

 

The guiding network is a single encoder with two branches.

One branch ($E_C$) outputs the clustering result (a pseudo label),

๋‹ค๋ฅธ ํ•˜๋‚˜์˜ Branch($E_S$)๋Š” Style code๋ฅผ ๋‹ด์€ vector๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.

 

The style code is used by the GAN's generator when translating the image,

Pseudo label์€ Discriminator๊ฐ€ ๋ณ€ํ™˜๋œ ์ด๋ฏธ์ง€์˜ Real/Fake๋ฅผ ํŒ๋‹จํ•˜๋Š” ๋ฐ์— ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

 


 

Training the Guiding Network

Guiding network์˜ ํ•™์Šต ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ๋จผ์ € ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

์•ž์„œ ๋งํ•œ๋Œ€๋กœ, Guiding network๋Š” ๋™์ผํ•œ Encoder๋ฅผ ๊ณต์œ ํ•˜๋Š” ๋‘ ๊ฐœ์˜ Branch๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

Pseudo label์„ ์ƒ์„ฑํ•˜๋Š” $E_C$๋Š” Mutual information (MI)๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” Clustering ๊ธฐ๋ฒ•์„ ์ด์šฉํ•˜๊ณ ,

and $E_S$, which produces the style code, is trained with a contrastive loss.

 

Training $E_C$: a differentiable clustering method based on mutual information (MI) maximization

 

 

 

Training procedure

- An input image $x$ and a randomly augmented version of it, $x^+$, are used.

- ์ด๋“ค์„ $E_C$์— input์œผ๋กœ ์ค€ ๊ฒฐ๊ณผ๋ฅผ ๊ฐ๊ฐ $p$, $p+$๋ผ๊ณ  ํ•˜๋ฉฐ, ์ด๋Š” ๊ฐ K๊ฐœ์˜ domain ๊ฐ๊ฐ์— ์†ํ•  ํ™•๋ฅ ์„ ๋‹ด์€ vector, ์ฆ‰ Pseudo label์ž…๋‹ˆ๋‹ค. ($p=E_C(x)$) 

- $p$, $p+$์˜ Mutual information์ด ์ตœ๋Œ€๊ฐ€ ๋˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ Encoder๋ฅผ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.

- Loss function์˜ ์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

(Maximize) $L_{MI} = I(p,p^+) = I(P) = \sum^K_{i=1}\sum^K_{j=1}P_{ij}\ln\frac{P_{ij}}{P_iP_j}$

 

 

In other words, the mutual information between the pseudo labels of $x$ and $x^+$ is maximized.

Mutual information์€ ๋‘ variable์˜ ์ƒํ˜ธ์˜์กด์„ฑ์„ ์ธก์ •ํ•œ ๊ฒƒ์œผ๋กœ, ์œ„ํ‚ค๋ฐฑ๊ณผ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ค๋ช…ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

 

Mutual information is therefore the reduction in uncertainty about variable X, or the expected reduction in the number of yes/no questions needed to guess X after observing Y.

 

 

For a more detailed explanation of mutual information, see the link below.

http://www.scholarpedia.org/article/Mutual_information

 

๊ฐ„๋‹จํžˆ ๋งํ•˜์ž๋ฉด, Mutual information์„ ์ตœ๋Œ€๊ฐ€ ๋˜๊ฒŒ ํ•จ์œผ๋กœ์จ, Encoder์€ ๋‘ Image $x$์™€ $x+$๊ฐ€ ๊ฐ™์€ label์— ์†ํ•˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” Clustering์˜ ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ์จ Mutual information maximization์„ ์‚ฌ์šฉํ•˜์˜€์œผ๋‚˜, ๋‹ค๋ฅธ Clustering ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•ด๋„ ๋œ๋‹ค๊ณ  ์–˜๊ธฐํ•ฉ๋‹ˆ๋‹ค.

 


 

Training $E_S$: contrastive loss

 

 

Training procedure

- MoCo๋ผ๋Š” ๋ชจ๋ธ์—์„œ ๊ฐ€์ ธ์˜จ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

- As when training $E_C$, randomly augmented samples are used.

- A randomly augmented version $x^+$ of the input image $x$ serves as the positive sample, and other images serve as negative samples ($x_n^-$).

- 1๊ฐœ์˜ Positive sample $x+$๊ณผ N๊ฐœ์˜ Negative sample $x_n^-$, ์ด N+1๊ฐœ์˜ sample์„ $E_S$ ์— ๋„ฃ์€ Output์„ ์ด์šฉํ•ด N+1 way classification์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

- That is, with $s=E_S(x)$, the similarity between the positive pair of style vectors ($s$, $s^+$) is maximized, and the similarity between negative pairs ($s$, $s^-$) is minimized.

- Loss function์˜ ์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

(Minimize) $L^E_{style} = -\log\frac{\exp(s\cdot s^+ / \tau)}{\sum^N_{i=0}\exp(s \cdot s^-_i / \tau)}$
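A sketch of this loss in PyTorch, written as the usual MoCo-style (N+1)-way classification (my illustration; assume `s` and `s_plus` are (batch, style_dim) style codes of $x$ and $x^+$, and `queue` is a (style_dim, N) bank of negative style codes, e.g. a MoCo-style memory queue):

```python
import torch
import torch.nn.functional as F

def style_contrastive_loss(s, s_plus, queue, tau=0.07):
    """(N+1)-way classification: the positive pair should win against N negatives."""
    s = F.normalize(s, dim=1)
    s_plus = F.normalize(s_plus, dim=1)
    l_pos = (s * s_plus).sum(dim=1, keepdim=True)   # similarity to the positive, (B, 1)
    l_neg = s @ F.normalize(queue, dim=0)           # similarities to the negatives, (B, N)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(s.size(0), dtype=torch.long, device=s.device)  # index 0 = positive
    return F.cross_entropy(logits, labels)
```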

 


 

Training the Guiding Network (Joint Training)

 

์ •๋ฆฌํ•˜์ž๋ฉด,

- The guiding network consists of two branches that share the same encoder: $E_C$, which produces the pseudo label, and $E_S$, which produces the style code,

- ๊ฐ๊ฐ์˜ Loss function $L_{MI}$์™€ $L_{style}$์„ ์ด์šฉํ•ด ๊ฐ Branch๋ฅผ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.

 

However, the two branches are not trained separately; the two loss functions are combined and trained jointly.

์ด๋ ‡๊ฒŒ ๋‘ Task๋ฅผ ํ•œ ๋ฒˆ์— ํ•™์Šต์‹œ์ผฐ์„ ๋•Œ์˜ ์žฅ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

- Clustering์€ Style code ํ•™์Šต ๊ณผ์ •์—์„œ ํ•™์Šตํ•˜๋Š” Rich representation์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

- The style code can exploit the domain-specific nature learned during clustering, as well as the similarity among samples belonging to the same domain.

 

์ด๋Ÿฌํ•œ ์ด์œ ๋กœ Joint training์„ ์ด์šฉํ•˜์˜€๊ณ , Guiding network์˜ Loss function์€ ๋‹ค์Œ๊ณผ ๊ฐ™๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

(Minimize) $-L_{MI} + L_{style}$

 

The authors report that the pseudo labels were more accurate with joint training (using $L_{style}$ as well) than when $E_C$ was trained with $L_{MI}$ alone.

(IIC: $L_{MI}$ only / Eq. (2): $L_{style}$)
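Tying the two sketches above together, the joint guiding-network objective $-L_{MI} + L_{style}$ is simply their sum (a hypothetical usage line; note that `mi_loss` above already returns $-L_{MI}$):

```python
loss_guiding = mi_loss(p, p_plus) + style_contrastive_loss(s, s_plus, queue)
```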

 


 

Training the Generative Network (GAN)

The generative network (GAN) is trained with three loss functions:

 

1) an adversarial loss for generating realistic images,

2) a style contrastive loss for preserving the style code, and

3) cycleGAN์˜ identity loss์™€ ์œ ์‚ฌํ•œ Image Reconstruction Loss

These three losses are used together.

 

Adversarial Loss

 

 

์ผ๋ฐ˜์ ์ธ GAN Loss์™€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค. 

์ฐจ์ด์ ์€, Generator์— Reference image์˜ style code $\widetilde{s}$๊ฐ€ input์œผ๋กœ ๋“ค์–ด๊ฐ„๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Generator์€ Style code $\widetilde{s}$๋ฅผ ๋ฐ˜์˜ํ•˜๋ฉด์„œ Input image $x$๋ฅผ Target domain $\widetilde{y}$๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค.

 

์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

For reference, the style code is applied using AdaIN.

The generator contains 5 AdaIN layers:

    1) Style vector์„ MLP์— ํ†ต๊ณผ์‹œ์ผœ ๊ฐ AdaIN์— ์‚ฌ์šฉํ•  Parameter๋“ค์„ ๋ฝ‘์•„๋‚ธ ๋‹ค์Œ,

    2) these parameters are applied to each AdaIN layer, injecting the style into the input image.

Generator Architecture
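A minimal sketch of a single AdaIN layer driven by a style code (my own illustration of the mechanism, not the official generator; in the actual model one MLP produces the parameters for all AdaIN layers):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Instance-normalize features, then re-scale/shift them with style-predicted params."""
    def __init__(self, style_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.to_params = nn.Linear(style_dim, num_channels * 2)  # per-channel gamma, beta

    def forward(self, feat, style):
        gamma, beta = self.to_params(style).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(feat) + beta  # common (1 + gamma) parameterization
```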

 

 


 

Style Contrastive Loss

 

 

This is an additional loss for preserving the style code.

Generator์„ ํ†ตํ•ด ์ƒ์„ฑ๋œ Image์˜ Style code $s'=E_S(G(x, \widetilde{s}))$์™€, Reference image์˜ Style code $\widetilde{s}$๊ฐ„์˜ Contrastive Loss๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

์ด๋Š” ์ƒ์„ฑ๋œ Image๊ฐ€ Input์œผ๋กœ ์ฃผ์–ด์ง„ Style code๋ฅผ ์–ผ๋งˆ๋‚˜ ์ž˜ ๋ณด์กดํ•˜๊ณ  ์žˆ๋Š”์ง€๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

 

์ˆ˜์‹์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 


 

Image Reconstruction Loss

 

 

This loss measures how close the generated image is to the original when the reference image is the same as the source image.

Since the source image's own style code is used, an ideally trained network should reproduce an image identical to the original.

CycleGAN์˜ Identity Loss์™€ ์œ ์‚ฌํ•˜๋‹ค๊ณ  ๋Š๊ผˆ์Šต๋‹ˆ๋‹ค.

 

์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 


 

Training All Networks Jointly

Earlier, we saw that the guiding network's two loss functions are trained together at once.

GAN์˜ ํ•™์Šต๋„ ๋ชจ๋“  Loss function์„ ํ•œ ๋ฒˆ์— ํ•™์Šต์‹œํ‚ค๊ณ ,

and furthermore, the guiding network and the GAN are trained simultaneously.

๋‹ค์‹œ ๋งํ•ด, ๋ชจ๋“  ํ•™์Šต์ด end-to-end๋กœ ํ•œ ๋ฒˆ์— ์ง„ํ–‰๋˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

 

Just as clustering and the style code helped each other's learning within the guiding network,

the GAN and the guiding network are said to help each other's training as well.

 


 

Experiments & Results

์‹คํ—˜์€ Supervised Translation์—์„œ SOTA ๋ชจ๋ธ์ธ FUNIT์„ Unsupervised setting์œผ๋กœ ๋ฐ”๊พผ ๊ฒƒ๊ณผ ๋น„๊ตํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด์™ธ์—๋„ ๋‹ค์–‘ํ•œ ์‹คํ—˜์„ ์ง„ํ–‰ํ–ˆ์ง€๋งŒ, ์ผ๋ถ€ ๊ฒฐ๊ณผ๋งŒ ๊ฐ„๋žตํžˆ ์†Œ๊ฐœํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์ž์„ธํ•œ ์‹คํ—˜๋‚ด์šฉ๊ณผ ๊ฒฐ๊ณผ๋Š” ๋…ผ๋ฌธ์„ ์ฐธ๊ณ ํ•˜์‹œ๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

Experiments on labeled datasets

 

 

Experiments on unlabeled datasets

 

 


 

Conclusion

๋…ผ๋ฌธ์—์„œ ์ฃผ์žฅํ•˜๋Š” Contribution์€ ์—ฌ๋Ÿฌ ๊ฐ€์ง€๊ฐ€ ์žˆ์ง€๋งŒ, ๊ทธ ์ค‘์—์„œ๋„ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๊ฒƒ์€ Unsupervised image-to-image translation์„ ์žฌ์ •์˜ํ•˜๊ณ , ์ด๋Ÿฌํ•œ Task๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” End-to-end model์„ ์ œ์‹œํ–ˆ๋‹ค๋Š” ์ ์ธ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. 

 

 

This post summarizes the paper as briefly as possible; for more details and implementation specifics, please refer to the paper and the official GitHub code.

๋ฐ˜์‘ํ˜•