
[Deep Learning Paper Review] Mixed Precision Training (ICLR 2018)

복만 2020. 12. 10. 16:10

This post is a summary based on Mixed Precision Training,

a paper from NVIDIA and Baidu published at ICLR 2018.

 

๋”ฅ๋Ÿฌ๋‹ ํ•™์Šต ๊ณผ์ •์—์„œ Mixed Precision์„ ์ด์šฉํ•˜์—ฌ GPU resource๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

 

(NVIDIA blog write-up: developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/)

 

 

 

Floating Point Format

There are two ways to represent real numbers on a computer: fixed point and floating point.

(๋ถ€๋™์†Œ์ˆ˜์  ๋ฐฉ์‹์€ ๋– ๋Œ์ด ์†Œ์ˆ˜์  ๋ฐฉ์‹์ด๋ผ๊ณ ๋„ ํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๊ท€์—ฝ๋„ค์š”)

 

Fixed point allocates a fixed number of bits to the integer part and the fractional part.

It is exact and fast to compute with, but its representable range is narrower than floating point's.

 

๋ถ€๋™์†Œ์ˆ˜์  ๋ฐฉ์‹์€ ํ‘œํ˜„ํ•˜๊ณ ์ž ํ•˜๋Š” ์ˆ˜๋ฅผ ์ •๊ทœํ™”ํ•˜์—ฌ ๊ฐ€์ˆ˜๋ถ€(exponent)์™€ ์ง€์ˆ˜๋ถ€(fraction/mantissa)๋ฅผ ๋”ฐ๋กœ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.

For example:

 

5.6875 in binary is 101.1011.
Rewriting this as 1.011011 * 2^2 is called normalization (leaving only one digit in the integer part).

Here 1.011011 is called the mantissa, and the power of 2, namely 2, is called the exponent;
floating point stores the mantissa and the exponent separately.

 

Source: https://hoya012.github.io/blog/Mixed-Precision-Training/

 

๋ถ€๋™์†Œ์ˆ˜์  ๋ฐฉ์‹์€ IEEE754 ํ‘œ์ค€์ด ๊ฐ€์žฅ ๋„๋ฆฌ ์“ฐ์ด๊ณ  ์žˆ๋Š”๋ฐ, ๊ทธ ์ข…๋ฅ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

  • FP32 (Single Precision)
  • FP64 (Double Precision)
  • FP128 (Quadruple Precision)
  • FP16 (Half Precision)

 

FP ๋’ค์˜ ์ˆซ์ž๋Š” ๋ช‡ bit๋ฅผ ์ด์šฉํ•˜๋Š”์ง€๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. FP32๋Š” 32bit๋ฅผ ์ด์šฉํ•˜์—ฌ ์‹ค์ˆ˜๋ฅผ ์ €์žฅํ•˜๋Š” ๊ฒƒ.

๋‹น์—ฐํžˆ ์ด์šฉํ•˜๋Š” ๋น„ํŠธ ์ˆ˜๊ฐ€ ๋งŽ์„์ˆ˜๋ก ๋” ๋†’์€ ์ •๋ฐ€๋„(Precision)๋กœ ์‹ค์ˆ˜๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 

(FP64๋Š” FP32์˜ ๋‘ ๋ฐฐ์˜ ์ •๋ฐ€๋„๋ฅผ ๊ฐ–๋Š”๋‹ค๋Š” ๋œป์—์„œ Double Precision์ด๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ๊ฒƒ์ด๊ฒŸ์ฃ ?)
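As a quick sanity check of this, here's a small PyTorch illustration (FP128 is omitted since PyTorch has no quad-precision type):

import torch

# Compare bit width, machine epsilon (precision), and max value per format.
for dtype in (torch.float16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    print(f"{dtype}: bits={info.bits}, eps={info.eps:.2e}, max={info.max:.3e}")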

 

Modern deep learning training uses the Single Precision (FP32) format

(for storing weights, computing gradients, and so on).

 

But if we trained in Half Precision (FP16) instead of Single Precision (FP32),

couldn't we save some of our limited GPU resources?

 

 

 

Mixed Precision

Training in Half Precision (FP16) obviously saves memory and speeds up computation.

The catch is that half precision is far less precise than single precision.

 

 

Source: https://cloud.google.com/tpu/docs/bfloat16

 

 

๋”ฐ๋ผ์„œ gradient๊ฐ€ ๋„ˆ๋ฌด ํฐ ๊ฒฝ์šฐ, ํ˜น์€ ๋„ˆ๋ฌด ์ž‘์€ ๊ฒฝ์šฐ ์˜ค์ฐจ๊ฐ€ ๋ฐœ์ƒํ•˜๊ฒŒ ๋˜๊ณ , ์ด ์˜ค์ฐจ๋Š” ๋ˆ„์ ๋˜์–ด ๊ฒฐ๊ตญ ํ•™์Šต์ด ์ž˜ ์ง„ํ–‰๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

 

Source: https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/

 

์œ„ ๊ทธ๋ฆผ์—์„œ ๊ฒ€์ •์ƒ‰ ์„ ์€ FP32๋ฅผ ์ด์šฉํ•ด ํ•™์Šต์‹œํ‚จ ๊ฒฐ๊ณผ, ํšŒ์ƒ‰ ์„ ์€ FP16์„ ์ด์šฉํ•ด ํ•™์Šต์‹œํ‚จ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

Y ์ถ•์€ training loss์ธ๋ฐ, FP16์œผ๋กœ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒฝ์šฐ loss๊ฐ€ ์ค„์–ด๋“ค๋‹ค๊ฐ€ ์ˆ˜๋ ดํ•˜์ง€ ๋ชปํ•˜๊ณ  ๋‹ค์‹œ ์ปค์ง€๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์ฃ .

 

 

 

์œ„ ๊ทธ๋ฆผ์€ ์‹ค์ œ๋กœ FP32๋ฅผ ์ด์šฉํ•ด ๋„คํŠธ์›Œํฌ๋ฅผ ํ•™์Šต์‹œํ‚ค๊ณ , ์ž„์˜์˜ gradient ๊ฐ’๋“ค์„ sampling ํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

๋นจ๊ฐ„ ์„  ์™ผ์ชฝ์˜ gradient๋“ค์€ FP16์—์„œ ํ‘œํ˜„ ๋ถˆ๊ฐ€๋Šฅํ•œ ์ •๋ฐ€๋„์˜ ๊ฐ’๋“ค์ด๊ธฐ ๋•Œ๋ฌธ์—, FP16์—์„œ๋Š” 0์œผ๋กœ ํ‘œํ˜„๋˜๊ณ , ์ด๋Ÿฌํ•œ ์˜ค์ฐจ๋“ค์ด ๋ˆ„์ ๋˜์–ด ๋ชจ๋ธ ํ•™์Šต์— ์–ด๋ ค์›€์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

 

 

 

๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ Mixed Precision Training์€ FP32์™€ FP16์„ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฅผ ๊ทน๋ณตํ•ฉ๋‹ˆ๋‹ค.

= Mixed Precision

 

 

 

Implementation

๋ฐฉ๋ฒ•์€ ๊ต‰์žฅํžˆ ๊ฐ„๋‹จํ•ฉ๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ gradient ๊ฐ’๋“ค์ด ๋งค์šฐ ์ž‘์€ ๊ฐ’์— ๋ชฐ๋ ค ์žˆ์–ด์„œ FP16์œผ๋กœ casting ์‹œ 0์ด ๋˜์–ด ๋ฒ„๋ฆฝ๋‹ˆ๋‹ค.

 

Source: https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/

 

 

์ฆ‰, FP16์˜ ํ‘œํ˜„ ๊ฐ€๋Šฅ ๋ฒ”์œ„ ๋ฐ–์— gradient๊ฐ€ ๋ถ„ํฌํ•ด์„œ ์ƒ๊ธด ๋ฌธ์ œ์ธ๋ฐ,

๊ทธ๋ ‡๋‹ค๋ฉด ๋‹จ์ˆœํžˆ Scaling์„ ํ†ตํ•ด gradient๋ฅผ FP16์˜ ํ‘œํ˜„ ๊ฐ€๋Šฅ ๋ฒ”์œ„ ์•ˆ์œผ๋กœ ์ด๋™์‹œ์ผœ ์ฃผ๋ฉด ๋˜์ง€ ์•Š์„๊นŒ์š”?
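A quick check of that intuition (a toy illustration with an arbitrary scale of 1024):

import torch

g = torch.tensor(1e-8)      # too small for FP16
print(g.half())             # tensor(0., dtype=torch.float16) -- the gradient is lost
print((g * 1024).half())    # tensor(1.0252e-05, dtype=torch.float16) -- it survives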

 

์ข€ ๋” ์ž์„ธํžˆ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

Source: https://www.slideshare.net/lablup/jmi-techtalk-how-to-use-gpu-for-developing-ai

 

Step 1. Make an FP16 copy of the FP32 weights.

(This FP16 weight copy is used for the forward and backward passes.)

 

Step 2. Run the forward pass using the FP16 weight copy.

 

Step 3. Cast the FP16 predictions produced by the forward pass to FP32.

 

Step 4. Compute the FP32 loss from the FP32 predictions, then multiply it by a scaling factor S.

 

Step 5. Cast the scaled FP32 loss to FP16.

 

Step 6. Run backward propagation with the scaled FP16 loss and compute the gradients.

 

Step 7. Cast the FP16 gradients to FP32 and divide them back by the scaling factor S.

(By the chain rule, every gradient has been scaled by the same factor.)

 

Step 8. Update the FP32 weights using the FP32 gradients.

 

 

์ •๋ฆฌํ•˜์ž๋ฉด, FP32 weight์€ ๊ณ„์† ์ €์žฅํ•ด ๋‘๊ณ ,

FP16 copy weight๋ฅผ ๋งŒ๋“ค์–ด ์ด๋ฅผ ์ด์šฉํ•ด forward/backward pass๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

FP16 copy weight์œผ๋กœ ์–ป์€ gradient๋ฅผ ์ด์šฉํ•ด FP32 weight๋ฅผ updateํ•ฉ๋‹ˆ๋‹ค.
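As a rough sketch, the steps above might look like this written out by hand in PyTorch. This is only a toy illustration, not the paper's code: the linear model, the dummy data, the constant scale S, and the learning rate lr are all assumptions, and a CUDA device is assumed.

import copy
import torch
import torch.nn as nn

model = nn.Linear(16, 4).cuda()            # FP32 master weights
model_fp16 = copy.deepcopy(model).half()   # Step 1: FP16 working copy
criterion = nn.CrossEntropyLoss()
S, lr = 1024.0, 0.01                       # assumed scale factor / learning rate

inputs = torch.randn(8, 16, device="cuda")
labels = torch.randint(0, 4, (8,), device="cuda")

# Steps 2-3: forward pass with the FP16 copy; cast predictions to FP32.
outputs = model_fp16(inputs.half()).float()

# Step 4: compute the FP32 loss and multiply by the scaling factor S.
loss = criterion(outputs, labels) * S

# Steps 5-6: backward propagation; the gradients of the FP16 copy are
# computed in FP16 (the .float() cast boundary handles Step 5 implicitly).
loss.backward()

# Steps 7-8: cast gradients to FP32, divide by S, update the FP32 master
# weights, then refresh the FP16 copy for the next iteration.
with torch.no_grad():
    for p32, p16 in zip(model.parameters(), model_fp16.parameters()):
        p32 -= lr * (p16.grad.float() / S)
        p16.copy_(p32.half())
        p16.grad = None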

 

 

Source: https://nvidia.github.io/OpenSeq2Seq/html/mixed-precision.html#mp-2018

 

* How should the scaling factor be chosen?

The paper suggests either simply picking a value empirically,

or, when gradient statistics are available, choosing S so that the maximum absolute gradient value after scaling stays below 65,504 (the largest value FP16 can represent).

A large scaling factor has no downside in itself, but you do have to watch out for overflow!
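For illustration, a hypothetical helper along those lines (choose_scale and headroom are made-up names, not from the paper):

import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def choose_scale(grads, headroom=2.0):
    # Pick S so that the scaled maximum absolute gradient stays safely
    # below FP16's largest representable value.
    max_abs = max(g.abs().max().item() for g in grads)
    return FP16_MAX / (max_abs * headroom)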

 

 

 

Experiment & Result

The authors ran a wide variety of experiments, from straightforward tasks like classification and detection all the way to GANs. (Perhaps because the method is so simple?)

Here are a few of the results.

 

* Baseline: FP32 / MP: Mixed Precision (FP32+FP16)

 

ILSVRC12 Classification

 

Detection

 

DCGAN

 

Mixed Precision์—์„œ ์„ฑ๋Šฅ์ด ์˜คํžˆ๋ ค ์˜ค๋ฅธ ๊ฒƒ๋„ ์žˆ๊ณ , ์ „๋ฐ˜์ ์œผ๋กœ FP32์— ๋’ค์ง€์ง€ ์•Š๋Š” ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค€ ๊ฒƒ ๊ฐ™์ฃ ?

๋…ผ๋ฌธ์—์„œ ์ž์„ธํ•œ ์‹คํ—˜ setting๊ณผ ๋” ๋งŽ์€ ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

 

 

PyTorch Implementation

 

PyTorch์—์„œ ๊ณต์‹์ ์œผ๋กœ Mixed Precision Training์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

 

It goes by the name Automatic Mixed Precision (AMP), and adding just a few lines of code is enough to use it.

Official docs: pytorch.org/docs/stable/amp.html

 


 

 

There is nicely organized code on GitHub, which I've brought over here.

Source: github.com/hoya012/automatic-mixed-precision-tutorials-pytorch

 

์ผ๋ฐ˜์ ์ธ ํ•™์Šต ์ฝ”๋“œ
for batch_idx, (inputs, labels) in enumerate(data_loader):
  optimizer.zero_grad()

  outputs = model(inputs)
  loss = criterion(outputs, labels)

  loss.backward()
  optimizer.step()

 

The same loop with AMP applied
""" define loss scaler for automatic mixed precision """
# Creates a GradScaler once at the beginning of training.
scaler = torch.cuda.amp.GradScaler()

for batch_idx, (inputs, labels) in enumerate(data_loader):
  optimizer.zero_grad()

  with torch.cuda.amp.autocast():
    # Casts operations to mixed precision 
    outputs = model(inputs)
    loss = criterion(outputs, labels)

  # Scales the loss, and calls backward() 
  # to create scaled gradients 
  scaler.scale(loss).backward()

  # Unscales gradients and calls 
  # or skips optimizer.step() 
  scaler.step(optimizer)

  # Updates the scale for next iteration 
  scaler.update()

 

ํ•™์Šต์„ ์‹œ์ž‘ํ•˜๊ธฐ ์ „ scaler์„ ์„ ์–ธํ•ด์ฃผ๊ณ ,amp.autocast()๋ฅผ ์ด์šฉํ•˜์—ฌ casting ๊ณผ์ •์„ ๊ฑฐ์น˜๋ฉฐ foward pass๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.backward pass, optimization, weight update ๋“ฑ์˜ ๊ณผ์ •์ด ๋ชจ๋‘ scaler์„ ํ†ตํ•ด ์ง„ํ–‰๋˜๋Š” ํ˜•ํƒœ์ธ ๊ฒƒ ๊ฐ™์ฃ ?

 

 


 

The author of the GitHub repo above actually ran experiments with Torch AMP,

using a GTX 1080 Ti and an RTX 2080 Ti.

 

 

Experimental results using Torch AMP

 

 

With both the 1080 Ti and the 2080 Ti, GPU memory usage naturally went down,

but training time decreased only with the 2080 Ti.

Test accuracy did not drop in either case, so there was no performance degradation.

 

 

 

The RTX 2080 Ti comes with Tensor Cores, which make FP16 computation dramatically faster.

So Torch AMP should really shine, speed-wise, on GPUs equipped with Tensor Cores.

(For a detailed explanation of Tensor Cores, see www.nvidia.com/ko-kr/data-center/tensor-cores/)

 

 

Tensor Core์€ TF32๋ผ๋Š” ์ž์ฒด ์ •๋ฐ€๋„๋ฅผ ์ด์šฉํ•ด์„œ FP32๋ณด๋‹ค ์ตœ๋Œ€ 20๋ฐฐ๊นŒ์ง€ ๊ฐ€์†์ด ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ํ•˜๋Š”๋ฐ

์œ„ ์‹คํ—˜ ๊ฒฐ๊ณผ๋ฅผ ๋ณด๋ฉด FP32๋ฅผ ์ด์šฉํ•œ Baseline์—์„œ๋Š” ์†๋„ ๋ฉด์—์„œ ํฐ ์ฐจ์ด๊ฐ€ ์—†๋„ค์š”.. ์™œ์ผ๊นŒ์š”?

 

 

 

 

Conclusion

GPU์˜ resource๋ฅผ ์•„๋‚„ ์ˆ˜ ์žˆ๊ณ , ํ•™์Šต ์‹œ๊ฐ„๊นŒ์ง€ ๋‹จ์ถ•์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” Mixed Precision Training์— ๋Œ€ํ•ด ๋ฆฌ๋ทฐํ•ด ๋ณด์•˜์Šต๋‹ˆ๋‹ค.

PyTorch์—์„œ ๋งค์šฐ ๊ฐ„๋‹จํ•˜๊ฒŒ ๊ตฌํ˜„๋„ ๊ฐ€๋Šฅํ•ด์„œ, ์ •๋ง ์•ˆ ์“ธ ์ด์œ ๊ฐ€ ์—†์„ ๊ฒƒ์ฒ˜๋Ÿผ ๋Š๊ปด์ง€๋„ค์š”.

 

FP32์˜ ์ •๋ฐ€๋„๋Š” ์œ ์ง€ํ•˜๋ฉด์„œ FP16์„ ์ด์šฉํ•ด ์ €์žฅ๊ณต๊ฐ„์„ ์•„๋ผ๋Š” ๋ฐฉ๋ฒ•์ด์—ˆ๋Š”๋ฐ,

๊ทธ๋ ‡๋‹ค๋ฉด FP32๋ฅผ ์ด์šฉํ•ด์„œ FP64์˜ ์ •๋ฐ€๋„๋ฅผ ๊ตฌํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ๊ฐ€๋Šฅํ•˜์ง€ ์•Š์„๊นŒ์š”?
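In fact, that idea exists outside this paper: so-called double-single (or double-double) arithmetic represents one high-precision value as an unevaluated sum of two lower-precision floats. A minimal sketch of its classic two-sum building block (Knuth/Dekker; not from this paper):

def two_sum(a: float, b: float):
    # Returns (s, err) such that a + b == s + err exactly:
    # s is the rounded sum, err is the rounding error it lost.
    s = a + b
    bb = s - a                       # the part of b absorbed into s
    err = (a - (s - bb)) + (b - bb)  # recover what rounding discarded
    return s, err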

์ •๋ฐ€๋„๊ฐ€ ์ค‘์š”ํ•œ task์—์„œ๋Š” ํ•œ๋ฒˆ์ฏค ์‹œ๋„ํ•ด ๋ณด๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค.

๋ฐ˜์‘ํ˜•