
[Overview] YOLO-Family Object Detection Summary - (1) YOLO

복만 2021. 1. 19. 15:51

Contents:

(1) YOLO (2016)

(2) YOLOv2

(3) YOLOv3

(4) YOLOv4

 

 

 

YOLO (2016)

 

Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

 

Paper: www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf

Official code: pjreddie.com/darknet/yolo/

 

 

Model Architecture

 

The model architecture proposed in the paper is shown in the figure above. It takes a 448×448×3 input and produces a 7×7×30 output.

Leaky ReLU with alpha = 0.1 is used as the activation function (the final layer uses a linear activation).
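As a minimal sketch in plain Python (not the Darknet implementation), Leaky ReLU with alpha = 0.1 passes positive inputs through unchanged and scales negative inputs by 0.1:

```python
def leaky_relu(x: float, alpha: float = 0.1) -> float:
    """Leaky ReLU: identity for x > 0, alpha * x otherwise (alpha = 0.1 as in the paper)."""
    return x if x > 0 else alpha * x


print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])  # [-0.2, 0.0, 3.0]
```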

 

 

 

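The input image is divided into an S×S grid, and for each grid cell the network:

predicts B bounding boxes whose centers fall in that cell, each with coordinates x, y, w, h (x, y relative to the cell's bounds; w, h relative to the whole image) and a confidence score (defined in the paper as Pr(Object) × IOU between the predicted box and the ground truth) --> B*5 values

predicts the cell's conditional class probabilities Pr(Class_i | Object) --> C values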

 

S: grid size (7 in the paper)
B: number of bounding boxes predicted per grid cell (2 in the paper)
C: number of classes (20 in the paper)
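Putting S, B, and C together: each cell outputs B*5 + C values, so the output tensor has shape S×S×(B*5 + C). A small sketch of that arithmetic (the function name is hypothetical) confirms that the paper's S = 7, B = 2, C = 20 give the 7×7×30 output:

```python
def yolo_output_shape(S: int = 7, B: int = 2, C: int = 20) -> tuple:
    """Shape of the YOLO v1 output: S x S cells, each with B*5 box values + C class probabilities."""
    return (S, S, B * 5 + C)


shape = yolo_output_shape()
print(shape)                           # (7, 7, 30)
print(shape[0] * shape[1] * shape[2])  # 1470 output values in total
```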

 

* The bbox confidence is not a confidence about the class; it is a confidence about whether an object is present in the box at all.
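The paper defines this confidence as Pr(Object) × IOU between the predicted and ground-truth boxes, so it also reflects how well the box fits. A minimal IoU sketch, assuming boxes are given as (center x, center y, width, height) in the same units; the helper names are illustrative:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (cx, cy, w, h)."""
    # Convert center/size to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Overlap is clamped at 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence(p_object, pred_box, truth_box):
    """Box confidence as defined in the paper: Pr(Object) * IOU(pred, truth)."""
    return p_object * iou(pred_box, truth_box)


print(iou((2, 2, 2, 2), (3, 2, 2, 2)))  # about 0.333 (overlap 2, union 6)
```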

 

 


 

Looking at each part in more detail:

(Source: docs.google.com/presentation/d/1aeRvtKG21KHdD5lg6Hgyhx5rPq_ZOsGjG5rJ1HP7BbA/pub?start=false&loop=false&delayms=3000&slide=id.g137784ab86_4_1318)

 

 

1) The values in each grid cell's channels are as follows.

- The first 5 channels hold the coordinates and confidence of the first bbox, the next 5 channels hold the coordinates and confidence of the second bbox, and the last 20 channels hold the probability of each class for that grid cell.
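A minimal sketch of splitting one cell's 30-value vector according to the layout described above. The order [box 1 | box 2 | 20 class probabilities] follows this description; the actual Darknet output tensor may be ordered differently, so treat the layout (and the helper name) as an illustrative assumption:

```python
from typing import List


def split_cell(vec: List[float], B: int = 2, C: int = 20):
    """Split one grid cell's (B*5 + C)-length vector into B boxes and C class probabilities.

    Each box is (x, y, w, h, confidence). Assumed layout: [box1 | box2 | classes].
    """
    assert len(vec) == B * 5 + C, "vector length must be B*5 + C"
    boxes = [tuple(vec[5 * b: 5 * b + 5]) for b in range(B)]
    class_probs = list(vec[5 * B:])
    return boxes, class_probs


cell = [0.5, 0.5, 0.3, 0.4, 0.9,   # box 1: x, y, w, h, confidence
        0.4, 0.6, 0.1, 0.2, 0.2]   # box 2: x, y, w, h, confidence
cell += [0.0] * 19 + [1.0]         # 20 class probabilities
boxes, probs = split_cell(cell)
print(boxes[0], len(probs))        # (0.5, 0.5, 0.3, 0.4, 0.9) 20
```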

 

 

 

 

2) Compute the class probability map for each bbox.

- Multiplying each bbox's object confidence Pr(Object) by its grid cell's class probabilities Pr(Class_i | Object) gives that bbox's class-specific scores. In the paper's notation: Pr(Class_i | Object) × Pr(Object) × IOU = Pr(Class_i) × IOU.

Since each grid cell has two bboxes, this gives 7×7×2 = 98 vectors in total, each with one score per class (20 values).
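A minimal sketch of this step for a single cell: each box's confidence is multiplied by the cell's class probabilities, giving B score vectors of length C. A toy example with 3 classes is used for readability (the paper uses 20); the function name is hypothetical:

```python
def class_scores(box_confidences, class_probs):
    """For each box in a cell, return confidence * Pr(Class_i | Object) for every class i."""
    return [[conf * p for p in class_probs] for conf in box_confidences]


# Toy cell: 2 boxes, 3 classes.
scores = class_scores([0.8, 0.5], [0.5, 0.25, 0.25])
print(scores)  # [[0.4, 0.2, 0.2], [0.25, 0.125, 0.125]]

# Over the whole 7x7 grid with 2 boxes per cell:
num_vectors = 7 * 7 * 2  # 98 score vectors
```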

 

 

 

 

3) Determine the final bboxes using an algorithm (NMS).

 

*NMS: Non-Maximum Suppression
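A minimal greedy NMS sketch for one class: keep the highest-scoring box, discard remaining boxes that overlap it too much, and repeat. It would be run separately for each class over the 98 score vectors. The score threshold (0.2), IoU threshold (0.5), and (cx, cy, w, h) box format are assumed example values, not taken from this post:

```python
def iou(a, b):
    """IoU of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thresh=0.2, iou_thresh=0.5):
    """Greedy NMS for one class. Returns indices of kept boxes in descending score order."""
    # Drop low-scoring boxes, then sort the rest by score (highest first).
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress boxes that overlap the kept box by more than iou_thresh.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(2, 2, 2, 2), (2.1, 2, 2, 2), (6, 6, 2, 2), (9, 9, 1, 1)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 heavily (IoU about 0.9), and box 3 is dropped by the score threshold.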

 

 


 

 

For training, the first 20 convolutional layers were pretrained on ImageNet. Four convolutional layers and two fully connected layers were then added on top, and the network was trained for detection with a custom-designed loss function.

 


 

 

The loss function used for training is as follows.

The errors for x, y (bbox center coordinates), w, h (bbox width and height), C (the probability that the bbox contains an object), and p_i(c) (the probability of each class) are computed separately and summed.
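For reference, the loss as written in the paper. Here 1_ij^obj indicates that the j-th box predictor in cell i is responsible for an object, 1_i^obj indicates that an object appears in cell i, and the paper sets λ_coord = 5 and λ_noobj = 0.5 (summation bounds are reproduced as the paper writes them):

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```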

 

 

 

It is almost the same as a plain sum-squared error (MSE); the difference is that each term is given its own weight.

  • Localization error and classification error are weighted differently (the coordinate terms are multiplied by λ_coord = 5).
  • Because cells without objects far outnumber cells with objects, cells with and without objects are weighted differently (λ_noobj = 0.5) so that the gradient is not dominated by the empty cells.
  • The same absolute error should matter less for a large box than for a small one, so the square roots of width and height are used when measuring their error.
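A simplified sketch of the weighted loss for one image, in plain Python. Assumptions (not from this post): the index of the responsible predictor in each object cell is supplied by the caller (the paper picks the predictor with the highest IoU), the confidence target for the responsible box is 1.0 (the paper uses the IoU), and 1_ij^noobj is treated as the complement of 1_ij^obj. The function and argument names are hypothetical:

```python
import math

LAMBDA_COORD = 5.0  # weight for localization terms (paper)
LAMBDA_NOOBJ = 0.5  # weight for confidence of boxes without objects (paper)


def yolo_v1_loss(pred_boxes, pred_probs, truth):
    """Weighted sum-squared YOLO v1 loss for one image (simplified sketch).

    pred_boxes[i]: list of B predicted boxes (x, y, w, h, conf) for cell i
    pred_probs[i]: list of C predicted class probabilities for cell i
    truth[i]:      None if no object is centered in cell i, else
                   (x, y, w, h, class_index, responsible_j)
    """
    loss = 0.0
    for i, boxes in enumerate(pred_boxes):
        t = truth[i]
        for j, (x, y, w, h, conf) in enumerate(boxes):
            if t is not None and j == t[5]:
                tx, ty, tw, th = t[:4]
                loss += LAMBDA_COORD * ((x - tx) ** 2 + (y - ty) ** 2)
                # Square roots reduce the penalty of the same absolute error on large boxes.
                loss += LAMBDA_COORD * ((math.sqrt(w) - math.sqrt(tw)) ** 2
                                        + (math.sqrt(h) - math.sqrt(th)) ** 2)
                loss += (conf - 1.0) ** 2  # assumed target 1.0 (paper: IoU)
            else:
                # Not responsible for any object: push confidence toward 0, down-weighted.
                loss += LAMBDA_NOOBJ * conf ** 2
        if t is not None:
            # Classification error only for cells that contain an object.
            cls = t[4]
            loss += sum((p - (1.0 if c == cls else 0.0)) ** 2
                        for c, p in enumerate(pred_probs[i]))
    return loss


pred_boxes = [[(0.5, 0.5, 0.25, 0.25, 1.0), (0.1, 0.1, 0.1, 0.1, 0.4)],
              [(0.3, 0.3, 0.2, 0.2, 0.2), (0.3, 0.3, 0.2, 0.2, 0.0)]]
pred_probs = [[1.0, 0.0], [0.5, 0.5]]
truth = [(0.5, 0.5, 0.25, 0.25, 0, 0), None]
print(round(yolo_v1_loss(pred_boxes, pred_probs, truth), 6))  # 0.1
```

In this toy case only the non-responsible confidences contribute: 0.5 × 0.4² + 0.5 × 0.2² = 0.08 + 0.02.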
๋ฐ˜์‘ํ˜•