๐ŸŒŒ Deep Learning/DL & ML ์กฐ๊ฐ ์ง€์‹

[ML] Kernel Density Estimation (KDE) and Kernel Regression (KR)

๋ณต๋งŒ 2021. 10. 18. 15:51

I. Kernel Density Estimation (KDE)

 

KDE๋Š” kernel ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•ด ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ํ•˜๋‚˜์˜ ์˜ˆ์‹œ๋กœ, ๊ธธ๊ฑฐ๋ฆฌ์—์˜ ๋ฒ”์ฃ„ ๋ฐœ์ƒ๋Ÿ‰์„ ๋‚˜ํƒ€๋‚ธ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋‹ค๊ณ  ํ•˜์ž.

 

Crime    Location
1        15
2        12
3        10
...      ...

 

์ด ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•ด, ๊ธธ๊ฑฐ๋ฆฌ ๊ฐ ์ง€์ ์—์„œ์˜ ๋ฒ”์ฃ„ ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ์„ ์ถ”์ •ํ•˜๊ณ  ์‹ถ๋‹ค๊ณ  ํ•˜์ž. ํŠน์ • ์ง€์ ์—์„œ ๋ฒ”์ฃ„๊ฐ€ ๋ฐœ์ƒํ–ˆ๋‹ค๋ฉด, ๊ทธ ๊ทผ์ฒ˜์—๋„ ๋ฒ”์ฃ„๊ฐ€ ๋ฐœ์ƒํ–ˆ์„ ํ™•๋ฅ ์ด ๋†’๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๋”ฐ๋ผ์„œ ์šฐ๋ฆฌ๋Š” ๊ณผ๊ฑฐ์— ๋ฒ”์ฃ„๊ฐ€ ๋ฐœ์ƒํ–ˆ๋˜ ์œ„์น˜๋งˆ๋‹ค kernel์„ ์Œ“๊ณ , ์ด๋ฅผ ๋ชจ๋‘ ๋”ํ•˜์—ฌ ๋ฒ”์ฃ„์œจ์— ๋Œ€ํ•œ ๋ฐ€๋„ํ•จ์ˆ˜(KDE)๋ฅผ ์ถ”์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

$\hat{f}_h(x)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right)$
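As a sketch (not from the original post), the formula above can be implemented directly with a Gaussian kernel; the `locations` sample below is a hypothetical stand-in for the crime locations in the table:

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, a common choice of kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, data, h):
    """f_hat_h(x) = (1 / (n*h)) * sum_i K((x - x_i) / h)."""
    data = np.asarray(data, dtype=float)
    return gaussian_kernel((x - data) / h).sum() / (len(data) * h)

# hypothetical crime locations, echoing the table above
locations = [15.0, 12.0, 10.0]
density_at_13 = kde(13.0, locations, h=2.0)
```

Because each kernel integrates to 1 and the sum is divided by $nh$, the resulting estimate is itself a valid probability density.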

 

 

Kernel ํ•จ์ˆ˜ $K$๋ฅผ ์–ด๋–ป๊ฒŒ ์„ค์ •ํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ KDE์˜ ๋ชจ์–‘๋„ ๋ฐ”๋€๋‹ค. ์ข์€ kernel ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜๋ฉด KDE๋Š” ๋พฐ์กฑํ•ด์งˆ ๊ฒƒ์ด๊ณ , ๋„“์€ kernel ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด KDE๋Š” ๋งค๋„๋Ÿฌ์šด ํ˜•ํƒœ๊ฐ€ ๋  ๊ฒƒ์ด๋‹ค. KDE๊ฐ€ overfitting/underfitting ๋˜์ง€ ์•Š๋„๋ก kernel ํ•จ์ˆ˜์˜ bandwidth๋ฅผ ์ž˜ ์กฐ์ ˆํ•ด์•ผ ํ•œ๋‹ค.

 

์•„๋ž˜๋Š” kernelํ•จ์ˆ˜์˜ bandwidth์™€ kernelํ•จ์ˆ˜์˜ ์ข…๋ฅ˜์— ๋”ฐ๋ฅธ KDE์˜ ํ˜•ํƒœ๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ๊ทธ๋ฆผ์ด๋‹ค.

 

Gaussian kernel์„ ์ด์šฉํ–ˆ์„ ๋•Œ bandwidth์— ๋”ฐ๋ฅธ KDE์˜ ์ฐจ์ด

 

Rectangular kernel์„ ์ด์šฉํ–ˆ์„ ๋•Œ bandwidth์— ๋”ฐ๋ฅธ KDE์˜ ์ฐจ์ด

 


 

II. Kernel Regression (KR)

 

Kernel Regression์€ non-parametricํ•œ regression๋ฐฉ๋ฒ•์œผ๋กœ, kernel ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•ด ์œ ์‚ฌํ•œ ์ง€์ ๋“ค์˜ weighted average๋กœ ์ถ”์ •๊ฐ’์„ ์˜ˆ์ธกํ•œ๋‹ค.

 

ํŠน์ •ํ•œ ์ง€์ ์—์„œ์˜ ์˜ˆ์ธก๊ฐ’์˜ ๊ธฐ๋Œ“๊ฐ’ $g(x)$๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

$g(x) = E(Y \mid X=x) = \int y \, f(y \mid x) \, dy$

 

Bayes' rule์— ๋”ฐ๋ฅด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

$\int y \, f(y \mid x) \, dy = \int \frac{y \, f(x,y)}{f(x)} \, dy$

 

์œ„ ์‹์—์„œ $f(x,y)$์™€ $f(x)$๋ฅผ kernel ํ•จ์ˆ˜๋กœ ๋Œ€์ฒดํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

 

$\hat{g}(x) = \frac{\sum_{i=1}^{n} y_i \, K\left(\frac{x-x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)}$
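The step from the Bayes-rule form to this ratio can be sketched as follows; this assumes a product-kernel estimate of $f(x,y)$ and a symmetric kernel with $\int K(u)\,du = 1$ and $\int u\,K(u)\,du = 0$:

$\hat{f}(x,y) = \frac{1}{nh^2}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right) K\left(\frac{y-y_i}{h}\right), \qquad \hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)$

$\int y \, \hat{f}(x,y) \, dy = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right) \int \frac{y}{h} K\left(\frac{y-y_i}{h}\right) dy = \frac{1}{nh}\sum_{i=1}^{n} y_i \, K\left(\frac{x-x_i}{h}\right)$

where the inner integral equals $y_i$ by the substitution $u = (y-y_i)/h$. Dividing by $\hat{f}(x)$ cancels the $\frac{1}{nh}$ factors and yields the ratio above.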

 

์ด๋Š” ๊ฐ data point๋กœ๋ถ€ํ„ฐ ์˜ˆ์ธกํ•  ์ง€์ ๊นŒ์ง€์˜ ๊ฑฐ๋ฆฌ์— ๋น„๋ก€ํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋ฅผ ์ค€ weighted average ๊ฐ’์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.


์ถœ์ฒ˜: https://rpubs.com/sandipan/238698

๋ฐ˜์‘ํ˜•

'๐ŸŒŒ Deep Learning > DL & ML ์กฐ๊ฐ ์ง€์‹' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Transformer์˜ positional encoding (PE)  (0) 2021.10.18
Convolution layer์˜ parameter ๊ฐœ์ˆ˜  (0) 2021.10.04
MSE Loss (L2 Loss) vs. MAE Loss (L1 Loss)  (0) 2021.01.19