๋ฐ˜์‘ํ˜•

๐ŸŒŒ Deep Learning/DL & ML Bits of Knowledge 4

Transformer์˜ positional encoding (PE)

Transformer์„ ๊ตฌ์„ฑํ•˜๋Š” Multi-Head Self-Attention layer๋Š” permutation equivariantํ•œ ํŠน์„ฑ์„ ๊ฐ–๊ธฐ ๋•Œ๋ฌธ์—, postitional encoding์ด ํ•„์ˆ˜์ ์œผ๋กœ ํ•„์š”ํ•˜๋‹ค. Transformer์—์„œ ์‚ฌ์šฉํ•˜๋Š” positional encoding ์šฐ์„ , Transformer์—์„œ ์‚ฌ์šฉํ•˜๋Š” positional encoding์˜ ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. $PE_{(pos,2i)}=sin(pos/10000^{2i/d_{model}})$ $PE_{(pos,2i+1)}=cos(pos/10000^{2i/d_{model}})$ ์ด๋ฅผ ํ’€์–ด ์“ฐ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ˜•ํƒœ๋ฅผ ๊ฐ–๊ฒŒ ๋˜๊ณ , ์ด๋ฅผ ์‹œ๊ฐํ™”ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ๋ณธ ๊ธ€์—์„œ๋Š” ์™œ transformer์˜ positional encoding์ด ์ด..

[ML] Kernel Density Estimation (KDE)์™€ Kernel Regression (KR)

I. Kernel Density Estimation (KDE) KDE is a method for estimating a probability density function using a kernel function. As an example, suppose we have data on how many crimes occurred at each location along a street:

Location  Crime
1         15
2         12
3         10
...       ...

Using this data, we want to estimate the likelihood of crime at each point along the street. If a crime occurred at a particular location, crimes are also likely to have occurred nearby. So we place a kernel at each past crime location and sum them all, obtaining a density estimate (KDE) for the crime rate: $\hat{f}_h(x)=\frac{1}{nh}\sum_{i=1}^n K\left(\frac{x-x_i}{h}\right)$ The kernel function $K..
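The estimator $\hat{f}_h$ can be sketched as follows; the Gaussian kernel, the sample locations, and the bandwidth $h=1.5$ are all illustrative assumptions, not values from the post:

```python
import numpy as np

def kde(x, samples, h):
    """f_hat(x) = (1/nh) * sum_i K((x - x_i) / h), with a Gaussian kernel K."""
    u = (x - samples[:, None]) / h                    # (n, len(x))
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # standard normal density
    return K.sum(axis=0) / (len(samples) * h)

crime_locations = np.array([10.0, 12.0, 15.0])  # hypothetical sample points
grid = np.linspace(5, 20, 200)
density = kde(grid, crime_locations, h=1.5)

# The estimate is a valid density: non-negative and integrating to ~1.
integral = density.sum() * (grid[1] - grid[0])
print(round(integral, 2))
```

Summing scaled kernels and dividing by $nh$ is exactly what makes the result integrate to one, since each kernel is itself a density.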

Convolution layer์˜ parameter ๊ฐœ์ˆ˜

Input channel์ด 1์ด๊ณ , output channel๋„ 1์ธ ๊ฒฝ์šฐ๋Š” ๋‹จ์ˆœํžˆ single-channel image์— convolution์„ ํ•˜๊ณ  bias term์„ ๋”ํ•ด์ฃผ๋ฉด ๋œ๋‹ค. ์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ๋Š” bias term์€ ์ƒ๋žต๋์ง€๋งŒ, 2*2 kernel์„ ์‚ฌ์šฉํ•˜๊ณ  ์ถ”๊ฐ€๋กœ bias term์ด ํ•˜๋‚˜ ์žˆ์œผ๋‹ˆ ์ด ๊ฒฝ์šฐ ์ด parameter ์ˆ˜๋Š” 5์ด๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์‹ค์ œ๋กœ deep learning model์— convolution layer์„ ์ด์šฉํ•  ๋•Œ์—๋Š” input channel์˜ ์ˆ˜๋„ ์—ฌ๋Ÿฌ ๊ฐœ์ด๊ณ , output channel์˜ ์ˆ˜๋„ ์—ฌ๋Ÿฌ ๊ฐœ์ธ ๊ฒฝ์šฐ๊ฐ€ ๋งŽ๋‹ค. ์ด ๊ฒฝ์šฐ parameter ๊ฐœ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๋ฒ•์„ ์„ค๋ช…ํ•˜๊ฒ ๋‹ค. Input channel์˜ ์ˆ˜๋ฅผ $C_{in}$, output channel์˜ ์ˆ˜๋ฅผ $C_{ou..

MSE Loss (L2 Loss) vs. MAE Loss (L1 Loss)

A partial translation & summary of heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0 MSE Loss and MAE Loss MSE (Mean Squared Error, L2 Loss) is the mean of the squared errors; expressed as a formula, it looks like the following. The next graph is drawn with the true value fixed at 100 while the predicted value varies from -10,000 to 10,000; the x-axis is the predicted value and the y-axis is the MSE loss. MAE (Mean Absolute Error, L1 Loss) is the mean of the absolute values of the errors. Expressed as a form..

๋ฐ˜์‘ํ˜•