
[Overview] Attention Summary - (1) LSTM

복만 2021. 1. 26. 11:03

Contents:

(1) LSTM

(2) seq2seq, +attention

(3) Show, Attend and Tell


reference: colah.github.io/posts/2015-08-Understanding-LSTMs/

 

Recurrent Neural Network

 

 

The figure above shows the basic structure of an RNN. The previous state is fed back in together with the current input, so the network learns how each input relates to the inputs that came before it.
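As a minimal sketch of that recurrence, assuming the common tanh activation and toy weights chosen arbitrarily (the names `rnn_step` and `rnn_forward` are hypothetical), one step computes h_t = tanh(W_x x_t + W_h h_{t-1} + b), and each new state is fed into the next step:

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))        # current input
        s += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))  # previous state
        h_t.append(math.tanh(s))
    return h_t

def rnn_forward(xs, h0, W_x, W_h, b):
    """Run the RNN over a sequence, passing each hidden state to the next step."""
    h = h0
    hs = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        hs.append(h)
    return hs

# Toy sizes (hypothetical): 1-dim input, 2-dim hidden state.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
hs = rnn_forward([[1.0], [0.0], [-1.0]], [0.0, 0.0], W_x, W_h, b)
print(len(hs), len(hs[0]))  # 3 2
```

The same weights are reused at every step; only the hidden state carries information forward.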

 

However, this structure has a weakness: the longer the input sequence, the more information from the early part of the sequence is forgotten by the time the network reaches the later steps. This is known as the long-term dependency problem.
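One common way to see why this happens is the vanishing-gradient effect. The sketch below assumes a hypothetical scalar RNN h_t = tanh(w·h_{t-1} + u·x_t) and computes dh_T/dh_0 by the chain rule, which is a product of one factor w·(1 - h_t²) per step:

```python
import math

def grad_through_time(w, u, xs, h0=0.0):
    """Scalar RNN h_t = tanh(w*h_{t-1} + u*x_t).
    Returns d h_T / d h_0 as the product of per-step factors w * (1 - h_t^2)."""
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # chain rule through one step
    return grad

for T in (5, 20, 50):
    print(T, grad_through_time(w=0.9, u=1.0, xs=[1.0] * T))
```

Since 1 - h_t² is at most 1, every factor is at most |w|; with |w| < 1 the product shrinks geometrically as the sequence gets longer, so early inputs barely influence the final state or the learning signal. (With large |w| the product can instead explode.)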

 

 

LSTM์€ ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ์„ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ์ œ์‹œ๋˜์—ˆ๋‹ค.

 


Long Short-Term Memory

 

The repeating module in a standard RNN contains a single layer.
The repeating module in an LSTM contains four interacting layers.

 

The top diagram shows a simple RNN and the bottom one shows an LSTM. The symbols used in the diagrams mean the following:

 


The Core Idea Behind LSTM

 

cell state

LSTM์—์„œ ํ•ต์‹ฌ์ด ๋˜๋Š” ์•„์ด๋””์–ด๋Š” Cell state์ด๋‹ค. LSTM์—์„œ Cell state์˜ ํ๋ฆ„๋งŒ ๋ณด๋ฉด, ๊ฐ„๋‹จํ•œ ์„ ํ˜• ์—ฐ์‚ฐ๋งŒ์ด ๊ฐ€ํ•ด์ง€๋ฉด์„œ ๋‹ค์Œ state๋กœ ์ •๋ณด๊ฐ€ ์ „๋‹ฌ๋œ๋‹ค.

 

gate

LSTM์€ gate๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ๊ตฌ์กฐ๋ฅผ ํ†ตํ•ด ๋‹ค์Œ cell๋กœ ํ๋ฅผ ์ •๋ณด์˜ ์–‘์„ ์ œ์–ดํ•œ๋‹ค. Sigmoid layer์„ ๊ฑฐ์นœ ๊ฐ’์ด ๊ณฑํ•ด์ง€๋Š” ๊ตฌ์กฐ๋กœ ๋˜์–ด ์žˆ๋‹ค.

Sigmoid๋Š” 0๋ถ€ํ„ฐ 1 ์‚ฌ์ด์˜ ๊ฐ’์„ ์ถœ๋ ฅํ•˜๋ฏ€๋กœ, ์ด๋Š” '์–ผ๋งŒํผ์˜ ์ •๋ณด๋ฅผ cell state์— ์ „๋‹ฌํ• ์ง€'๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ์—ญํ• ์„ ํ•œ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด sigmoid๋ฅผ ๊ฑฐ์นœ ๊ฐ’์ด 0์ด ๋œ๋‹ค๋ฉด '๋‹ค์Œ cell๋กœ ์ •๋ณด๋ฅผ ์ „๋‹ฌํ•˜์ง€ ์•Š์Œ'์„ ์˜๋ฏธํ•˜๊ณ ,sigmoid๋ฅผ ๊ฑฐ์นœ ๊ฐ’์ด 1์ด ๋œ๋‹ค๋ฉด '๋ชจ๋“  ์ •๋ณด๋ฅผ ๊ทธ๋Œ€๋กœ ์ „๋‹ฌํ•จ'์„ ์˜๋ฏธํ•˜๊ฒŒ ๋œ๋‹ค.
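A minimal sketch of a single gate, assuming the sigmoid inputs (logits) are given directly; in a real LSTM they come from a learned layer. The helper name `apply_gate` is hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def apply_gate(gate_logits, values):
    """Pointwise gate: squash each logit into (0, 1) with a sigmoid and
    multiply it with the matching value. ~0 blocks it, ~1 lets it through."""
    return [sigmoid(z) * v for z, v in zip(gate_logits, values)]

out = apply_gate([-10.0, 0.0, 10.0], [2.0, 2.0, 2.0])
print(out)  # first ~0, middle exactly 1.0, last ~2
```

A strongly negative logit closes the gate, zero lets half through, and a strongly positive logit passes the value almost unchanged.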

 

LSTM์€ 3๊ฐœ์˜ gate๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ๊ณ ,  ๊ฐ๊ฐ์˜ ์˜๋ฏธ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ๋‹ค.

 


Three Gates of LSTM

 

first gate (forget gate)

The first gate decides how much of the previous cell state is carried over into the next cell state.

It takes the previous cell's hidden state and the current cell's input, outputs a value between 0 and 1, and multiplies this value with the previous cell state.
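In the notation of the colah post referenced above, the first gate can be written as follows, where $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, $W_f$ and $b_f$ are the gate's learned weights and bias, and $\sigma$ is the sigmoid:

$$
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
$$

The previous cell state is then scaled pointwise as $f_t \odot C_{t-1}$, where $\odot$ denotes elementwise multiplication.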

 

 

second gate (input gate)

The second gate decides what new information to add to the next cell state.

A tanh layer produces candidate values for the cell state, and a sigmoid layer decides which of these candidates to add and by how much. The two results are multiplied together and added to the cell state.
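In the colah post's notation, the second gate and the resulting cell-state update can be written as below. Here $i_t$ is the sigmoid output, $\tilde{C}_t$ the tanh candidates, $f_t$ the first gate's output, and $\odot$ elementwise multiplication:

$$
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\end{aligned}
$$

The last line is the whole cell-state path: one pointwise multiplication and one addition. When $f_t$ is close to 1 and $i_t$ close to 0, $C_{t-1}$ passes through almost unchanged.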

 

third gate (output gate)

Finally, the hidden state is computed. Whereas the cell state is the value that keeps flowing from one cell to the next, the hidden state plays the role of a hidden layer's output in an ordinary network such as a CNN. The hidden state is emitted as the cell's output and is also passed on to the next cell.

This gate works in the opposite direction to the second gate: instead of writing information into the cell state, it reads information out of it.

The cell state is passed through tanh to extract values (here tanh is applied pointwise and has no weights of its own, unlike the tanh layer in the second gate),

and a sigmoid layer decides which of those values to use and by how much.

The two are multiplied to give the new hidden state.
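The third gate in the same notation, with $o_t$ the sigmoid output:

$$
\begin{aligned}
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

$\tanh(C_t)$ only squashes the cell state into $(-1, 1)$; the output gate $o_t$ then picks how much of each component to expose as $h_t$.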

 

-

 

In short, within the gates, tanh seems to extract information, while the sigmoid layer decides how much of that information to use.
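Putting the three gates together, here is a from-scratch sketch of one LSTM step in pure Python. It assumes the standard formulation from the colah post (no peephole connections); the parameter names (`W_f`, `b_f`, ...), the helper `init_params`, and the random initialization are hypothetical choices for illustration, and nothing here is trained:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; returns the new (hidden state, cell state)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]         # first gate
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]         # second gate
    c_tilde = [math.tanh(a) for a in affine(params["W_C"], params["b_C"], z)]  # candidates
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]         # third gate
    # Cell state path: pointwise multiply, then add.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, c_tilde)]
    # Hidden state: weight-free tanh of the cell state, scaled by the output gate.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Small random weights; each W has shape (hidden, hidden + input)."""
    rng = random.Random(seed)
    params = {}
    for g in ("f", "i", "C", "o"):
        params["W_" + g] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                            for _ in range(hidden_size)]
        params["b_" + g] = [0.0] * hidden_size
    return params

params = init_params(input_size=1, hidden_size=3)
h, c = [0.0] * 3, [0.0] * 3
for x in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

To see the cell-state idea in action, set `b_f` to a large positive value and `b_i` to a large negative one: the first gate then outputs nearly 1 and the second nearly 0, so `c_t` comes out almost identical to `c_prev`.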
