🌌 Deep Learning/Overview

[Overview] Attention Notes - (2) seq2seq, +attention

๋ณต๋งŒ 2021. 1. 26. 11:56

Series:

(1) LSTM

(2) seq2seq, +attention

(3) Show, Attend and Tell

 

 

Reference: Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention), jalammar.github.io

 

 

seq2seq

 

[1] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." arXiv preprint arXiv:1409.3215 (2014).

[2] Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

 

๋ง ๊ทธ๋Œ€๋กœ sequence๋กœ ์ž…๋ ฅ์„ ๋ฐ›์•„ sequence๋กœ ์ถœ๋ ฅํ•œ๋‹ค.

input ๊ธธ์ด์™€ output ๊ธธ์ด์˜ ์ œํ•œ์ด ์—†์–ด ์œ ์šฉํ•˜๋‹ค.

 

https://jeddy92.github.io/JEddy92.github.io/ts_seq2seq_intro/

 

๊ธฐ๋ณธ์ ์œผ๋กœ Encoder-Decoder ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง„๋‹ค.

Encoder์ด input์œผ๋กœ context vector์„ ์ถ”์ถœํ•˜๊ณ ,

Decoder์€ context๋ฅผ ๋ฐ›์•„ output์„ ์ƒ์„ฑํ•œ๋‹ค.

Encoder๊ณผ Decoder์€ ๊ฐ๊ฐ RNN์œผ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค.

 

Encoder์˜ ๊ฐ cell๋“ค์€ input๊ณผ ์ด์ „ cell์˜ hidden state๋ฅผ ๋ฐ›์•„ ๋‹ค์Œ hidden state๋ฅผ ์ถœ๋ ฅํ•˜๊ณ ,

Decoder์˜ ๊ฐ cell๋“ค์€ ์ด์ „ cell์˜ hidden state์™€ output์„ ๋ฐ›์•„ output๊ณผ ๋‹ค์Œ hidden state๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค.

Decoder์˜ ์ฒซ cell์€ context vector๊ณผ <sos> ํ† ํฐ์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๋Š”๋‹ค.

 

 

 

seq2seq + attention

 

[3] Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).

[4] Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).

 

 

- seq2seq์˜ context vector์ด bottleneck์œผ๋กœ ์ž‘์šฉํ•œ๋‹ค. ์ด์— long sentence์— ์ œ๋Œ€๋กœ ๋™์ž‘ํ•˜์ง€ ๋ชปํ•œ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค.

 

- Applying the idea of attention to seq2seq resolves this problem.

 

- Attention์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„๋œ๋‹ค.
Attention(Q, K, V) : ์ฃผ์–ด์ง„ Query Q์— ๋Œ€ํ•ด ๋ชจ๋“  Key K์™€์˜ Score value V๋ฅผ ๊ตฌํ•˜๊ณ , ์ด๋ฅผ ๋ชจ๋‘ ๋”ํ•ด์„œ returnํ•œ๋‹ค.
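A minimal sketch of this Q/K/V view, assuming dot-product scores (names and sizes here are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(q, K, V):
    """Attention(Q, K, V) for a single query: score q against every key,
    softmax the scores into weights, and return the weighted sum of values."""
    scores = K @ q                 # one dot-product score per key
    weights = softmax(scores)      # attention distribution (sums to 1)
    return weights @ V             # weighted sum of the values

# Toy example: 4 key/value pairs of dimension 3.
rng = np.random.default_rng(1)
K = rng.normal(size=(4, 3))
V = rng.normal(size=(4, 3))
q = rng.normal(size=3)
out = attention(q, K, V)
print(out.shape)   # (3,)
```

In the seq2seq setting below, the query is the current decoder hidden state and the keys and values are both the encoder hidden states.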

 

- Unlike seq2seq, which passes only the last hidden state (= the context vector) to the decoder, all of the Encoder cells' hidden states are passed to the decoder.

 

 

seq2seq
seq2seq-attention (https://towardsdatascience.com/day-1-2-attention-seq2seq-models-65df3f49e263)

 


 

Decoder์˜ ๊ฐ cell๋“ค์€ forward ๊ณผ์ • ์ดํ›„ Encoder์˜ hidden state๋“ค์„ ์ด์šฉํ•ด ๋‹ค์Œ์˜ ์ถ”๊ฐ€์ ์ธ ์ž‘์—…์„ ๊ฑฐ์นœ๋‹ค.

 

jalammar.github.io/images/attention_tensor_dance.mp4

 

1. Receive all of the hidden states from the Encoder.

 

 

2. ๊ฐ Encoder hidden state์— ํ˜„์žฌ ์‹œ์ ์˜ Decoder hidden state์— ๋Œ€ํ•œ  score์„ ๋ถ€์—ฌํ•˜๊ณ , softmax๋ฅผ ์ทจํ•œ๋‹ค. => Attention

 

- Score์€ ํ˜„์žฌ ์‹œ์ ์˜ Decoder hidden state์— ๋Œ€ํ•œ ๊ฐ Encoder hidden state์˜ ์ค‘์š”๋„๋ฅผ ์˜๋ฏธํ•œ๋‹ค. 

- ์ฆ‰, ๊ฐ Encoder hidden state์˜ score์€ ํ•ด๋‹น input์— ๋Œ€ํ•œ weight๋กœ ์ž‘์šฉํ•˜๋Š” ๊ฒƒ.

- ๋งค๋ฒˆ Decoder output์„ ๊ณ„์‚ฐํ•  ๋•Œ๋งˆ๋‹ค Encoder์˜ ๋ชจ๋“  ์‹œ์ ์˜ hidden state๋ฅผ ๊ณ ๋ คํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

- Softmax๋ฅผ ์ทจํ•œ score ๊ฐ’๋“ค์„ Attention distribution์ด๋ผ๊ณ  ํ•œ๋‹ค.

 

* There are several ways to compute the score; the simplest is the dot product of the decoder hidden state with each encoder hidden state.

A more detailed explanation with examples is available at incredible.ai/nlp/2020/02/20/Sequence-To-Sequence-with-Attention/.

* Alternatively, the score can be computed by feeding the decoder and encoder hidden states into a small neural network. [4]

* lilianweng.github.io/lil-log/2018/06/24/attention-attention.html#whats-wrong-with-seq2seq-model summarizes the various types of attention in a table.
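The two scoring variants above can be sketched side by side (a minimal numpy sketch; the weight matrices are random stand-ins for learned parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
H = 4
s = rng.normal(size=H)              # current decoder hidden state
hs = rng.normal(size=(6, H))        # 6 encoder hidden states

# (a) Dot-product score: no extra parameters.
dot_scores = hs @ s

# (b) Additive ("concat") score: feed [s; h_i] through a small
# feed-forward net. W_a and v_a stand in for learned parameters.
W_a = rng.normal(size=(2 * H, H))
v_a = rng.normal(size=H)
concat_scores = np.array([v_a @ np.tanh(W_a.T @ np.concatenate([s, h_i]))
                          for h_i in hs])
print(dot_scores.shape, concat_scores.shape)   # (6,) (6,)
```

Either variant yields one score per encoder time step; softmax over these scores then gives the attention distribution.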

 

(left) https://wikidocs.net/22893 (right) [4]

 

 

3. ์œ„์—์„œ ์–ป์€ (softmax) score๊ณผ Encoder hidden state๋ฅผ ๊ณฑํ•œ ํ›„ ๋ชจ๋“  ๊ฐ’์„ ๋”ํ•ด ํ•˜๋‚˜์˜ vector์„ ๋งŒ๋“ ๋‹ค.

์ด๋Š” ์ค‘์š”ํ•œ hidden state๋Š” ๊ฐ•์กฐํ•˜๊ณ , ์ค‘์š”ํ•˜์ง€ ์•Š์€ hidden state๋Š” ์ค„์ด๋Š” ๊ฒƒ์ด๋‹ค. ์ด๊ฒƒ์ด attention์˜ ๊ฐœ๋…์ด๋‹ค.

์ด๊ฒƒ์ด ๊ฐ decoder cell์˜ context vector์ด ๋˜๋Š”๊ฒƒ.

 

 

์—ฌ๊ธฐ๊นŒ์ง€์˜ ๊ณผ์ •์„ ์ˆ˜์‹์œผ๋กœ ์ •๋ฆฌํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html#whats-wrong-with-seq2seq-model
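In the usual notation (a hedged sketch; $s_t$ is the decoder hidden state at step $t$ and $h_i$ the $i$-th encoder hidden state):

```latex
e_{t,i} = \operatorname{score}(s_t, h_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j}\exp(e_{t,j})}, \qquad
c_t = \sum_{i} \alpha_{t,i}\, h_i
```

Here $\alpha_{t,\cdot}$ is the attention distribution from step 2, and $c_t$ is the context vector from step 3.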

 

 

4. Concatenate the context vector with the Decoder hidden state, pass the result through a Dense layer, and emit the output.

 

 

๋‹ค์‹œ ๋งํ•ด, ๊ฐ cell์—์„œ๋Š”

1) ์ด์ „ hidden state์™€ ์ด์ „ output์„ ์ด์šฉํ•ด ์ƒˆ๋กœ์šด hidden state๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ ,

2) Encoder hidden state set์„ ์ด์šฉํ•ด context vector์„ ๊ณ„์‚ฐํ•˜๊ณ ,

3) ์ƒˆ๋กœ์šด hidden state์™€ context vector์„ ์ด์šฉํ•ด output๋ฅผ ๊ตฌํ•˜๋Š” ๊ณผ์ •์„ ๊ฐ๊ฐ ๊ฑฐ์น˜๋Š” ๊ฒƒ์ด๋‹ค.
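Putting the three sub-steps together, one decoder step can be sketched as follows (a minimal numpy sketch; `prev_y` stands in for the embedding of the previous output, and all weights are random stand-ins for learned parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
H, V_out = 4, 5                           # hidden size, output (vocab) size

W_hh = rng.normal(size=(H, H))
W_xh = rng.normal(size=(H, H))
W_out = rng.normal(size=(2 * H, V_out))   # maps [h; context] -> output

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoder_step(prev_h, prev_y, enc_states):
    # 1) new hidden state from the previous hidden state and previous output
    h = np.tanh(prev_y @ W_xh + prev_h @ W_hh)
    # 2) context vector from the encoder hidden states (dot-product attention)
    weights = softmax(enc_states @ h)
    context = weights @ enc_states
    # 3) output from [new hidden state; context] through a dense layer
    y = softmax(np.concatenate([h, context]) @ W_out)
    return h, y

enc_states = rng.normal(size=(6, H))
h, y = decoder_step(np.zeros(H), np.zeros(H), enc_states)
print(y.shape)   # (5,)
```

Note that the attention weights are recomputed at every decoder step, so each output token can attend to a different part of the input.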

 

 


 

[์ฐธ๊ณ ์ž๋ฃŒ]

๊ตฌํ˜„์— ๋„์›€์ด ๋  ๋งŒํ•œ ๊ธ€๋“ค.

๋ฐ˜์‘ํ˜•