๋ฐ˜์‘ํ˜•

๐ŸŒŒ Deep Learning 50

์นด์นด์˜ค๋ธŒ๋ ˆ์ธ Multimodal LLM Honeybee ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ

์นด์นด์˜ค๋ธŒ๋ ˆ์ธ์—์„œ ์ž‘๋…„ ๋ง Multimodal LLM์ธ Honeybee๋ฅผ ๋ฐœํ‘œํ–ˆ๋‹ค. ์•„์‰ฝ๊ฒŒ๋„ ํ•œ๊ตญ์–ด ๋ชจ๋ธ์€ ์•„๋‹ˆ๊ณ  ์˜์–ด ๋ชจ๋ธ์ด๊ณ , 5๊ฐœ์˜ ๋ฒค์น˜๋งˆํฌ์—์„œ SoTA๋ฅผ ๋‹ฌ์„ฑํ–ˆ๋‹ค๊ณ  ํ•ด์„œ ๋‰ด์Šค๊ฐ€ ์—„์ฒญ ๋งŽ์ด ๋‚˜์™”๋‹ค. ๋…ผ๋ฌธ: https://arxiv.org/pdf/2312.06742.pdf ๊นƒํ—™: https://github.com/kakaobrain/honeybee GitHub - kakaobrain/honeybee: The official implementation of project "Honeybee" The official implementation of project "Honeybee". Contribute to kakaobrain/honeybee development by creating an account o..

[๋”ฅ๋Ÿฌ๋‹ ๋…ผ๋ฌธ๋ฆฌ๋ทฐ] MeZO: Fine-Tuning Language Models with Just Forward Passes (NeurIPS 2023)

๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/pdf/2305.17333.pdf ๋ฐœํ‘œ ์˜์ƒ: https://neurips.cc/virtual/2023/poster/71437 ์ฝ”๋“œ: https://github.com/princeton-nlp/MeZO NeurIPS 2023 Abstract: Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients us..

[๋”ฅ๋Ÿฌ๋‹ ๋…ผ๋ฌธ๋ฆฌ๋ทฐ] AIM: Scalable Pre-training of Large Autoregressive Image Models (Apple, 2024)

Apple์—์„œ 2024๋…„ 1์›” large pretrained image model์ธ AIM(Autoregressive Image Models)์„ ๋ฐœํ‘œํ–ˆ๋‹ค. ์ฝ”๋“œ์™€ model weight์ด Github์— ๊ณต๊ฐœ๋˜์–ด ์žˆ๋‹ค. ๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/pdf/2401.08541.pdf GitHub: https://github.com/apple/ml-aim/tree/main AIM์€ LLM์— ์˜๊ฐ์„ ๋ฐ›์•„ ๋งŒ๋“ค์–ด์ง„ ๋Œ€๊ทœ๋ชจ vision ๋ชจ๋ธ์ด๋‹ค. BEiT (2021), Masked autoencoder(MAE) (2021) ๋“ฑ์ด masked language modeling (MLM)์„ ํ†ตํ•ด ์‚ฌ์ „ํ•™์Šต ์‹œํ‚จ ๊ฒƒ๊ณผ ๋‹ค๋ฅด๊ฒŒ, ์ฃผ์–ด์ง„ ํŒจ์น˜๋กœ ๋‹ค์Œ ํŒจ์น˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” autoregressive object๋ฅผ ์ด์šฉ..

A Collection of Open-Source Korean Multimodal Models (image-text)

ํ˜น์€ awesome-korean-multimodal ๊ฐ™์€๊ฒƒ ์‚ฌ์‹ค ํ•œ๊ตญ์–ด LLM๋„ ๋งŽ์ด ์—†๊ฑฐ๋‹ˆ์™€, ์˜คํ”ˆ์†Œ์Šค๋กœ ๊ณต๊ฐœ๋œ ํ•œ๊ตญ์–ด ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ LLM(MLLM)์€ ์ •๋ง ์–ผ๋งˆ ์•ˆ๋˜๋Š”๋“ฏ ํ•˜๋‹ค. (์ฐธ๊ณ : ํ•œ๊ตญ์–ด LLM ๋ชจ๋ธ ๋ชจ์Œ - awesome-korean-llm) GitHub - NomaDamas/awesome-korean-llm: Awesome list of Korean Large Language Models. Awesome list of Korean Large Language Models. Contribute to NomaDamas/awesome-korean-llm development by creating an account on GitHub. github.com ํ•œ๊ตญ์–ด multimodal llm ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ m..

Apple์˜ Multimodal LLM Ferret ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ

Apple์—์„œ 2023๋…„ 10์›” ๋‚ด๋†“์€ Multimodal LLM์ธ Ferret์˜ ๋…ผ๋ฌธ์ด๋‹ค. ๋ชจ๋ธ ํฌ๊ธฐ๋Š” 7B, 13B ๋‘๊ฐ€์ง€์ด๋ฉฐ Github์— ์ฝ”๋“œ์™€ checkpoint๊ฐ€ ๊ณต๊ฐœ๋˜์–ด ์žˆ๊ณ , ๋น„์ƒ์—…์  ์šฉ๋„๋กœ ์‚ฌ์šฉ๊ฐ€๋Šฅํ•˜๋‹ค. ๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/pdf/2310.07704.pdf Github: https://github.com/apple/ml-ferret GitHub - apple/ml-ferret Contribute to apple/ml-ferret development by creating an account on GitHub. github.com Introduction Vision-language learning ๋ชจ๋ธ์˜ ์ฃผ์š”ํ•œ ๋‘ capability๋Š” referring๊ณผ groun..

[๋”ฅ๋Ÿฌ๋‹ ๋…ผ๋ฌธ๋ฆฌ๋ทฐ] AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights (Naver AI Lab, ICLR 2021)

Adam์˜ ๋ฌธ์ œ์ ์„ ๊ทน๋ณตํ•˜๋Š” ์ƒˆ๋กœ์šด optimizer์ธ AdamP๋ฅผ ์ œ์•ˆํ•˜๋Š” ๋…ผ๋ฌธ์œผ๋กœ, Naver AI Lab & Naver Clova์—์„œ ICLR 2021์— ๋ฐœํ‘œํ•˜์˜€๋‹ค. Paper: https://arxiv.org/pdf/2006.08217.pdf Project page: https://clovaai.github.io/AdamP/ Code: https://github.com/clovaai/adamp 1. Adam์˜ ๋ฌธ์ œ์  ์š”์•ฝ: Adam์„ ๋น„๋กฏํ•œ momentum-based gradient descent optimzer๋“ค์€ ํ•™์Šต ๋„์ค‘ weight norm์„ ํฌ๊ฒŒ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ๊ทธ ์›์ธ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ๋“ค์—์„œ Batch normalization ๋“ฑ์˜ normalization ๊ธฐ๋ฒ•๋“ค์„ ์‚ฌ์šฉํ•ด we..

fastMRI ๋ฐ์ดํ„ฐ์…‹ ๋‹ค์šด๋กœ๋“œ ๋ฐ ์‚ฌ์šฉ๋ฒ• Tutorial

fastMRI is a dataset for MRI reconstruction released by Facebook AI and NYU Langone Health in 2018. Official site: dataset overview, file descriptions, and the leaderboard. Dataset download: a more detailed description of the dataset and the download itself. Official GitHub repo: PyTorch data loaders, models, and other related code. Paper: detailed information about the dataset, including acquisition methods and imaging parameters. 1. Dataset composition: the dataset consists of five subsets; see the paper for details. Single-coil knee: 1594 scans (PD, PDFS) ..
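For orientation, a minimal sketch of reading one volume from the single-coil knee split with h5py follows; the key names ('kspace', 'reconstruction_esc') are taken from the dataset's file description as I understand it, and the file path is a hypothetical example.

```python
import h5py
import numpy as np

# Hypothetical local path to one single-coil knee training file.
fname = "singlecoil_train/file1000001.h5"

with h5py.File(fname, "r") as f:
    kspace = f["kspace"][()]              # complex k-space, (num_slices, H, W)
    target = f["reconstruction_esc"][()]  # ground-truth images (multi-coil files use 'reconstruction_rss')
    print(dict(f.attrs))                  # acquisition / normalization metadata

# Naive zero-filled reconstruction of the middle slice: 2D inverse FFT of k-space.
slice_k = kspace[kspace.shape[0] // 2]
img = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(slice_k)))
print(np.abs(img).shape, target.shape)
```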

[PyTorch Implementation] StyleGAN2

A write-up of PyTorch code for StyleGAN2 (Analyzing and Improving the Image Quality of StyleGAN, 2020). The code is a lightly modified version of the repo above and is meant to show the overall flow, so many parts such as logging are omitted. StyleGAN series overview: https://bo-10000.tistory.com/158 ([StyleGAN Series] ProGAN/PGGAN, StyleGAN, StyleGAN2)..

[๋”ฅ๋Ÿฌ๋‹ ๋…ผ๋ฌธ๋ฆฌ๋ทฐ] Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

A paper that performs speech enhancement using multimodal (audio, visual) data. Paper link: https://arxiv.org/ftp/arxiv/papers/1703/1703.10893.pdf Introduction: Speech enhancement (SE) means removing noise from a speech signal. Most SE techniques use only audio data, but this paper improves SE performance by additionally using visual data (lip-region images). Method: The proposed Audio-Visual Deep CNN (AVDCNN) SE model has an audio-visual encoder-decoder network structure. 1. First, a CNN..
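As a rough illustration of the fusion idea (separate audio and visual encoders, concatenated features, a decoder producing the enhanced spectrogram), here is a toy PyTorch sketch; it is not the paper's exact AVDCNN architecture, and all layer sizes are made up.

```python
import torch
import torch.nn as nn

class FusionSESketch(nn.Module):
    """Toy audio-visual fusion model: CNN encoders for the noisy spectrogram
    and the lip-region image, concatenation as late fusion, and a decoder
    that outputs a (1, 64, 64) enhanced spectrogram patch."""
    def __init__(self):
        super().__init__()
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.visual_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.decoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 32 * 8 * 8, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 64),
        )

    def forward(self, noisy_spec, lips):
        a = self.audio_enc(noisy_spec)          # (B, 32, 8, 8)
        v = self.visual_enc(lips)               # (B, 32, 8, 8)
        fused = torch.cat([a, v], dim=1)        # simple late fusion
        return self.decoder(fused).view(-1, 1, 64, 64)
```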

[๋”ฅ๋Ÿฌ๋‹ ๋…ผ๋ฌธ๋ฆฌ๋ทฐ + ์ฝ”๋“œ] PointCutMix: Regularization Strategy for Point Cloud Classification (Neurocomputing 2022)

CutMix augmentation์„ ํฌ์ธํŠธํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ์— ์ ์šฉํ•œ ๋…ผ๋ฌธ์ด๋‹ค. ๋‘ ํฌ์ธํŠธํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ ๊ฐ„์˜ ์ผ๋Œ€์ผ ๋Œ€์‘๊ด€๊ณ„๋ฅผ ์ฐพ๊ณ , ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋‘ ๋ฐ์ดํ„ฐ๋ฅผ ์„ž๋Š” ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ–ˆ๋‹ค. Paper: https://arxiv.org/pdf/2101.01461.pdf Code: https://github.com/cuge1995/PointCutMix Introduction ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด mixed sample data augmentation (MSDA)๊ฐ€ ํ™œ๋ฐœํ•˜๊ฒŒ ์‚ฌ์šฉ๋˜์–ด ์™”๋‹ค. ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๋Š” MixUp (Zhang et al., 2018)๊ณผ CutMix (Yun et al., 2019) ๊ฐ€ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํฌ์ธํŠธํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด CutMix๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” PointCutMix๋ฅผ ์ œ์•ˆํ•œ..

๋ฐ˜์‘ํ˜•