๐Ÿ’ฉ ์—๋Ÿฌ ํ•ด๊ฒฐ

[PyTorch/์—๋Ÿฌ ํ•ด๊ฒฐ] Dataparallel์ด complex tensor์„ real view๋กœ ์ „ํ™˜์‹œํ‚ค๋Š” ๋ฌธ์ œ

복만 2022. 1. 10. 16:31

Problem:

I was using a model that takes a complex tensor as input.

When I tested the forward method on its own, it ran fine,

but when I ran the full training code, it threw an error saying the tensor sizes didn't match..

 

RuntimeError: The size of tensor a (2) must match the size of tensor b (232) at non-singleton dimension 3

 

๋ฐ”๋กœ ์ด๋ ‡๊ฒŒ..

 

๋””๋ฒ„๊น…์„ ํ•ด๋ณด๋‹ˆ complex tensor๊ฐ€ model ๋‚ด๋ถ€๋กœ ๋“ค์–ด๊ฐ€๋ฉด float์œผ๋กœ ๋ณ€ํ™˜๋˜๋ฉด์„œ real-imag part๊ฐ€ ๋ถ„๋ฆฌ๋˜๋Š” ๊ฒƒ์ด์—ˆ๋‹ค,,

๋”ฐ๋กœ model forward ์ฝ”๋“œ๋งŒ ๋Œ๋ฆด๋•Œ๋Š” ์ž˜๋งŒ ๋Œ์•„๊ฐ”๋Š”๋ฐ ?
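The split described above is exactly what `torch.view_as_real` produces: a complex tensor of shape `(..., N)` becomes a float tensor of shape `(..., N, 2)`, where the new last dimension holds the (real, imag) pair. Here is a small CPU-only sketch of how that extra dimension turns into a broadcasting error. The shapes (a batch of 4, length 232) and the weight `w` are assumptions chosen to echo the 232 in the error message, not the author's actual model; the reported dimension index depends on the tensor's rank, so it reads 2 here rather than 3.

```python
import torch

# Hypothetical complex input batch and complex weight (e.g. a frequency-domain filter).
x = torch.randn(4, 232, dtype=torch.cfloat)
w = torch.randn(232, dtype=torch.cfloat)

# view_as_real: complex (4, 232) -> float32 (4, 232, 2), last dim = (real, imag).
x_real = torch.view_as_real(x)
print(x_real.dtype, tuple(x_real.shape))  # torch.float32 (4, 232, 2)

# Broadcasting aligns trailing dims, so the new size-2 dim meets the weight's 232.
error_message = None
try:
    x_real * w
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```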

 

 

 

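Cause:

In the end... nn.DataParallel was the problem.

A model wrapped in nn.DataParallel apparently calls torch.view_as_real on the inputs passed into it. According to the linked issues, this happens when tensors are scattered and broadcast to each GPU (`broadcast_coalesced`), so complex parameters are affected as well.

(But in that case, shouldn't it convert them back afterwards?)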

 

๊ทธ๋ž˜์„œ ์ด๋ ‡๊ฒŒ ๋œ๋‹ค.

 

์ž์„ธํ•œ ์„ค๋ช…์€ ์•„๋ž˜ ๋‘ ๋งํฌ์—์„œ ํ™•์ธ.

 

Data Parallel splits Complex Parameter · Issue #60931 · pytorch/pytorch

๐Ÿ› Bug I am testing a toy model that fourier transforms an image and do pointwise multiplication with a complex tensor and then inverse fourier transform. When I train the model with single GPU ever...

github.com

 

`DataParallel` (`broadcast_coalesced`) with complex tensors yield real views · Issue #55375 · pytorch/pytorch

๐Ÿ› Bug Using DataParallel on complex tensors (either parameters or inputs/outputs) yield real views. The expected behavior would be to obtain complex tensors on each replicate. Casting the views bac...

github.com

 

Solution:

DataParallel is reportedly in maintenance mode, so the recommendation is to use DistributedDataParallel (DDP) instead.

DataParallel sure has a lot of problems in many ways.

Or just stick to a single GPU..

๋ฐ˜์‘ํ˜•