๐Ÿ Python & library/PyTorch

[PyTorch] Enable anomaly detection (torch.autograd.detect_anomaly() / torch.autograd.set_detect_anomaly(True))

복만 2022. 1. 29. 18:06

While training a deep learning model, I ran into the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDAComplexFloatType [1, 256, 232]], which is output 0 of SubBackward0, is at version 10; expected version 9 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

์นœ์ ˆํ•˜๊ฒŒ๋„ Hint๋ฅผ ์ค€๋‹ค..


๊ฐ’์ด ๋ฐ”๋€Œ๋Š” inplace operation์œผ๋กœ ์ธํ•ด gradient computation ๊ณผ์ •์—์„œ Runtime error๊ฐ€ ๋ฐœ์ƒํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค. torch.autograd.set_detect_anomaly(True)๋ฅผ ํ†ตํ•ด anomaly detection์„ ์‚ฌ์šฉํ•˜๋ฉด ์–ด๋””์„œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ๋Š”์ง€ ์•Œ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.


How to use PyTorch Anomaly Detection

PyTorch์˜ anomaly detection์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

with torch.autograd.detect_anomaly():
    y_pred = model(x)
    loss = loss_f(y_pred, y)
    loss.backward()

For reference, torch.autograd.detect_anomaly() and torch.autograd.set_detect_anomaly(True) do the same thing; the difference is that the context manager enables it only inside the with block, while set_detect_anomaly(True) toggles it globally.
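The scoping difference can be seen with torch.is_anomaly_enabled(), which reports the current state of the flag (a small sketch, independent of the training code above):

```python
import torch

# set_detect_anomaly(True) flips a global flag that stays on until
# it is explicitly turned off again.
torch.autograd.set_detect_anomaly(True)
print(torch.is_anomaly_enabled())      # True
torch.autograd.set_detect_anomaly(False)

# detect_anomaly() is a context manager: the flag is on only inside
# the with block and is restored to its previous state on exit.
with torch.autograd.detect_anomaly():
    print(torch.is_anomaly_enabled())  # True
print(torch.is_anomaly_enabled())      # False
```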


With this enabled, autograd runs extra checks at each step of the gradient computation and reports where the failing operation was recorded in the forward pass.

The downside is that execution takes longer.


Example output

๐ŸŽ anomaly detection ์ถ”๊ฐ€ ์ „:

Traceback (most recent call last):
  File "train.py", line 169, in <module>
    main(args)
  File "train.py", line 112, in main
    loss.backward()
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDAComplexFloatType [1, 256, 232]], which is output 0 of SubBackward0, is at version 10; expected version 9 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).


๐Ÿ anomaly detection ์ถ”๊ฐ€ ํ›„:

  File "train.py", line 170, in <module>
    main(args)
  File "train.py", line 108, in main
    y_pred = model(x)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/model.py", line 197, in forward
    x = self.dc(z)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/model.py", line 115, in forward
    rTr = torch.sum(r.conj()*r).real
 (function _print_stack)
  0%|                                                                                                                                                                                                        | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 170, in <module>
    main(args)
  File "train.py", line 113, in main
    loss.backward()
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDAComplexFloatType [1, 256, 232]], which is output 0 of SubBackward0, is at version 10; expected version 9 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!


It even wishes you "Good luck!" at the end.


PyTorch goes as far as providing a feature like this for users who wrote buggy code..

Truly kind. 🙂👍


For reference, lines 114–115 of my model.py looked like this:

r -= alpha * Ap
rTr = torch.sum(r.conj()*r).real


๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ˆ˜์ •ํ–ˆ๋”๋‹ˆ ์ž˜ ๋™์ž‘ํ–ˆ๋‹ค.

r = r - alpha * Ap
rTr = torch.sum(r.conj()*r).real


์™œ์ธ์ง„ ๋ชจ๋ฆ„.. ^_^

๋ฐ˜์‘ํ˜•