๐Ÿ Python & library/PyTorch

[PyTorch] Enable anomaly detection (torch.autograd.detect_anomaly() / torch.autograd.set_detect_anomaly(True))

복만 2022. 1. 29. 18:06

While training a deep learning model, I ran into the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDAComplexFloatType [1, 256, 232]], which is output 0 of SubBackward0, is at version 10; expected version 9 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

์นœ์ ˆํ•˜๊ฒŒ๋„ Hint๋ฅผ ์ค€๋‹ค..


๊ฐ’์ด ๋ฐ”๋€Œ๋Š” inplace operation์œผ๋กœ ์ธํ•ด gradient computation ๊ณผ์ •์—์„œ Runtime error๊ฐ€ ๋ฐœ์ƒํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค. torch.autograd.set_detect_anomaly(True)๋ฅผ ํ†ตํ•ด anomaly detection์„ ์‚ฌ์šฉํ•˜๋ฉด ์–ด๋””์„œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ๋Š”์ง€ ์•Œ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.


How to use PyTorch Anomaly Detection

PyTorch์˜ anomaly detection์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

with torch.autograd.detect_anomaly():
    y_pred = model(x)
    loss = loss_f(y_pred, y)
    loss.backward()

For reference, torch.autograd.detect_anomaly() and torch.autograd.set_detect_anomaly(True) do the same thing; the difference is that the context manager enables it only inside the with block, while set_detect_anomaly(True) toggles it globally.
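The scoping difference can be seen with torch.is_anomaly_enabled(), which reports the current state of the flag (a small sketch, independent of the training code above):

```python
import torch

# set_detect_anomaly(True) flips a global flag that stays on until
# it is explicitly turned off again.
torch.autograd.set_detect_anomaly(True)
print(torch.is_anomaly_enabled())      # True
torch.autograd.set_detect_anomaly(False)

# detect_anomaly() is a context manager: the flag is on only inside
# the with block and is restored to its previous state on exit.
with torch.autograd.detect_anomaly():
    print(torch.is_anomaly_enabled())  # True
print(torch.is_anomaly_enabled())      # False
```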


With this enabled, autograd runs extra checks at each step of the gradient computation and reports where the failing operation was recorded in the forward pass.

The downside is that execution takes longer.


Example output

๐ŸŽ anomaly detection ์ถ”๊ฐ€ ์ „:

Traceback (most recent call last):
  File "train.py", line 169, in <module>
    main(args)
  File "train.py", line 112, in main
    loss.backward()
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDAComplexFloatType [1, 256, 232]], which is output 0 of SubBackward0, is at version 10; expected version 9 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).


๐Ÿ anomaly detection ์ถ”๊ฐ€ ํ›„:

  File "train.py", line 170, in <module>
    main(args)
  File "train.py", line 108, in main
    y_pred = model(x)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/model.py", line 197, in forward
    x = self.dc(z)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/model.py", line 115, in forward
    rTr = torch.sum(r.conj()*r).real
 (function _print_stack)
  0%|                                                                                                                                                                                                        | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 170, in <module>
    main(args)
  File "train.py", line 113, in main
    loss.backward()
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDAComplexFloatType [1, 256, 232]], which is output 0 of SubBackward0, is at version 10; expected version 9 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!


It even wishes you "Good luck!" at the end.


PyTorch goes as far as providing a feature like this for users who wrote buggy code..

Truly kind. 🙂👍


For reference, lines 114–115 of my model.py looked like this:

r -= alpha * Ap
rTr = torch.sum(r.conj()*r).real


๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ˆ˜์ •ํ–ˆ๋”๋‹ˆ ์ž˜ ๋™์ž‘ํ–ˆ๋‹ค.

r = r - alpha * Ap
rTr = torch.sum(r.conj()*r).real


์™œ์ธ์ง„ ๋ชจ๋ฆ„.. ^_^

๋ฐ˜์‘ํ˜•