Observations on the following "Assertion failed" error: ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: ...
When writing PyTorch code, you may run into an error like this:
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [15,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [16,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [17,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [18,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [19,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
But this error is merely printed to the console; no exception is raised and the program keeps running, until several lines later it suddenly blows up with something seemingly unrelated:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "....py", line 340, in batched_fusion
ref_intrinsics.inverse(),
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgetrfBatched( handle, n, dA_array, ldda, ipiv_array, info_array, batchsize)`
What is going on here?
Since it is an "index out of bounds" error, this is presumably the classic out-of-bounds indexing problem. You need to start from the line that raised the exception (CUBLAS_STATUS_EXECUTION_FAILED) and work upward, checking which operation indexed out of bounds. But wait: shouldn't Python raise an exception on an out-of-bounds index? Why doesn't it here?
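One practical way to narrow the search (a debugging sketch, not taken from the original code) is to force synchronous kernel launches, so the CUDA error is reported at the Python line that actually launched the faulting kernel rather than at some later unrelated call, such as the cuBLAS batched inverse above:

```python
import os

# Force synchronous CUDA kernel launches so device-side errors surface at
# the line that caused them, instead of at a later CUDA call.
# This must be set before the process makes its first CUDA call
# (or exported in the shell before starting Python).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

With this set, re-running the failing script should move the traceback to the actual out-of-bounds indexing line, at the cost of slower execution.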
I ran some experiments:
>>> import torch
>>> x = torch.rand(5, 5, device='cuda')
>>> x[5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for dimension 0 with size 5
>>> x[5, 5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for dimension 0 with size 5
>>> x[torch.tensor(5)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for dimension 0 with size 5
>>> x[torch.tensor([5])]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zry/anaconda3/envs/mvsgs2/lib/python3.7/site-packages/torch/_tensor.py", line 427, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/home/zry/anaconda3/envs/mvsgs2/lib/python3.7/site-packages/torch/_tensor_str.py", line 637, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/home/zry/anaconda3/envs/mvsgs2/lib/python3.7/site-packages/torch/_tensor_str.py", line 568, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/home/zry/anaconda3/envs/mvsgs2/lib/python3.7/site-packages/torch/_tensor_str.py", line 328, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/home/zry/anaconda3/envs/mvsgs2/lib/python3.7/site-packages/torch/_tensor_str.py", line 116, in __init__
tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0)
RuntimeError: numel: integer multiplication overflow
>>> ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
I created a tensor x on CUDA. Indexing it with plain integers raises an exception as expected; but as soon as the index is a non-scalar tensor, we get ATen's "Assertion failed" output instead.
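The behavior in the REPL session above can be reproduced deliberately. This is a minimal sketch (assuming a CUDA device is present): the indexing kernel is launched asynchronously, so the Python call returns "successfully", and the device-side assert only becomes visible at the next synchronization point.

```python
import torch

# Warning: a device-side assert corrupts the CUDA context, so run this
# in a throwaway process, not in a session you care about.
if torch.cuda.is_available():
    x = torch.rand(5, 5, device="cuda")
    y = x[torch.tensor([5])]      # async kernel launch; no exception here
    try:
        torch.cuda.synchronize()  # the device-side assert surfaces here
    except RuntimeError as e:
        print("caught:", e)       # e.g. "CUDA error: device-side assert triggered"
```

In the REPL, the `repr` of the result acted as that synchronization point, which is why the assertion messages appeared after the prompt.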
I then put x on the CPU, and everything raised exceptions normally:
>>> x = torch.rand(5, 5)
>>> x[torch.tensor([5])]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for dimension 0 with size 5
So my preliminary conclusion is: when the indexed tensor is on CUDA and the index is a non-scalar tensor, no exception is raised; the debug message is printed directly instead. (A likely reason: the bounds check runs inside the CUDA kernel, which executes asynchronously, so by the time the device-side assert fires, the Python call has already returned and there is nowhere to raise from. Checking a tensor index on the host would require synchronizing and copying it back, which would hurt performance.) So you should examine the lines that use a non-scalar tensor as an index.
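If you want an eager, host-side check instead of the silent device-side assert, a small helper can validate the index tensor before use. This is a hypothetical helper of my own (`check_index` is not a PyTorch API), mirroring the bounds rule from the assertion (`index >= -size && index < size`):

```python
import torch

def check_index(idx: torch.Tensor, size: int, dim: int = 0) -> None:
    """Raise IndexError on the host if any entry of `idx` is out of
    bounds for a dimension of length `size` (negative indices allowed,
    as in normal Python indexing)."""
    bad = (idx < -size) | (idx >= size)
    if bad.any():
        raise IndexError(
            f"index {idx[bad][0].item()} is out of bounds for dimension "
            f"{dim} with size {size}"
        )

x = torch.rand(5, 5)            # works the same for a CUDA tensor
idx = torch.tensor([4])
check_index(idx, x.shape[0])    # passes silently
y = x[idx]                      # safe to index now
```

Note that for a CUDA index tensor, `bad.any()` forces a synchronization, which is exactly the cost PyTorch avoids by checking on the device; use such a check in debugging, not in hot loops.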