RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16' when deploying llama3
Problem description
When deploying the llama3 model, the following error appears:
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
0%| | 0/175 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/stage2/auto/evaluation/C-eval/evaluate_zh2.py", line 206, in <module>
main()
File "/stage2/auto/evaluation/C-eval/evaluate_zh2.py", line 202, in main
ceval.run(args.shot, args.split)
File "/stage2/auto/evaluation/C-eval/evaluate_zh2.py", line 112, in run
result, acc = self.run_single_task(task_name, shot, split)
File "/stage2/auto/evaluation/C-eval/evaluate_zh2.py", line 139, in run_single_task
output = self.model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1989, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2932, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1141, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 914, in forward
causal_mask = self._update_causal_mask(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1038, in _update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Solution
Root cause: the installed torch version does not match what the code expects; on older torch builds, the CUDA kernel behind torch.triu/torch.tril is not implemented for the half-precision bfloat16 dtype. Reference: https://github.com/meta-llama/llama3/issues/80
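The causal-mask construction in modeling_llama.py boils down to a torch.triu call on a bfloat16 CUDA tensor, so a single line reproduces the failure. A minimal sketch, assuming a CUDA device and an affected (older) torch build:

import torch

# On torch builds that lack a bfloat16 CUDA kernel for triu/tril, this raises:
# RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
mask = torch.full((8, 8), float("-inf"), dtype=torch.bfloat16, device="cuda")
causal_mask = torch.triu(mask, diagonal=1)
print(causal_mask)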
Upgrading torch to version 2.2.2 resolves the problem:
pip install torch==2.2.2
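After reinstalling, a quick sanity check (again assuming a CUDA device) confirms that the kernel is now available:

import torch

print(torch.__version__)  # expect 2.2.2
x = torch.zeros(4, 4, dtype=torch.bfloat16, device="cuda")
print(torch.triu(x, diagonal=1))  # completes without the RuntimeError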