cuda-docker
Fix slow nvidia-smi output by installing fabric-manager.
https://zhuanlan.zhihu.com/p/632912924
Install the CUDA toolkit:
https://developer.nvidia.com/cuda-toolkit-archive
Configure environment variables (for a local toolkit install):
export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source <(kubectl completion bash)
export LANGUAGE="en_US.UTF-8"
export LANG=en_US.UTF-8
export LC_ALL=C
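The `${PATH:+:${PATH}}` idiom in the exports above appends the old value only when it is non-empty, so an unset variable does not leave a dangling colon. A quick illustration using a scratch variable instead of the real PATH:

```shell
# ${VAR:+word} expands to "word" only when VAR is set and non-empty,
# so the result never ends in a stray colon.
CUDA_BIN=/usr/local/cuda-12.4/bin
OLD=""
NEW_PATH=$CUDA_BIN${OLD:+:${OLD}}
echo "$NEW_PATH"    # -> /usr/local/cuda-12.4/bin  (no trailing colon)
OLD=/usr/bin
NEW_PATH=$CUDA_BIN${OLD:+:${OLD}}
echo "$NEW_PATH"    # -> /usr/local/cuda-12.4/bin:/usr/bin
```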
Dockerfile (NVIDIA PyTorch base image):
FROM nvcr.io/nvidia/pytorch:24.06-py3
RUN pip install vllm openai sse_starlette -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install peft transformers datasets accelerate deepspeed tensorboard \
fire packaging ninja openai gradio -i https://pypi.tuna.tsinghua.edu.cn/simple
Fix slow nvidia-smi output: install fabric-manager (its package version must match the driver version):
version=535.54.03
yum -y install yum-utils nvidia-docker2
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1 nvidia-fabric-manager-devel-${version}-1
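fabric-manager only works when its version exactly matches the installed driver, and the service must be enabled after installation. A runnable sketch of the match check (versions hardcoded here for illustration; the comments show where they come from on a real host):

```shell
# nv-fabricmanager refuses to start if its version differs from the driver's.
driver_version=535.54.03   # real host: nvidia-smi --query-gpu=driver_version --format=csv,noheader
fm_version=535.54.03       # real host: rpm -q --qf '%{VERSION}\n' nvidia-fabric-manager
if [ "$driver_version" = "$fm_version" ]; then
    # versions match: enable and start the service
    echo "OK: systemctl enable --now nvidia-fabricmanager"
else
    echo "mismatch: yum install -y nvidia-fabric-manager-${driver_version}-1"
fi
```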
Install CUDA, the NVIDIA driver, and nvidia-docker2. Download links:
cuda:https://developer.nvidia.com/cuda-toolkit-archive
nvidia driver: https://download.nvidia.com/
cat cuda-rhel7.repo
[cuda-rhel7-x86_64]
name=cuda-rhel7-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub
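On a host without yum-utils (so no yum-config-manager), the repo file above can be written directly. A sketch that writes to a temp dir so it runs without root; the real destination is /etc/yum.repos.d/cuda-rhel7.repo:

```shell
# Write the CUDA yum repo definition shown above.
# Real target: /etc/yum.repos.d/cuda-rhel7.repo (needs root).
dest=$(mktemp -d)/cuda-rhel7.repo
cat > "$dest" <<'EOF'
[cuda-rhel7-x86_64]
name=cuda-rhel7-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub
EOF
echo "repo file written: $dest"
```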
Yizhuang (亦庄) environment:
FROM nvcr.io/nvidia/pytorch:23.10-py3
RUN pip install --upgrade pip && \
pip install --no-cache-dir vllm==0.4.3 openai sse_starlette spacy torch typer torch-tensorrt torchdata torchtext torchvision weasel --upgrade --upgrade-strategy=only-if-needed -i https://pypi.tuna.tsinghua.edu.cn/simple
nvidia-docker sometimes fails to download; add the repo and pre-download the RPMs:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
yum install --downloadonly nvidia-docker2 --downloaddir=/tmp/nvidia
nvidia-fabric-manager speeds up nvidia-smi / NVIDIA driver calls:
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1
yum install -y nvidia-fabric-manager-devel-${version}-1
https://developer.aliyun.com/mirror/centos?spm=a2c6h.13651102.0.0.3e221b116j42Ya
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum -y install epel-release
[base]
baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
[updates]
baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
import torch
# check whether CUDA is available
print(torch.cuda.is_available())
# report the CUDA version PyTorch was built with
cuda_version = torch.version.cuda
print(f"CUDA version: {cuda_version}")
limits:
  cpu: "16"
  memory: 50Gi
  tencent.com/vcuda-core: "800"
  tencent.com/vcuda-memory: "32"
requests:
  cpu: "16"
  memory: 50Gi
  tencent.com/vcuda-core: "800"
  tencent.com/vcuda-memory: "32"
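In a pod spec these fields sit under spec.containers[].resources. A minimal sketch (pod and container names are hypothetical; with tkestack GPUManager, vcuda-core is counted in hundredths of a GPU, so "800" means 8 full cards, and vcuda-memory is counted in 256 MiB units — verify the units against your scheduler's documentation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-train            # hypothetical name
spec:
  containers:
  - name: trainer            # hypothetical name
    image: nvcr.io/nvidia/pytorch:24.06-py3
    resources:
      limits:
        cpu: "16"
        memory: 50Gi
        tencent.com/vcuda-core: "800"    # 100 = one full GPU -> 8 GPUs
        tencent.com/vcuda-memory: "32"   # GPUManager counts 256Mi units
      requests:
        cpu: "16"
        memory: 50Gi
        tencent.com/vcuda-core: "800"
        tencent.com/vcuda-memory: "32"
```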
Pull an image for a specific architecture:
docker pull --platform linux/arm64 <image>
