Beginner's tutorial: CentOS 7 + NVIDIA TITAN X + CUDA 9.0 + cuDNN 7.3 + Python 3.6.8 + TensorFlow 1.12.0
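0. Techniques you may need
(1) Reference: mounting a USB drive
https://blog.csdn.net/ido1ok/article/details/79620746
1. Install CentOS 7 and connect to the network
(1) Reference: creating a CentOS boot disk on a USB drive
https://jingyan.baidu.com/article/a681b0de5e33d03b1843460f.html
(2) Reference: fixing a CentOS 7 USB install that hangs and never reaches the installer
https://blog.csdn.net/qq_39996062/article/details/79328540
(3) Reference: CentOS 7.4 1708 installation tutorial
http://baijiahao.baidu.com/s?id=1599601257937774752&wfr=spider&for=pc
(4) Reference: getting online with PPPoE dial-up
https://www.jianshu.com/p/43b10aff69ae
(5) Reference: getting online through a router
https://jingyan.baidu.com/article/19192ad8f7c320e53e570728.html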
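2. Install dependencies
(1) Switch to root first, so you don't have to keep typing sudo
P.S. A leading # marks a command that runs as root.
$ su root
Enter the root password.
(2) Update the system
# yum -y update
This takes a long time. If yum reports that /var/run/yum.pid is locked, make sure no other yum process is still running, then remove the lock file:
# rm -rf /var/run/yum.pid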
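Deleting the lock file is only safe when the process that created it is gone. The sketch below shows one way to check that first. It assumes that /var/run/yum.pid contains the PID of the yum process as plain text; the function names are made up for this example.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(lock_text: str) -> bool:
    """Decide whether the PID recorded in a lock file is dead.

    lock_text is the content of the lock file (assumed to hold one PID).
    """
    text = lock_text.strip()
    if not text.isdigit():
        return True  # empty or unreadable content: nothing is holding the lock
    return not pid_is_running(int(text))
```

To use it, read the file with open("/var/run/yum.pid").read() and only run the rm command when lock_is_stale returns True.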
(3) Install the following packages in order
# yum -y install kernel-devel
# yum -y install epel-release
# yum -y install dkms
# yum -y install gcc-c++
# yum -y install gcc kernel-devel kernel-headers
3. Detect the graphics card, then download and install its driver
(1) Add the ELRepo repository first
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
(2) Detect the required NVIDIA driver
# yum install nvidia-detect
# nvidia-detect -v
[root@localhost ripper]# nvidia-detect -v
Probing for supported NVIDIA devices…
[10de:17c2] NVIDIA Corporation GM200 [GeForce GTX TITAN X]
This device requires the current 418.74 NVIDIA driver kmod-nvidia
The required driver here is 418.74. Go to the NVIDIA site http://www.geforce.cn/drivers and search for drivers for the GeForce GTX TITAN X (preferably with the language set to English).
(3) Download the NVIDIA driver
The search results list 418.74. Click Download to get the link:
http://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
For any other version, written xxx.xx from here on, the link has the form:
http://us.download.nvidia.com/XFree86/Linux-x86_64/xxx.xx/NVIDIA-Linux-x86_64-xxx.xx.run
Alternatively, ssh into the server and download the driver into /downloads.
Create the /downloads directory:
# mkdir /downloads
Change into /downloads:
# cd /downloads
Download the driver, where xxx.xx is the version you found:
# wget -r -np -nd https://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
# wget -r -np -nd https://us.download.nvidia.com/XFree86/Linux-x86_64/xxx.xx/NVIDIA-Linux-x86_64-xxx.xx.run
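The version reported by nvidia-detect and the download link follow a fixed pattern, so the link can be derived instead of typed by hand. A minimal sketch, assuming the output wording and the URL layout stay as shown in this post; the helper names are invented for illustration.

```python
import re
from typing import Optional

# URL layout copied from the download link in this post.
DRIVER_URL = (
    "https://us.download.nvidia.com/XFree86/Linux-x86_64/"
    "{v}/NVIDIA-Linux-x86_64-{v}.run"
)


def required_driver(detect_output: str) -> Optional[str]:
    """Pull the driver version out of `nvidia-detect -v` output."""
    match = re.search(
        r"requires the current (\d+(?:\.\d+)+) NVIDIA driver", detect_output
    )
    return match.group(1) if match else None


def driver_url(version: str) -> str:
    """Build the .run download link for a driver version such as 418.74."""
    return DRIVER_URL.format(v=version)
```

Feeding it the nvidia-detect output from step (2) yields 418.74 and the same link used in the wget command.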
(4) Check for the conflicting built-in nouveau driver
The NVIDIA driver conflicts with the nouveau driver that ships with the system. Check whether nouveau is loaded:
# lsmod | grep nouveau
If the command prints lines that mention nouveau, the module is loaded. That is expected at this point.
(5) Resolve the driver conflict
Edit the GRUB configuration (if vim is missing, run # yum install vim first):
# vim /etc/default/grub
Press a to enter insert mode and change the file from
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
to
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb rd.driver.blacklist=nouveau nouveau.modeset=0 quiet"
GRUB_DISABLE_RECOVERY="true"
Then press Esc and type :wq to save and quit.
Regenerate the GRUB configuration:
# grub2-mkconfig -o /boot/grub2/grub.cfg
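The manual edit only touches the GRUB_CMDLINE_LINUX line. For repeated setups, the same change could be scripted. The sketch below is one possible approach, not part of the original procedure: add_kernel_args is a hypothetical helper that appends the two nouveau arguments at the end of the line rather than before quiet, which should make no difference to the kernel.

```python
import re
from typing import Iterable


def add_kernel_args(grub_text: str, args: Iterable[str]) -> str:
    """Append missing kernel arguments to GRUB_CMDLINE_LINUX.

    Every other line of the file is returned unchanged, and arguments
    that are already present are not added twice.
    """
    wanted = list(args)

    def patch(match):
        current = match.group(2).split()
        missing = [arg for arg in wanted if arg not in current]
        return match.group(1) + " ".join(current + missing) + match.group(3)

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")',
        patch,
        grub_text,
        flags=re.MULTILINE,
    )
```

A script would read /etc/default/grub, pass the text through add_kernel_args with rd.driver.blacklist=nouveau and nouveau.modeset=0, write it back, and then run grub2-mkconfig as above. Keeping a backup copy of the file first is advisable.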
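Create the blacklist file: press a to enter insert mode, type blacklist nouveau into the empty file, then press Esc and type :wq
# vim /etc/modprobe.d/blacklist.conf
Move the current initramfs image aside as a backup:
# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
Build a new initramfs:
# dracut /boot/initramfs-$(uname -r).img $(uname -r)
Reboot:
# reboot
Check again. This time the command should print nothing:
# lsmod | grep nouveau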
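Note that grep nouveau also matches other modules that merely list nouveau in their "Used by" column. A stricter check looks only at the first column of the lsmod output. A small sketch, with a function name chosen for this example:

```python
def nouveau_loaded(lsmod_output: str) -> bool:
    """Return True if a module named exactly 'nouveau' is listed.

    lsmod prints one module per line, with the module name in the
    first column, so only that column is compared.
    """
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "nouveau":
            return True
    return False
```

The text can come from subprocess.check_output(["lsmod"], universal_newlines=True); after the reboot the function should return False.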
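(6) Install the graphics driver. /downloads should contain NVIDIA-Linux-x86_64-xxx.xx.run
# cd /downloads
Make the installer executable (use your actual file name). To verify, run ls; the file name should now be highlighted.
# chmod +x NVIDIA-Linux-x86_64-xxx.xx.run
# sh NVIDIA-Linux-x86_64-xxx.xx.run
By default the first run ends with an error.
If it reports ERROR: You appear to be running an X server; please exit X before installing., log out of the desktop session.
Then press Ctrl+Alt+F2 to switch to a text console and log in:
localhost login: admin (use your actual user name)
Password:
Switch to root:
[admin@localhost ~]$ su root
Enter init 3 to switch to text mode:
[root@localhost ~]# init 3
Go to the directory that holds NVIDIA-Linux-x86_64-xxx.xx.run:
[root@localhost ~]# cd /downloads
[root@localhost downloads]# ls
NVIDIA-Linux-x86_64-xxx.xx.run
With the file name highlighted, run the installer:
[root@localhost downloads]# sh NVIDIA-Linux-x86_64-xxx.xx.run
Return to graphical mode:
# init 5
(7) Check the installation
# nvidia-smi
The command should print a table that lists the driver version and the GPU.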
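The nvidia-smi table is meant for people. For a scripted check, nvidia-smi also has a query mode that prints CSV. The sketch below parses that output; the QUERY command uses the --query-gpu and --format options of nvidia-smi, while parse_gpu_rows is a name made up for this example.

```python
import csv
from typing import List, Tuple

# Command whose output the parser expects: one "name, driver_version" row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_rows(csv_text: str) -> List[Tuple[str, str]]:
    """Turn the CSV printed by QUERY into (gpu_name, driver_version) pairs."""
    rows = []
    for record in csv.reader(csv_text.splitlines(), skipinitialspace=True):
        if len(record) == 2:  # skip blank or malformed lines
            rows.append((record[0].strip(), record[1].strip()))
    return rows
```

On the machine itself, subprocess.check_output(QUERY, universal_newlines=True) provides the text. With the setup in this post the expected pair would be the GeForce GTX TITAN X and driver 418.74.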
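4. Install CUDA 9.0
(1) Download CUDA (you are on your own for this step)
Download the CUDA installer from the official site https://developer.nvidia.com/cuda-downloads and make sure it matches your system.
CUDA 9.0 is in the archive at https://developer.nvidia.com/cuda-90-download-archive
(2) Install CUDA 9.0
Put the file in /downloads
# cd /downloads
Make the installer executable (use your actual file name):
# chmod +x cuda_9.0.176_384.81_linux.run
To verify, run ls; the file name should be highlighted:
# ls
Run the installer (use your actual file name):
# sh cuda_9.0.176_384.81_linux.run
Keep pressing Enter until the questions appear. Answer n to the second question.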
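The second question is presumably the installer's offer to install its bundled graphics driver, which is not wanted because a newer driver was already installed in step 3. The sketch below rests on the assumption that the runfile name encodes the toolkit version and the bundled driver version as cuda_<toolkit>_<driver>_linux.run; all function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_runfile(name: str) -> Optional[Tuple[str, str]]:
    """Split 'cuda_<toolkit>_<driver>_linux.run' into (toolkit, driver)."""
    match = re.fullmatch(
        r"cuda_(\d+(?:\.\d+)*)_(\d+(?:\.\d+)*)_linux\.run", name
    )
    return (match.group(1), match.group(2)) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '384.81' into (384, 81) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def bundled_driver_is_older(runfile: str, installed_driver: str) -> bool:
    """True if the driver named in the runfile is older than the installed one."""
    parsed = parse_runfile(runfile)
    if parsed is None:
        raise ValueError("unrecognized runfile name: " + runfile)
    return version_tuple(parsed[1]) < version_tuple(installed_driver)
```

For cuda_9.0.176_384.81_linux.run and the installed 418.74 driver, the comparison says the bundled driver is older, which matches the advice to answer n.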
(3) Verify CUDA 9.0
Replace cuda-x.x with your installed version, for example cuda-9.0
# cd /usr/local/cuda-x.x/samples/1_Utilities/deviceQuery
# make
# ./deviceQuery
Result
[root@localhost deviceQuery]# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX TITAN X"
CUDA Driver Version / Runtime Version 10.1 / 9.0
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 12210 MBytes (12802916352 bytes)
(24) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
GPU Max Clock rate: 1076 MHz (1.08 GHz)
Memory Clock rate: 3505 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 3145728 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
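The last two lines of this output carry the facts that matter: the CUDA version the driver supports (10.1), the runtime version of the toolkit (9.0), and the PASS verdict. The driver version has to be at least the runtime version. A minimal sketch for checking this from a script, with names invented for the example:

```python
import re
from typing import Optional, Tuple


def parse_summary(output: str) -> Optional[Tuple[str, str, bool]]:
    """Return (driver_version, runtime_version, passed) from deviceQuery output."""
    match = re.search(
        r"CUDA Driver Version = ([\d.]+), CUDA Runtime Version = ([\d.]+)",
        output,
    )
    if match is None:
        return None
    passed = re.search(r"^Result = PASS\s*$", output, flags=re.MULTILINE) is not None
    return match.group(1), match.group(2), passed


def driver_supports_runtime(driver: str, runtime: str) -> bool:
    """Compare versions numerically: the driver must be >= the runtime."""
    def as_tuple(version: str) -> Tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))

    return as_tuple(driver) >= as_tuple(runtime)
```

Applied to the output above, parse_summary gives 10.1, 9.0 and a passed flag, and driver_supports_runtime confirms the pairing is valid.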
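(4) Add CUDA to ~/.bash_profile
Method 1
# vim ~/.bash_profile
Press a to enter insert mode, add the lines below, then press Esc and type :wq
PATH=$PATH:$HOME/bin:/usr/local/cuda/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/
CUDA_HOME=/usr/local/cuda
export PATH
export LD_LIBRARY_PATH
export CUDA_HOME
Apply the environment variables immediately:
# source ~/.bash_profile
Method 2 (did not seem to work as well)
# vim /etc/profile
Press a to enter insert mode, add the lines below, then press Esc and type :wq
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Apply the environment variables immediately:
# source /etc/profile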
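Whichever method is used, the result should be that PATH contains /usr/local/cuda/bin and LD_LIBRARY_PATH contains /usr/local/cuda/lib64. The sketch below checks exactly that; missing_cuda_paths is a made-up helper and the default location is the one used in this post.

```python
from typing import List, Mapping


def missing_cuda_paths(
    env: Mapping[str, str], cuda_home: str = "/usr/local/cuda"
) -> List[str]:
    """List the CUDA directories that are absent from PATH / LD_LIBRARY_PATH."""
    wanted = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    problems = []
    for variable, directory in wanted.items():
        # Split on ':' and ignore empty entries and trailing slashes.
        entries = [e.rstrip("/") for e in env.get(variable, "").split(":") if e]
        if directory not in entries:
            problems.append(f"{variable} lacks {directory}")
    return problems
```

Calling missing_cuda_paths(os.environ) in a fresh login shell should return an empty list once the profile is in effect.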
(5) Verify that CUDA is on the PATH
Check the nvcc version (either form works):
# nvcc -V
# nvcc --version
Result
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
Type cuda and press Tab twice to list the CUDA tools:
# cuda
Result
cudafe cuda-gdb cuda-install-samples-9.0.sh
cudafe++ cuda-gdbserver cuda-memcheck
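The release number in the nvcc output is what later steps depend on, since cuDNN and TensorFlow must match CUDA 9.0. A small sketch that extracts it, assuming the "release X.Y, VX.Y.Z" wording shown above; the function name is illustrative.

```python
import re
from typing import Optional, Tuple


def nvcc_release(output: str) -> Optional[Tuple[str, str]]:
    """Return (release, full_version) from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)+)", output)
    return (match.group(1), match.group(2)) if match else None
```

The text can be obtained with subprocess.check_output(["nvcc", "--version"], universal_newlines=True).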
5. Install cuDNN 7.3
(1) Check the versions first
Reference: CUDA and cuDNN versions required by each TensorFlow release
https://blog.csdn.net/qq_27825451/article/details/89082978
Reference: official NVIDIA cuDNN installation guide (tar -xzvf cudnn-9.0-linux-x64-v7.tgz)
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-linux
Reference: downloads for cudnn-8.0/9.0/10.0-linux-x64-v6.0/7.0/7.1/7.2/7.3/7.4.tgz
https://blog.csdn.net/xiangxianghehe/article/details/79177833
Reference: cuDNN 7.3 download for CUDA 9
https://download.csdn.net/download/godfyun/10682330
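The version check can be written down as a small lookup. The sketch below contains only the one row this tutorial relies on, TensorFlow 1.12.0 with CUDA 9.0 and cuDNN 7, which is an assumption based on TensorFlow's published tested configurations; other releases would need their own rows, and the names are hypothetical.

```python
from typing import Optional

# Assumed requirement for the tensorflow-gpu release used in this post:
# TensorFlow version -> (CUDA major.minor, cuDNN major).
REQUIREMENTS = {"1.12.0": ("9.0", "7")}


def stack_ok(tf_version: str, cuda_version: str, cudnn_version: str) -> Optional[bool]:
    """Check a CUDA/cuDNN pair against the table; None if the release is unknown."""
    required = REQUIREMENTS.get(tf_version)
    if required is None:
        return None
    cuda_required, cudnn_major = required
    cuda_matches = cuda_version.split(".")[:2] == cuda_required.split(".")
    cudnn_matches = cudnn_version.split(".")[0] == cudnn_major
    return cuda_matches and cudnn_matches
```

With the versions installed here, stack_ok("1.12.0", "9.0.176", "7.3.0") is True, while a CUDA 10.1 toolkit would fail the check.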
(2) Download cuDNN 7.3
# cd /downloads
# wget http://developer.download.nvidia.com/compute/redist/cudnn/v7.3.0/cudnn-9.0-linux-x64-v7.3.0.29.tgz
(3) Unpack cuDNN 7.3
# cd /downloads
# tar -xvzf cudnn-9.0-linux-x64-v7.3.0.29.tgz
(4) Copy the cuDNN files into the CUDA directory
# cd /downloads/cuda
# cp include/* /usr/local/cuda/include
# cp lib64/* /usr/local/cuda/lib64
# chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
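To confirm which cuDNN version ended up in /usr/local/cuda, the copied header can be inspected. The sketch below assumes that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 7.x headers do; the function name is made up for this example.

```python
import re
from typing import Optional


def cudnn_version(header_text: str) -> Optional[str]:
    """Read 'major.minor.patch' from the #define lines in cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+%s\s+(\d+)" % name, header_text, flags=re.MULTILINE
        )
        if match is None:
            return None  # header does not follow the assumed layout
        parts.append(match.group(1))
    return ".".join(parts)
```

Reading /usr/local/cuda/include/cudnn.h and passing its content to cudnn_version should give 7.3.0 for the archive downloaded above.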
6. Install Python 3.6.8
Reference: installing Python 3.6 on Linux
https://www.cnblogs.com/kimyeee/p/7250560.html
(1) Check whether Python is already installed
CentOS 7 ships with Python 2.7 because some system tools, such as yum, depend on it.
Show the installed Python version:
# python -V
Show where the Python executable lives:
# which python
(2) Install the build dependencies first
# yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make
# yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
(3) Download the Python source
Download the source tarball from the official site, or run the following commands to fetch it into /downloads
# cd /downloads
# wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
(4) Unpack Python
# cd /downloads
# tar -zxvf Python-3.6.8.tgz
(5) Build and install Python
Change into the source directory, configure the install prefix, then build and install:
# cd Python-3.6.8
# ./configure --prefix=/usr/local/python3
# make
# make install
When this finishes, /usr/local/ contains a python3 directory.
(6) Symlink python3 into /usr/bin
# ln -s /usr/local/python3/bin/python3 /usr/bin/python3
(7) Add Python to PATH
Edit the profile:
# vim ~/.bash_profile
Change it from
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
to the following (press a to enter insert mode; when done, press Esc and type :wq)
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin:/usr/local/python3/bin
export PATH
(8) Apply the environment variables immediately
# source ~/.bash_profile
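(9) Check Python
The first command should report Python 3, the second Python 2:
# python3 -V
# python2 -V
Result (the banner below is what launching python3 without -V prints; the last line appears after suspending the interpreter with Ctrl+Z)
Python 3.6.8 (default, Jun 5 2019, 17:45:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
[3]+ Stopped python3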
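A source build finishes even when some optional modules could not be compiled, for example because a -devel package from step (2) was missing. The sketch below tries to import a set of modules and reports the ones that fail. The package-to-module mapping is an assumption made for this example, not something the build prints.

```python
import importlib
from typing import Iterable, List

# Assumed mapping from the -devel packages installed earlier to the
# standard-library modules that depend on them at build time.
MODULE_FOR_PACKAGE = {
    "zlib-devel": "zlib",
    "bzip2-devel": "bz2",
    "openssl-devel": "ssl",
    "ncurses-devel": "curses",
    "sqlite-devel": "sqlite3",
    "readline-devel": "readline",
    "tk-devel": "tkinter",
    "xz-devel": "lzma",
}


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Run with the new interpreter, missing_modules(MODULE_FOR_PACKAGE.values()) should return an empty list. A missing ssl module in particular would later prevent pip from downloading packages over HTTPS.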
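7. Install TensorFlow 1.12.0
(1) Install pip first
Try installing the python-pip package directly:
# yum install python-pip
If the package is not found, enable EPEL:
# yum -y install epel-release
Once that succeeds, run the install again:
# yum install python-pip
Upgrade the installed pip:
# pip install --upgrade pip
Note that python-pip from yum belongs to the system Python 2.7. The TensorFlow install below uses pip3, which the Python 3.6.8 build normally places in /usr/local/python3/bin.
If all else fails, pip can also be installed from source:
# wget --no-check-certificate https://github.com/pypa/pip/archive/9.0.1.tar.gz # download the source
# tar -zxvf 9.0.1.tar.gz # unpack into pip-9.0.1
# cd pip-9.0.1
# python3 setup.py install # install with Python 3
# ln -s /usr/local/python3/bin/pip /usr/bin/pip3 # create the symlink
# pip install --upgrade pip # upgrade pip
(2) Install tensorflow-gpu==1.12.0 from the Tsinghua mirror
# pip3 install tensorflow-gpu==1.12.0 -i https://pypi.tuna.tsinghua.edu.cn/simple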
(3) Test TensorFlow
Enter
# python3
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
In Python 3, print is a function, so the parentheses are required.
Result
[root@localhost Python-3.6.8]# python3
Python 3.6.8 (default, Jun 5 2019, 17:45:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-06-05 18:08:39.292310: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-05 18:08:39.352566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-05 18:08:39.353113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 11.92GiB freeMemory: 11.67GiB
2019-06-05 18:08:39.353137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-05 18:08:39.602924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-05 18:08:39.602960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-05 18:08:39.602984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-05 18:08:39.603087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11292 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
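The line that proves the GPU is in use is the one starting with "Created TensorFlow device". A minimal sketch for pulling its fields out of a saved log, assuming the message format of TensorFlow 1.12 shown above; parse_created_device is a name invented for this example.

```python
import re
from typing import Dict, Optional, Union

DEVICE_LINE = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<memory_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: (?P<index>\d+), name: (?P<name>[^,]+),"
    r" pci bus id: (?P<pci>[^,]+), compute capability: (?P<capability>[\d.]+)\)"
)


def parse_created_device(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract the GPU name, memory and compute capability from a log line."""
    match = DEVICE_LINE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["memory_mb"] = int(info["memory_mb"])
    info["index"] = int(info["index"])
    return info
```

If no such line shows up in the session log, TensorFlow most likely fell back to the CPU, and the driver, CUDA and cuDNN steps are worth rechecking.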
That is everything. Enjoy TensorFlow!
8. Follow-up steps
(1) Reference: creating a user
https://blog.csdn.net/xudailong_blog/article/details/80518266
