Installing a TensorFlow Docker Learning Environment

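Overview

Ubuntu 18.04, an NVIDIA RTX 2070 GPU, TensorFlow running in Docker, and mirrors inside China to speed up downloads.
The operating system is already installed and SSH remote access is configured.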

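Check the environment

Check the GPU information. lspci prints only a numeric device ID here; you can look it up on the NVIDIA website to confirm that the GPU is a 2070.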
$ lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f02 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f9 (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1ada (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1adb (rev a1)
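Since lspci shows only numeric IDs in this listing, it can help to pull them out before looking them up. A minimal sketch, assuming the output format shown above; nvidia_device_ids is a hypothetical helper name, and the sample input is copied from that output.

```shell
#!/bin/sh
# Hypothetical helper: print the hex device IDs of NVIDIA functions
# found in lspci-style text read from standard input.
nvidia_device_ids() {
    grep -i 'nvidia' | sed -n 's/.*Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Sample input copied from the lspci output above.
nvidia_device_ids <<'EOF'
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f02 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f9 (rev a1)
EOF
```

On a real machine the same helper can be fed directly: lspci | nvidia_device_ids.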
Check the current Linux version
$ uname -m && cat /etc/*release

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
If you need to install the CUDA toolkit for AI development directly on the machine, also check the GCC version
$ gcc --version

gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
and the Linux kernel version
$ uname -r

4.15.0-20-generic
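The individual checks above can be gathered into one summary, which is handy when comparing several machines. This is only a sketch: env_summary and its key names are made up for illustration, and gcc is reported as missing rather than treated as an error because it only matters for a CUDA install on the host.

```shell
#!/bin/sh
# Hypothetical helper: print the values checked above as key=value lines.
env_summary() {
    echo "arch=$(uname -m)"
    echo "kernel=$(uname -r)"
    # /etc/os-release defines ID and VERSION_ID; read it in a subshell
    # so the variables do not leak into the current shell.
    if [ -r /etc/os-release ]; then
        echo "distro=$(. /etc/os-release; echo "${ID}-${VERSION_ID}")"
    fi
    # gcc only matters when CUDA is installed directly on the host.
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc=$(gcc -dumpversion)"
    else
        echo "gcc=not-installed"
    fi
}

env_summary
```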

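Use the Tsinghua TUNA mirror

Ubuntu's software source configuration file is /etc/apt/sources.list. Back up the stock file, then replace its contents with the following to use the TUNA mirror.
# Source mirrors are commented out by default to speed up apt update; uncomment them if needed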
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse

# Pre-release software source; enabling it is not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
The exact commands are as follows
$ cd /etc/apt
$ sudo cp sources.list sources.list.old
$ sudo vi sources.list

$ sudo apt-get update

Hit:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic InRelease
Hit:2 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-updates InRelease
Hit:3 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-backports InRelease
Hit:4 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-security InRelease
Reading package lists... Done
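Instead of pasting the list by hand in vi, the active deb lines can be generated for a given release codename. A minimal sketch under that assumption; tuna_sources is a hypothetical helper, and it leaves out the commented deb-src and proposed entries from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print TUNA mirror entries for an Ubuntu codename.
tuna_sources() {
    codename="$1"
    mirror="https://mirrors.tuna.tsinghua.edu.cn/ubuntu/"
    for suite in "" "-updates" "-backports" "-security"; do
        echo "deb ${mirror} ${codename}${suite} main restricted universe multiverse"
    done
}

tuna_sources bionic
```

After backing up the original file, the output could be written in place with something like tuna_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.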

Install the GPU driver

The NVIDIA driver has to be installed on Ubuntu 18.04.
Run nvidia-smi; the output below shows that the NVIDIA driver is not installed yet.
$ nvidia-smi

nvidia-smi: command not found
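If a driver is already installed and you need to change its version, the commands below purge the old driver and nouveau.
$ sudo apt-get purge 'nvidia*'
$ sudo apt-get --purge remove xserver-xorg-video-nouveau
The command below lists the GPU driver versions packaged for the current OS. The list is of little use here, though, because the 2070 requires a driver newer than version 410.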
$ sudo apt-cache search nvidia | grep -E "nvidia-[0-9]{3}"
nvidia-331 - Transitional package for nvidia-331
nvidia-331-dev - Transitional package for nvidia-340-dev
nvidia-331-updates - Transitional package for nvidia-340
nvidia-331-updates-dev - Transitional package for nvidia-340-dev
nvidia-331-updates-uvm - Transitional package for nvidia-340
nvidia-331-uvm - Transitional package for nvidia-340
nvidia-340-dev - NVIDIA binary Xorg driver development files
nvidia-340-updates - Transitional package for nvidia-340
nvidia-340-updates-dev - Transitional package for nvidia-340-dev
nvidia-340-updates-uvm - Transitional package for nvidia-340-updates
nvidia-340-uvm - Transitional package for nvidia-340
nvidia-346 - Transitional package for nvidia-346
nvidia-346-dev - Transitional package for nvidia-352-dev
nvidia-346-updates - Transitional package for nvidia-346-updates
nvidia-346-updates-dev - Transitional package for nvidia-352-updates-dev
nvidia-352 - Transitional package for nvidia-361
nvidia-352-dev - Transitional package for nvidia-361-dev
nvidia-352-updates - Transitional package for nvidia-361
nvidia-352-updates-dev - Transitional package for nvidia-361-dev
nvidia-361 - Transitional package for nvidia-367
nvidia-361-dev - Transitional package for nvidia-367-dev
nvidia-361-updates - Transitional package for nvidia-361
nvidia-361-updates-dev - Transitional package for nvidia-361-dev
nvidia-367 - Transitional package for nvidia-375
nvidia-367-dev - Transitional package for nvidia-375-dev
nvidia-375 - Transitional package for nvidia-384
nvidia-375-dev - Transitional package for nvidia-384-dev
nvidia-384 - Transitional package for nvidia-driver-390
nvidia-384-dev - Transitional package for nvidia-driver-390
xserver-xorg-video-nvidia-390 - NVIDIA binary Xorg driver
nvidia-340 - NVIDIA binary driver - version 340.107
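Rather than reading through the list by eye, the package names can be filtered against the 410 threshold. A sketch only: drivers_newer_than is a hypothetical helper, and it assumes packages are named nvidia-NNN or nvidia-driver-NNN as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: from apt-cache style lines on standard input, print
# the package names "nvidia-NNN" or "nvidia-driver-NNN" with NNN above a minimum.
drivers_newer_than() {
    min="$1"
    awk -v min="$min" '
        $1 ~ /^nvidia-(driver-)?[0-9]+$/ {
            n = $1
            sub(/^nvidia-(driver-)?/, "", n)   # keep only the version number
            if (n + 0 > min + 0) print $1
        }'
}

# Two lines copied from the listing above; neither passes the filter.
drivers_newer_than 410 <<'EOF'
nvidia-384 - Transitional package for nvidia-driver-390
nvidia-340 - NVIDIA binary driver - version 340.107
EOF
```

Fed with the full apt-cache output above, the filter prints nothing, which is why the driver is fetched from the NVIDIA website instead.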
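Find the required driver on the NVIDIA website
Select the driver you need on the download page. [Figure: driver selection on the NVIDIA website]
The latest version turns out to be 418.56
The download URL is given here; it can be fetched with wget.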
$ wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/418.56/NVIDIA-Linux-x86_64-418.56.run

--2019-03-27 12:16:38--  https://us.download.nvidia.cn/XFree86/Linux-x86_64/418.56/NVIDIA-Linux-x86_64-418.56.run
Resolving us.download.nvidia.cn (us.download.nvidia.cn)... 60.9.4.131, 60.9.4.139, 60.9.4.142
Connecting to us.download.nvidia.cn (us.download.nvidia.cn)|60.9.4.131|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 107195640 (102M) [application/octet-stream]
Saving to: 'NVIDIA-Linux-x86_64-418.56.run.1'

4-418.56.run.1            2%[                               ]   2.26M   340KB/s    eta 4m 45s
Update (2019-08-05): the latest driver at that date
$ wget http://us.download.nvidia.com/XFree86/Linux-x86_64/430.40/NVIDIA-Linux-x86_64-430.40.run 

Once the download finishes, install the driver. The installer opens a text-mode interactive interface; follow the prompts.
$ chmod +x NVIDIA-Linux-x86_64-418.56.run
$ sudo ./NVIDIA-Linux-x86_64-418.56.run
Reboot the system when the installation finishes
$ sudo reboot
After the reboot, nvidia-smi should show that the driver is installed and the GPU is visible
$ nvidia-smi

Wed Mar 27 12:21:16 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   35C    P8     3W / 175W |    323MiB /  7949MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1164      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      1196      G   /usr/bin/gnome-shell                          70MiB |
|    0      1511      G   /usr/lib/xorg/Xorg                            96MiB |
|    0      1639      G   /usr/bin/gnome-shell                         137MiB |
+-----------------------------------------------------------------------------+
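For scripts it is easier to work with the two version numbers than with the whole table. A minimal sketch that reads them from the banner line; smi_versions is a hypothetical helper and assumes the banner layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: pull the driver and CUDA versions out of the
# nvidia-smi banner line read from standard input.
smi_versions() {
    sed -n 's/.*Driver Version: \([0-9.]*\) *CUDA Version: \([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Banner line copied from the nvidia-smi output above.
smi_versions <<'EOF'
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
EOF
```

On the machine itself this becomes nvidia-smi | smi_versions. If only the driver version is needed, nvidia-smi --query-gpu=driver_version --format=csv,noheader avoids the text parsing.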

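Install the TensorFlow Docker container

Docker uses containers to create virtual environments that isolate a TensorFlow installation from the rest of the system. TensorFlow programs run inside this virtual environment, which can share resources with its host machine (access directories, use the GPU, connect to the Internet, and so on). The TensorFlow Docker images are tested for each release.
Docker is the easiest way to enable TensorFlow GPU support on Linux, since only the NVIDIA® GPU driver is required on the host machine (the NVIDIA® CUDA® Toolkit does not need to be installed).
TensorFlow Docker requirements
  • Install Docker on your local host machine.
  • For GPU support on Linux, install nvidia-docker.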

Install Docker

If you installed docker in the past, remove it first:
$ sudo apt-get remove docker docker-engine docker.io
Install the dependencies first:
$ sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
Add Docker's GPG public key to the trusted keys:
$ curl -fsSL https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

OK

$ sudo apt-key fingerprint 0EBFCD88

pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) <docker@docker.com>
sub   rsa4096 2017-02-22 [S]
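The fingerprint can also be compared automatically instead of by eye. A sketch under the assumption that the expected value is the one printed above; has_docker_fingerprint is a hypothetical helper name.

```shell
#!/bin/sh
# Fingerprint of Docker's key as printed in the output above.
EXPECTED="9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

# Hypothetical helper: succeed if standard input contains the expected
# fingerprint, ignoring the spaces gpg puts between groups.
has_docker_fingerprint() {
    tr -d ' \t' | grep -qx "$EXPECTED"
}

has_docker_fingerprint <<'EOF' && echo "fingerprint matches"
pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
EOF
```

In practice the input would come from the real command: sudo apt-key fingerprint 0EBFCD88 | has_docker_fingerprint.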
Add the repository that matches your system.
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

Hit:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic InRelease
Hit:2 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-updates InRelease
Hit:3 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-backports InRelease
Hit:4 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-security InRelease
Get:5 https://download.docker.com/linux/ubuntu bionic InRelease [64.4 kB]
Get:6 https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages [5195 B]
Fetched 69.6 kB in 1s (46.7 kB/s)
Reading package lists... Done
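If the official Docker server is not reachable from your network (for example from behind the firewall), you can switch the source to https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/ubuntu instead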
$ sudo add-apt-repository "deb [arch=amd64] https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
Update the package index, then install
$ sudo apt-get update
$ sudo apt-get -y install docker-ce docker-ce-cli containerd.io
Enable and start the Docker daemon
$ sudo systemctl enable docker

Synchronizing state of docker.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable docker
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = "zh_CN:zh",
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "zh_CN.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("zh_CN.UTF-8").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = "zh_CN:zh",
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "zh_CN.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("zh_CN.UTF-8").

$ sudo systemctl start docker
$ sudo usermod -aG docker $USER
Log out and log back in so the new group membership takes effect; docker ps then shows that Docker is installed.
$ docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
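If docker ps reports a permission error instead, the usual cause is that the current session started before the usermod call and does not carry the docker group yet. A small sketch for checking that; session_has_group is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current login session already
# carries the named group.
session_has_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if session_has_group docker; then
    echo "docker group is active in this session"
else
    echo "docker group not active yet; log out and back in"
fi
```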

Install nvidia-docker

Prerequisites

  1. GNU/Linux x86_64 with kernel version > 3.10
  2. Docker >= 1.12
  3. NVIDIA GPU with Architecture > Fermi (2.1)
  4. NVIDIA drivers ~= 361.93 (untested on older versions)
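The version conditions in this list can be checked with a small comparison helper. This is a sketch, not the project's own check: version_at_least is a hypothetical name, it relies on the -V option of GNU sort, and it treats every threshold as "at least" for simplicity.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort), which Ubuntu ships.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Values taken from the outputs earlier in this post.
version_at_least 4.15.0 3.10   && echo "kernel ok"
version_at_least 418.56 361.93 && echo "driver ok"
```

The live values could come from uname -r | cut -d- -f1 for the kernel and docker version --format '{{.Server.Version}}' for Docker.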

Remove nvidia-docker 1.0

$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f

$ sudo apt-get purge nvidia-docker

Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package nvidia-docker

Installation steps

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

OK

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
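The distribution variable used above is built by sourcing /etc/os-release inside a subshell and concatenating ID and VERSION_ID, which is where the ubuntu18.04 in the repository URLs comes from. A minimal sketch of the same idea on a temporary file; distribution_from is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: build the "<ID><VERSION_ID>" string from an
# os-release style file, the same way the command above does.
distribution_from() {
    # The parentheses start a subshell, so ID and VERSION_ID do not
    # leak into the calling shell.
    ( . "$1"; echo "${ID}${VERSION_ID}" )
}

# Demo with a temporary file holding the two relevant lines.
tmp=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="18.04"\n' > "$tmp"
distribution_from "$tmp"    # prints ubuntu18.04
rm -f "$tmp"
```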

$ sudo apt-get update
Hit:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic InRelease
Hit:2 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-updates InRelease
Hit:3 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-backports InRelease
Hit:4 https://mirrors.tuna.tsinghua.edu.cn/ubuntu bionic-security InRelease
Hit:5 https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/ubuntu bionic InRelease
Get:6 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  InRelease [1139 B]
Get:7 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1136 B]
Get:8 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease [1129 B]
Get:9 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  Packages [4076 B]
Get:10 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  Packages [2320 B]
Get:11 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages [1972 B]
Fetched 11.8 kB in 3s (4016 B/s)
Reading package lists... Done

$ sudo apt-get install -y nvidia-docker2

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-runtime
  nvidia-container-runtime-hook
The following NEW packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-runtime
  nvidia-container-runtime-hook nvidia-docker2
0 upgraded, 5 newly installed, 0 to remove and 397 not upgraded.
...

$ sudo pkill -SIGHUP dockerd
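The SIGHUP above asks dockerd to reload its configuration. The nvidia-docker2 package registers its runtime in /etc/docker/daemon.json, so a quick look at that file tells whether the runtime is known to Docker. The sketch below is a plain text match, not a JSON parse; has_nvidia_runtime is a hypothetical helper, and the sample file layout is an assumption that should be compared with the file on your machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a Docker daemon.json style file
# mentions a runtime named "nvidia". A plain text match, not a JSON parse.
has_nvidia_runtime() {
    [ -r "$1" ] && grep -q '"nvidia"' "$1"
}

# Demo on a temporary file shaped like the one nvidia-docker2 installs
# (assumed layout; check /etc/docker/daemon.json on your machine).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
has_nvidia_runtime "$tmp" && echo "nvidia runtime registered"
rm -f "$tmp"
```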

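Check the currently supported CUDA versions

Good news: all current CUDA versions are supported.

Verification

Verify that containers for different CUDA versions start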
$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi

Unable to find image 'nvidia/cuda:10.1-base' locally
10.1-base: Pulling from nvidia/cuda
Digest: sha256:c73daa5f6b8d6c972bdd5f0f29471fe26e09d00cd9522976620d2ec395e19e89
Status: Downloaded newer image for nvidia/cuda:10.1-base
Wed Mar 27 05:29:57 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   34C    P8     1W / 175W |    111MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
$ docker run --runtime=nvidia --rm nvidia/cuda:9.2-base nvidia-smi
$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
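The four commands above differ only in the image tag, so they can be folded into a loop. A sketch only: check_cuda_tags and the DOCKER override are hypothetical names introduced here, the latter so the commands can be printed with DOCKER=echo before running them for real.

```shell
#!/bin/sh
# DOCKER is a hypothetical override: set DOCKER=echo for a dry run that
# prints the commands instead of executing them.
DOCKER="${DOCKER:-docker}"

# Run nvidia-smi in each CUDA base image named on the command line.
check_cuda_tags() {
    for tag in "$@"; do
        $DOCKER run --runtime=nvidia --rm "nvidia/cuda:${tag}" nvidia-smi \
            || echo "FAILED: nvidia/cuda:${tag}"
    done
}
```

Usage for the tags in this post: check_cuda_tags 10.1-base 10.0-base 9.2-base 9.0-base.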

Run TensorFlow Docker

Pull the TensorFlow Docker image you need; look up the matching tag in the image's published tag list.
The following commands download TensorFlow release images to your machine:
$ docker pull tensorflow/tensorflow:1.8.0-gpu-py3
$ docker pull tensorflow/tensorflow:1.13.1-gpu-py3-jupyter
Start a TensorFlow Docker container to check that it works
$ docker run --runtime=nvidia -it --rm tensorflow/tensorflow:1.8.0-gpu-py3 python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"

$ docker run --runtime=nvidia -it --rm tensorflow/tensorflow:1.13.1-gpu-py3-jupyter python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
Setting up a GPU-enabled image can take a while.
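The reduce_sum test also succeeds on the CPU, so it does not prove that TensorFlow sees the GPU. A sketch of a more direct check using tf.test.is_gpu_available() from the TensorFlow 1.x API; tf_gpu_check and the DOCKER override are hypothetical names introduced here.

```shell
#!/bin/sh
# DOCKER is a hypothetical override: set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Ask TensorFlow inside the given image tag whether it can use a GPU.
# tf.test.is_gpu_available() is part of the TensorFlow 1.x API.
tf_gpu_check() {
    $DOCKER run --runtime=nvidia --rm "tensorflow/tensorflow:$1" \
        python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
}
```

For the images pulled above, tf_gpu_check 1.13.1-gpu-py3-jupyter should end its output with True when the GPU is usable.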
That's it, all done.