What is the NVIDIA Container Toolkit?

The NVIDIA Container Toolkit is a set of utilities and libraries from NVIDIA that lets containers use and share NVIDIA GPU resources. The toolkit is container-agnostic and supports several common container runtimes, including Docker and Podman.

It exposes GPU/CUDA features to software running in containers on a host that has an NVIDIA GPU. The supported platforms are GNU/Linux distributions (Debian, Ubuntu, CentOS, Fedora, openSUSE, and Amazon Linux) on amd64 and arm64/aarch64 hardware.

Figure: NVIDIA Container Toolkit architecture chart

NVIDIA GPU Operator #

This component extends container runtimes so that GPU containers can connect to NVIDIA devices on a compatible host. The connection is managed by NVIDIA middleware rather than by direct device pass-through or other ad hoc, runtime-specific methods. It also enables more automated management and provisioning of GPU containers in larger container orchestration clusters.

Installation #

Ensure that the system is equipped with NVIDIA GPU hardware and that the appropriate drivers have been installed in the host OS.
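The prerequisite above can be checked before installing anything. The following is a minimal sketch, assuming lspci (from pciutils) and nvidia-smi (shipped with the host driver) are the tools at hand; the function name check_nvidia_host is made up for this example.

```shell
# Pre-flight check: is an NVIDIA GPU visible, and does the host driver respond?
# check_nvidia_host is a hypothetical helper name, not part of any NVIDIA tool.
check_nvidia_host() {
    nvidia_host_status=0

    # lspci lists PCI devices; an NVIDIA GPU shows up with the vendor name.
    if command -v lspci >/dev/null 2>&1 && lspci 2>/dev/null | grep -qi 'nvidia'; then
        echo "gpu: NVIDIA device found on the PCI bus"
    else
        echo "gpu: no NVIDIA device detected (or lspci is not installed)"
        nvidia_host_status=1
    fi

    # nvidia-smi is installed with the driver; if it runs, the driver is loaded.
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "driver: nvidia-smi ran successfully"
    else
        echo "driver: nvidia-smi missing or failed; install the host driver first"
        nvidia_host_status=1
    fi

    return "$nvidia_host_status"
}

check_nvidia_host || echo "Host is not ready for the NVIDIA Container Toolkit yet."
```

If either line reports a problem, sort out the hardware or driver first; the toolkit does not install the host driver for you.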

Similar procedures for yum- and zypper-based systems are available in NVIDIA’s Installing the NVIDIA Container Toolkit documentation; only the Debian/Ubuntu examples are shown here.

Setup apt Repository #

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && \
    sudo apt-get update
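To make the sed step in the command above less opaque, here is a small sketch that runs the same substitution on a single line. The sample line is an assumption about what the downloaded .list file contains (the real file may differ), and add_signed_by is a name invented for this example.

```shell
# Illustrative input only; the real nvidia-container-toolkit.list may differ.
sample='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'

# Same substitution as in the setup command: point apt at the keyring
# that was just written, so only that key is trusted for this repository.
add_signed_by() {
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
}

printf '%s\n' "$sample" | add_signed_by
# Prints:
# deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
```

The signed-by option scopes the NVIDIA key to this one repository instead of adding it to the system-wide trusted keys.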

Install with apt #

sudo apt install nvidia-container-toolkit
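A quick way to confirm that the install worked is to ask dpkg about the package and check that the nvidia-ctk CLI is on the PATH. This is only a sketch; verify_toolkit_install is a hypothetical helper name, and the exact version string printed will depend on the release installed.

```shell
# Hypothetical helper: report whether the package and its CLI are present.
verify_toolkit_install() {
    # dpkg -s exits non-zero when the package is not installed.
    if command -v dpkg >/dev/null 2>&1 && dpkg -s nvidia-container-toolkit >/dev/null 2>&1; then
        echo "package: nvidia-container-toolkit is installed"
    else
        echo "package: nvidia-container-toolkit is not installed"
    fi

    # nvidia-ctk is the CLI used later to configure the container runtime.
    if command -v nvidia-ctk >/dev/null 2>&1; then
        echo "cli: $(nvidia-ctk --version | head -n 1)"
    else
        echo "cli: nvidia-ctk not found on PATH"
    fi
}

verify_toolkit_install
```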

Be aware that the host needs outbound HTTP(S) access during installation and configuration, so that these NVIDIA tools can download the appropriate container images, libraries, and drivers.

Configuration for Docker #

Use nvidia-ctk utility #

The nvidia-ctk command adds settings to the Docker daemon’s config file (/etc/docker/daemon.json by default), so the Docker daemon must be restarted after the runtime is configured. The command only needs to be told which container runtime to target; it then works out what to configure and how.

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
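After the restart, it can be worth confirming that the runtime was actually registered. The sketch below assumes the default config location /etc/docker/daemon.json and does a plain-text match for an "nvidia" key rather than a full JSON parse; has_nvidia_runtime is a name invented for this example.

```shell
# Hypothetical helper: does a Docker daemon config register an "nvidia" runtime?
# Returns 0 if found, 1 if not found, 2 if the file cannot be read.
has_nvidia_runtime() {
    docker_config_file="${1:-/etc/docker/daemon.json}"
    [ -r "$docker_config_file" ] || return 2
    # Match "nvidia" used as a JSON key (followed by a colon), not as a value.
    grep -q '"nvidia"[[:space:]]*:' "$docker_config_file"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered in /etc/docker/daemon.json"
else
    echo "nvidia runtime not found (or config unreadable); run nvidia-ctk runtime configure first"
fi
```

The Runtimes line in the output of docker info gives the same answer from the daemon’s point of view, which also confirms that the restart picked up the change.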

Check the Installation and Configuration #

Check that everything is configured correctly and that the GPU device is available to the container with this command:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

# Unable to find image 'ubuntu:latest' locally
# latest: Pulling from library/ubuntu
# aece8493d397: Already exists 
# Digest: sha256:2b7412e6465c3c7fc5bb21d3e6f1917c167358449fecac8176c6e496e5c1f05f
# Status: Downloaded newer image for ubuntu:latest
# Thu Nov 30 18:15:12 2023       
# +---------------------------------------------------------------------------------------+
# | NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
# |-----------------------------------------+----------------------+----------------------+
# | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
# |                                         |                      |               MIG M. |
# |=========================================+======================+======================|
# |   0  NVIDIA GeForce GTX 1060 3GB    Off | 00000000:08:00.0  On |                  N/A |
# | 30%   31C    P2              29W / 120W |    986MiB /  3072MiB |      4%      Default |
# |                                         |                      |                  N/A |
# +-----------------------------------------+----------------------+----------------------+
                                                                                         
# +---------------------------------------------------------------------------------------+
# | Processes:                                                                            |
# |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
# |        ID   ID                                                             Usage      |
# |=======================================================================================|
# +---------------------------------------------------------------------------------------+

The output should list the host’s NVIDIA device(s) along with the driver and CUDA versions, without any errors. In this case, it correctly finds the test system’s (older) GeForce GTX 1060 3GB.
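For scripted checks, the full table is awkward to parse. nvidia-smi can also emit selected fields as CSV via --query-gpu and --format, and the sketch below turns one such line into labelled fields. The sample line is assembled from the values in the output above and is an assumption about the exact CSV formatting; summarize_gpu_line is a hypothetical helper.

```shell
# Hypothetical helper: label the fields of one CSV line from
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
summarize_gpu_line() {
    printf '%s\n' "$1" | awk -F', ' '{ printf "name=%s driver=%s memory=%s\n", $1, $2, $3 }'
}

# Sample line based on the test system above (assumed formatting).
sample='NVIDIA GeForce GTX 1060 3GB, 535.113.01, 3072 MiB'
summarize_gpu_line "$sample"
# Prints: name=NVIDIA GeForce GTX 1060 3GB driver=535.113.01 memory=3072 MiB

# On a configured host, the same query can be run inside a container:
# sudo docker run --rm --runtime=nvidia --gpus all ubuntu \
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
```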