Run Large Language Models Locally With *ollama.ai's* Docker Image

Ollama is an open-source framework for self-hosting large language models (LLMs) similar to ChatGPT or Google’s Bard. The official Docker image makes it painless to host any number of the supported models.

Shell script to get an Ollama model running #

The example “run_ollama_container.sh” scripts below take the LLM model name as their only argument (assigned to $LLMMODEL). The `docker exec` pull and run commands can be invoked again at any time to start additional models within the same running container.
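For example, a second model can be pulled and started in the already-running container without launching another one. This is a sketch: “llama2” is an illustrative model name (substitute any model from the Ollama library), and the commands are guarded so they degrade gracefully when the container is not up.

```shell
# Pull and start a second model inside the already-running "ollama"
# container. "llama2" is illustrative; any Ollama library model works.
if docker exec -i ollama ollama pull llama2 2>/dev/null; then
  # Interactive chat session with the newly pulled model.
  docker exec -it ollama ollama run llama2
  SECOND_MODEL_STATUS=pulled
else
  echo "ollama container not running; start it with run_ollama_container.sh first"
  SECOND_MODEL_STATUS=skipped
fi
```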

A subsequent post will describe the Ollama REST API, which listens on port 11434 and can be called from other applications.
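As a small taste of that API, the sketch below sends a single prompt to the generate endpoint. It assumes the container is up and the “mistral” model has been pulled, and falls back to a message when the server is unreachable.

```shell
# Minimal sketch of a call to the Ollama REST API on port 11434.
# Assumes the container is running and "mistral" has been pulled.
PAYLOAD='{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false}'
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "Ollama server not reachable on port 11434"
```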

Script to run Ollama with Nvidia GPU #

Use this script when an Nvidia GPU is available and the Nvidia Container Toolkit is installed for the Docker container engine.
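Before running the GPU script, it can be worth confirming that Docker can actually pass the GPU through. A minimal sketch (the `ubuntu` image is just an illustrative choice for running `nvidia-smi`):

```shell
# Sanity-check GPU passthrough: run nvidia-smi inside a throwaway
# container with --gpus=all. The ubuntu image tag is illustrative.
if docker run --rm --gpus=all ubuntu nvidia-smi >/dev/null 2>&1; then
  GPU_STATUS=ok
  echo "GPU passthrough works; proceed with the GPU script"
else
  GPU_STATUS=missing
  echo "No GPU passthrough detected; use the CPU-only script below"
fi
```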

```shell
#!/bin/bash

# Model name is the first (required) argument.
LLMMODEL=${1:?usage: $0 <model-name>}

# Start the Ollama container with GPU access and a persistent model volume.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull the requested model, then open an interactive session with it.
docker exec -i ollama ollama pull "$LLMMODEL"
docker exec -it ollama ollama run "$LLMMODEL"
```

Script to run Ollama with CPU-only #

The container can also run without GPU support, sacrificing some response speed.

```shell
#!/bin/bash

# Model name is the first (required) argument.
LLMMODEL=${1:?usage: $0 <model-name>}

# Start the Ollama container (CPU-only) with a persistent model volume.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull the requested model, then open an interactive session with it.
docker exec -i ollama ollama pull "$LLMMODEL"
docker exec -it ollama ollama run "$LLMMODEL"
```

Call script for LLM model “mistral” #

```shell
bash run_ollama_container.sh mistral
```

Browse the Ollama library to see the many available LLMs beyond the mistral model used here.