Ollama is an open-source framework for self-hosting large language models (LLMs) similar to ChatGPT or Google’s Bard. The official Docker image makes it painless to host any number of the supported models.
Shell script to get an Ollama model running #
The example `run_ollama_container.sh` scripts below take the LLM model name as an argument (assigned to `$LLMMODEL`), but the `pull` and `run` `docker exec` commands can be invoked additional times to start other models within the same running container.
A subsequent post will describe using the Ollama REST API, available on port 11434, which can be called from other applications.
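As a quick preview of that API, once a model has been pulled into the running container, a completion can be requested with `curl` (a sketch; the model name `mistral` is just an example, and `stream: false` returns the whole response in one JSON object instead of streaming tokens):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```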
Script to run Ollama with Nvidia GPU #
Use this script on a host with an Nvidia GPU and the Nvidia Container Toolkit installed for the Docker container engine.
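The script body below is a minimal sketch based on the Ollama Docker image’s documented usage. The script name `run_ollama_container.sh` and the `$LLMMODEL` variable come from the description above; the container and volume names (`ollama`) are assumptions:

```shell
#!/bin/sh
# run_ollama_container.sh -- sketch, not the original script.
set -e

LLMMODEL="$1"
if [ -z "$LLMMODEL" ]; then
  echo "usage: $0 <model-name>" >&2
  exit 1
fi

# Start the container once, exposing the REST API on port 11434 and
# persisting downloaded models in the "ollama" named volume.
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# Pull the requested model, then open an interactive session with it.
docker exec -it ollama ollama pull "$LLMMODEL"
docker exec -it ollama ollama run "$LLMMODEL"
```

The two `docker exec` lines can be re-run with other model names against the same container, as noted above.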
Script to run Ollama with CPU-only #
The container can also be run without GPU support, sacrificing some performance in response speed.
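A sketch of the CPU-only variant, which is identical except that the `docker run` command omits `--gpus=all` (container and volume names are again assumptions):

```shell
#!/bin/sh
# run_ollama_container.sh (CPU-only sketch) -- no --gpus flag needed.
set -e

LLMMODEL="$1"
[ -n "$LLMMODEL" ] || { echo "usage: $0 <model-name>" >&2; exit 1; }

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

docker exec -it ollama ollama pull "$LLMMODEL"
docker exec -it ollama ollama run "$LLMMODEL"
```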
Call script for LLM model “mistral” #
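Assuming one of the scripts above is saved as `run_ollama_container.sh`, it can be made executable and called with the model name:

```shell
chmod +x run_ollama_container.sh
./run_ollama_container.sh mistral
```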
Visit the library of available LLMs for models beyond the mistral model used here.