Local model serving - Using Docker Model Runner

In the local model serving landscape, we have already looked at Ollama and LM Studio. Another option I explored is Docker Model Runner (DMR). Docker is already part of most developers' workflows, and DMR makes running a local model as simple as running a container. The feature was introduced as a beta in the Docker Desktop 4.40 release. The key features of DMR include:

  • Serve models via OpenAI-compatible APIs
  • Pull and push models to and from Docker Hub
  • Manage local models
  • Run and interact with models from both the command line and the Docker Desktop GUI
  • Package model GGUF files as OCI artifacts and publish them to any container registry

To get started, install Docker Desktop 4.40 or later, or upgrade an existing installation. This installs the docker-model CLI plugin.

PS C:\> docker model
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  df               Show Docker Model Runner disk usage
  inspect          Display detailed information on one model
  install-runner   Install Docker Model Runner (Docker Engine only)
  list             List the models pulled to your local environment
  logs             Fetch the Docker Model Runner logs
  package          Package a GGUF file into a Docker model OCI artifact, with optional licenses.
  ps               List running models
  pull             Pull a model from Docker Hub or HuggingFace to your local environment
  push             Push a model to Docker Hub
  requests         Fetch requests+responses from Docker Model Runner
  rm               Remove local models downloaded from Docker Hub
  run              Run a model and interact with it using a submitted prompt or chat mode
  status           Check if the Docker Model Runner is running
  tag              Tag a model
  uninstall-runner Uninstall Docker Model Runner
  unload           Unload running models
  version          Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.

This feature can be enabled or disabled using the docker model CLI or the Docker Desktop GUI.

Docker model catalog

You can run the docker model pull command to pull a model locally. You can browse the list of models on Docker Hub or use the Docker Desktop GUI to explore the model catalog.


PS C:\> docker model pull ai/llama3.2:3B-Q4_0
Downloaded 1.92GB of 1.92GB
Model pulled successfully

The model names follow the convention {model}:{parameters}-{quantization}.
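To make the convention concrete, here is a small Python sketch that splits a model reference into those three parts. The parsing helper is purely illustrative (not part of any Docker API), and only assumes the `{model}:{parameters}-{quantization}` shape described above.

```python
def parse_model_ref(ref: str) -> dict:
    """Split a DMR model reference like 'ai/llama3.2:3B-Q4_0'
    into its naming-convention parts (illustrative helper only)."""
    name, _, tag = ref.partition(":")          # 'ai/llama3.2' and '3B-Q4_0'
    parameters, _, quantization = tag.partition("-")  # '3B' and 'Q4_0'
    return {"model": name, "parameters": parameters, "quantization": quantization}

# parse_model_ref("ai/llama3.2:3B-Q4_0")
# → {'model': 'ai/llama3.2', 'parameters': '3B', 'quantization': 'Q4_0'}
```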

To generate a response, you can use the docker model run command.

PS C:\> docker model run ai/llama3.2:3B-Q4_0 "In one sentence, what is a Llama?"
A llama is a domesticated mammal native to South America, closely related to alpacas, characterized by its long neck, soft fur, and distinctive ears.

Token usage: 45 prompt + 34 completion = 79 total

If you do not provide a prompt at the end of the command, an interactive chat session starts.

PS C:\> docker model run ai/llama3.2:3B-Q4_0
Interactive chat mode started. Type '/bye' to exit.
> In one sentence, what is a Llama?
A llama is a domesticated mammal native to South America, closely related to alpacas, characterized by its long neck, soft fur, and distinctive ears.

Token usage: 45 prompt + 34 completion = 79 total
> /bye
Chat session ended.

If you enable host-side TCP support, you can use the DMR REST API to access and interact with the model programmatically.

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "ai/llama3.2:3B-Q4_0",
    "messages": [
        {
            "role": "user",
            "content": "In one sentence, what is a Llama?"
        }
    ]
}'
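Because the endpoint is OpenAI-compatible, the same call works from any language. Below is a minimal Python sketch using only the standard library, assuming host-side TCP is enabled on the default port 12434 and the model from the curl example is pulled; the helper names are my own.

```python
import json
import urllib.request

# Base URL for DMR's OpenAI-compatible API (default TCP port 12434).
BASE_URL = "http://localhost:12434/engines/llama.cpp/v1"


def build_chat_request(prompt: str, model: str = "ai/llama3.2:3B-Q4_0") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local model."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running model and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires DMR running with TCP support enabled):
#   print(chat("In one sentence, what is a Llama?"))
```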

Last updated: 30th September 2025