Local model serving - Using Docker Model Runner

In the local model serving landscape, we have already looked at Ollama and LM Studio. Another option I explored is Docker Model Runner (DMR). Docker is already part of most developers' workflows, and DMR makes running a local model as simple as running a container. The feature was introduced as a beta in the Docker Desktop 4.40 release. The key features of DMR include:

  • Serve models via OpenAI-compatible APIs
  • Pull and push models to and from Docker Hub
  • Manage local models
  • Run and interact with models from both the command line and the Docker Desktop GUI
  • Package model GGUF files as OCI artifacts and publish them to any container registry

To get started, install Docker Desktop 4.40 or later, or upgrade an existing installation. This installs the docker-model CLI plugin.

PS C:\> docker model
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  df               Show Docker Model Runner disk usage
  inspect          Display detailed information on one model
  install-runner   Install Docker Model Runner (Docker Engine only)
  list             List the models pulled to your local environment
  logs             Fetch the Docker Model Runner logs
  package          Package a GGUF file into a Docker model OCI artifact, with optional licenses.
  ps               List running models
  pull             Pull a model from Docker Hub or HuggingFace to your local environment
  push             Push a model to Docker Hub
  requests         Fetch requests+responses from Docker Model Runner
  rm               Remove local models downloaded from Docker Hub
  run              Run a model and interact with it using a submitted prompt or chat mode
  status           Check if the Docker Model Runner is running
  tag              Tag a model
  uninstall-runner Uninstall Docker Model Runner
  unload           Unload running models
  version          Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.

This feature can be enabled or disabled using the docker model CLI or the Docker Desktop GUI.

Docker model catalog

You can run the docker model pull command to pull a model locally. You can browse the list of models on Docker Hub or use the Docker Desktop GUI to explore the model catalog.


PS C:\> docker model pull ai/llama3.2:3B-Q4_0
Downloaded 1.92GB of 1.92GB
Model pulled successfully

The model names follow the convention {model}:{parameters}-{quantization}.
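To make the convention concrete, here is a small Python sketch that splits a model reference into those three parts. The parsing helper is purely illustrative (not part of any Docker API), and only assumes the `{model}:{parameters}-{quantization}` shape described above.

```python
def parse_model_ref(ref: str) -> dict:
    """Split a DMR model reference like 'ai/llama3.2:3B-Q4_0'
    into its naming-convention parts (illustrative helper only)."""
    name, _, tag = ref.partition(":")          # 'ai/llama3.2' and '3B-Q4_0'
    parameters, _, quantization = tag.partition("-")  # '3B' and 'Q4_0'
    return {"model": name, "parameters": parameters, "quantization": quantization}

# parse_model_ref("ai/llama3.2:3B-Q4_0")
# → {'model': 'ai/llama3.2', 'parameters': '3B', 'quantization': 'Q4_0'}
```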

To generate a response, you can use the docker model run command.

PS C:\> docker model run ai/llama3.2:3B-Q4_0 "In one sentence, what is a Llama?"
A llama is a domesticated mammal native to South America, closely related to alpacas, characterized by its long neck, soft fur, and distinctive ears.

Token usage: 45 prompt + 34 completion = 79 total

If you do not provide a prompt at the end of the command, an interactive chat session starts.

PS C:\> docker model run ai/llama3.2:3B-Q4_0
Interactive chat mode started. Type '/bye' to exit.
> In one sentence, what is a Llama?
A llama is a domesticated mammal native to South America, closely related to alpacas, characterized by its long neck, soft fur, and distinctive ears.

Token usage: 45 prompt + 34 completion = 79 total
> /bye
Chat session ended.

If you enable host-side TCP support, you can use the DMR REST API to access and interact with the model programmatically.

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "ai/llama3.2:3B-Q4_0",
    "messages": [
        {
            "role": "user",
            "content": "In one sentence, what is a Llama?"
        }
    ]
}'
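Because the endpoint is OpenAI-compatible, the same call works from any language. Below is a minimal Python sketch using only the standard library, assuming host-side TCP is enabled on the default port 12434 and the model from the curl example is pulled; the helper names are my own.

```python
import json
import urllib.request

# Base URL for DMR's OpenAI-compatible API (default TCP port 12434).
BASE_URL = "http://localhost:12434/engines/llama.cpp/v1"


def build_chat_request(prompt: str, model: str = "ai/llama3.2:3B-Q4_0") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local model."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running model and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires DMR running with TCP support enabled):
#   print(chat("In one sentence, what is a Llama?"))
```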

Last updated: 30th September 2025