Ollama Local LLM Runner

The standard tool for running open-source LLMs locally. One command to download and run any model. OpenAI-compatible API on localhost:11434. Free, private, offline.

Core Concept

Ollama is the “Docker for LLMs” — model download, execution, and API exposure in a single command:

ollama run gemma4        # Google's latest (April 2026)
ollama run qwen3-coder   # Best open-source coding model
ollama run llama3.2      # Meta's general-purpose
ollama run deepseek-v3   # Strong reasoning

Why It Matters for Production

  • Zero cost after hardware — no API key, no subscription, no per-token billing
  • Fully private — data never leaves the machine
  • OpenAI-compatible API — drop-in replacement for any OpenAI SDK integration
  • Model switching — swap models without code changes
# Switch from OpenAI to local — one line change
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

Model Availability (April 2026)

See Open Source LLM Landscape 2026 for detailed model comparison. Key Ollama-supported models:

ModelCommandActive ParamsBest For
Gemma 4 26Bollama pull gemma43.8B (MoE)Best cost/quality ratio
Gemma 4 E4Bollama pull gemma4:e4b4BEdge/mobile
Qwen3-Coderollama pull qwen3-coder3.3B (MoE)Agentic coding
Llama 3.2ollama pull llama3.23B-90BGeneral purpose
DeepSeek-V3ollama pull deepseek-v3~37BReasoning

Caveat: Qwen 3.5 vision GGUF doesn’t work in Ollama (mmproj file issue). Use llama.cpp backends for Qwen 3.5 multimodal.

Hardware Guide

RAMWhat You Can Run
4GBGemma 4 E2B, TinyLlama
8GBGemma 4 E4B, Mistral 7B, Qwen3-Coder (3.3B active)
16GBGemma 4 26B MoE, Llama 3.2 8B
32GB+Larger dense models (13B-30B)
64GB+70B+ models

Rule of thumb: RAM ÷ 2 ≈ max model parameter size. GPU optional but dramatically faster (NVIDIA, AMD, Apple Silicon).

Custom Models (Modelfile)

FROM gemma4
SYSTEM "You are a Python coding assistant. Be concise."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
ollama create my-coder -f Modelfile
ollama run my-coder

Integration Points

  • n8n Workflow Engine: HTTP Request node → localhost:11434/api/generate
  • LangChain/LangGraph: ChatOllama provider
  • Any OpenAI SDK: Change base_url only
  • Docker compose: Sidecar container alongside your app

Installation

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
 
# Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
 
# Windows — installer from ollama.com

Sources