v0.11.0
| Name | Modified | Size |
| --- | --- | --- |
| sha256sum.txt | 2025-08-05 | 1.1 kB |
| OllamaSetup.exe | 2025-08-05 | 735.9 MB |
| Ollama.dmg | 2025-08-05 | 46.4 MB |
| ollama-windows-arm64.zip | 2025-08-05 | 21.6 MB |
| ollama-windows-amd64.zip | 2025-08-05 | 1.3 GB |
| ollama-windows-amd64-rocm.zip | 2025-08-05 | 257.5 MB |
| ollama-linux-arm64.tgz | 2025-08-05 | 1.1 GB |
| ollama-linux-arm64-jetpack6.tgz | 2025-08-05 | 362.3 MB |
| ollama-linux-arm64-jetpack5.tgz | 2025-08-05 | 455.5 MB |
| ollama-linux-amd64.tgz | 2025-08-05 | 1.3 GB |
| ollama-linux-amd64-rocm.tgz | 2025-08-05 | 1.1 GB |
| Ollama-darwin.zip | 2025-08-05 | 46.3 MB |
| ollama-darwin.tgz | 2025-08-05 | 23.7 MB |
| README.md | 2025-08-05 | 2.7 kB |
| v0.11.0 source code.tar.gz | 2025-08-05 | 10.5 MB |
| v0.11.0 source code.zip | 2025-08-05 | 10.8 MB |

Totals: 16 items, 6.9 GB


Welcome OpenAI's gpt-oss models

Ollama has partnered with OpenAI to bring OpenAI's latest state-of-the-art open-weight models to Ollama. The two models, at 20B and 120B parameters, offer a whole new local chat experience and are designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Feature highlights

  • Agentic capabilities: Use the models’ native support for function calling, web browsing (Ollama provides a built-in web search that can optionally be enabled to augment the model with up-to-date information), Python tool calls, and structured outputs; a function-calling sketch follows this list.
  • Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
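
As an illustration of the function-calling flow, here is a minimal sketch against Ollama's documented `/api/chat` endpoint. The `get_weather` tool, its schema, and the prompt are hypothetical, and the exact shape of the tool-call response may vary across versions.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

# Hypothetical tool, described in the OpenAI-style function schema that
# Ollama's /api/chat endpoint accepts in its "tools" field.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

resp = requests.post(OLLAMA_URL, json={
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "stream": False,
})
resp.raise_for_status()
message = resp.json()["message"]

# When the model decides to call a tool, the call arrives as structured
# JSON rather than free text; dispatch it to a real implementation here.
for call in message.get("tool_calls", []):
    fn = call["function"]
    print(fn["name"], json.dumps(fn["arguments"]))
```

Structured outputs follow a similar pattern: the request's `format` field can carry a JSON schema that the response is constrained to match.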

Quantization - MXFP4 format

OpenAI uses quantization to reduce the memory footprint of the gpt-oss models. The models are post-trained with the mixture-of-experts (MoE) weights quantized to the MXFP4 format, an effective 4.25 bits per parameter (a 4-bit float per weight, plus an 8-bit scale shared by each block of 32 weights). The MoE weights account for more than 90% of the total parameter count, and quantizing them to MXFP4 enables the smaller model to run on systems with as little as 16 GB of memory, and the larger model to fit on a single 80 GB GPU.
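
As a back-of-envelope check on those figures, the sketch below prices the MoE weights at 4.25 bits per parameter and assumes, purely for illustration, that the remaining ~10% of parameters stay at 16-bit precision; the ~21B and ~117B totals are the published parameter counts for the two models.

```python
def approx_size_gb(total_params: float, moe_fraction: float = 0.90) -> float:
    """Back-of-envelope weight size: MoE weights at 4.25 bits/param (MXFP4),
    everything else assumed to remain at 16 bits/param (bf16)."""
    moe_bits = total_params * moe_fraction * 4.25
    other_bits = total_params * (1.0 - moe_fraction) * 16.0
    return (moe_bits + other_bits) / 8.0 / 1e9  # bits -> bytes -> GB

print(f"gpt-oss-20b  (~21B params):  {approx_size_gb(21e9):.1f} GB")   # ~14.2 GB, fits in 16 GB
print(f"gpt-oss-120b (~117B params): {approx_size_gb(117e9):.1f} GB")  # ~79.3 GB, fits on one 80 GB GPU
```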

Ollama supports the MXFP4 format natively, with no additional quantization or conversion steps. New kernels were developed for Ollama’s new engine to support MXFP4.
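
To make the format concrete, here is an illustrative NumPy sketch of MXFP4 dequantization: each block of 32 weights holds 4-bit E2M1 elements plus one shared power-of-two (E8M0) scale, which is where the 4.25 bits per parameter comes from (4 + 8/32). This is not Ollama's kernel, only the arithmetic the format implies.

```python
import numpy as np

# The eight non-negative values representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def dequant_mxfp4_block(codes: np.ndarray, scale_e8m0: int) -> np.ndarray:
    """Dequantize one MXFP4 block: 32 4-bit codes plus one shared scale byte.

    codes: uint8 array of 32 values in [0, 15]; bit 3 is the sign,
           bits 0-2 index the E2M1 magnitude table.
    scale_e8m0: shared exponent byte, interpreted as 2**(scale - 127).
    """
    sign = np.where(codes & 0x8, -1.0, 1.0).astype(np.float32)
    magnitude = E2M1[codes & 0x7]
    return sign * magnitude * np.float32(2.0) ** (scale_e8m0 - 127)

# Example block: codes cycling through all 16 bit patterns, scaled by 2**1.
codes = np.arange(32, dtype=np.uint8) % 16
print(dequant_mxfp4_block(codes, 128))
```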

Ollama collaborated with OpenAI, benchmarking against their reference implementations to ensure that Ollama’s implementation matches the same quality.

Get started

You can get started by downloading the latest Ollama release (v0.11).

The models can be downloaded directly in Ollama’s new app or via the terminal:

```
ollama run gpt-oss:20b
```

```
ollama run gpt-oss:120b
```
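
Beyond the CLI, a minimal programmatic call through the official `ollama` Python package (`pip install ollama`) might look like the sketch below; the prompt is only an example, and the server must already be running locally.

```python
import ollama

# One-shot, non-streaming chat against a local Ollama server.
response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Summarize what MXFP4 quantization is."}],
)
print(response["message"]["content"])
```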

What's Changed

Full Changelog: https://github.com/ollama/ollama/compare/v0.10.1...v0.11.0
