Local inference for Windows machines
Ollama is an open-source tool that lets you run language models directly on your Windows machine. It uses your local CPU/GPU to generate responses, so nothing is sent to an external LLM service. Output quality does not depend on your hardware; machines with limited compute simply produce responses more slowly.
Models you can add
- Gemma
- Mistral
- Phi 3
- Llama 3
Getting started: install and run
To download and launch a model from the command line, run the ollama command followed by the model name, for example ollama run llama3. The same pattern works for any other model: swap in a different identifier to fetch and start the one you want. If the model is not already on disk, Ollama downloads it first, so setup is handled entirely from the console.
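If you prefer to script these steps rather than type them, the same commands can be driven from Python. This is only a sketch; it assumes the ollama executable is installed and on your PATH, and the model name is just an example.

```python
import subprocess

# Any identifier from the list above works here ("llama3", "mistral",
# "phi3", "gemma"); swap it to fetch and start a different model.
model = "llama3"

# Download the model weights (skipped automatically if already present),
# then open the interactive console session.
subprocess.run(["ollama", "pull", model], check=True)
subprocess.run(["ollama", "run", model], check=True)
```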
How conversations are handled
By default, Ollama runs in the Windows Command Prompt (cmd.exe): you type prompts as standard console input and read the model's responses in the same terminal window.
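While it is running, Ollama also serves a local HTTP API (by default at http://localhost:11434), so you can send prompts from code instead of the console. A minimal sketch, assuming the default endpoint and a model that has already been downloaded:

```python
import json
import urllib.request

# Build a single, non-streaming generation request against the local API.
payload = json.dumps({
    "model": "llama3",                          # any model you have pulled
    "prompt": "Summarize local inference in one sentence.",
    "stream": False,                            # return one complete response
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",      # Ollama's default local endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)

# The response body is JSON; the generated text is in the "response" field.
with urllib.request.urlopen(request) as reply:
    print(json.loads(reply.read())["response"])
```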
Practical system recommendations
- Keep free disk space of at least twice the model's size so installation and operation run smoothly (a quick way to check this is sketched after this list).
- Ensure you have sufficient RAM and CPU/GPU resources; limited hardware will still work but with reduced speed.
- For better usability, run Ollama in an up-to-date terminal such as Windows Terminal; cmd.exe remains the default.
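To make the disk-space rule of thumb concrete, here is a small pre-flight check. The model size is an illustrative figure, not an exact number; substitute the download size reported for the model you plan to install.

```python
import shutil

# Illustrative size for a mid-sized quantized model; check the actual
# download size of the model you intend to run.
model_size_gb = 4.7

# Free space on the drive where the models are stored (typically C: on
# Windows, unless you have relocated the model directory).
free_gb = shutil.disk_usage("C:\\").free / 1024 ** 3
needed_gb = 2 * model_size_gb  # rule of thumb: keep twice the model size free

if free_gb >= needed_gb:
    print(f"OK: {free_gb:.1f} GB free, ~{needed_gb:.1f} GB recommended")
else:
    print(f"Tight: {free_gb:.1f} GB free, ~{needed_gb:.1f} GB recommended")
```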
Technical
- Platforms: Windows, Mac, Web App
- Languages: Arabic, Chinese (Simplified), Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Turkish
- Pricing: Free