Run large and small language models directly on Windows
Foundry Local is a Windows utility for running small and large language models on your own machine. Part of the Azure AI Foundry family, it performs inference entirely on-device, so model execution and data handling stay local to your hardware. That design reduces exposure of sensitive information and strengthens data protection for users who prioritize privacy.
Core capabilities
- Optimizes use of available processors and accelerators (standard CPUs as well as GPU and NPU setups) to improve runtime performance.
- Exposes an OpenAI-compatible API, making integration with existing tools, SDKs, and workflows straightforward.
- Keeps all processing on the host system so data does not leave your device, enhancing confidentiality and control.
- Builds on Azure AI Foundry tooling to handle model downloads and resource management, so larger models can run within your hardware's limits.
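Because the service speaks an OpenAI-style API, you can talk to it with any HTTP client. The sketch below builds and sends a chat-completion request using only the Python standard library; the port, endpoint path, and model name are assumptions, so substitute the address and model your local service actually reports.

```python
import json
from urllib import request

# Assumed local endpoint; Foundry Local reports the actual address when the
# service starts, so adjust the host and port to match your setup.
BASE_URL = "http://localhost:5273/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """POST the payload to the local endpoint and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires the local service to be running; the model name is an example:
# chat("phi-3.5-mini", "What does on-device inference mean?")
```

Because everything is served from localhost, prompts and responses never cross the network boundary of your machine.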
Who benefits from this tool
Foundry Local is suitable for developers, hobbyists, and researchers who want to experiment with language models locally without depending on cloud-hosted inference. It is useful when you need low-latency responses, want to test models against private datasets, or prefer to avoid transmitting data off-device.
Setup pointers
- Verify your system drivers and GPU/NPU runtime libraries are up to date to get the best acceleration.
- Start with smaller models to validate your pipeline before scaling up to larger ones.
- Keep an eye on storage and memory usage when loading big models; quantized model variants or offloading weights to disk can help when RAM is tight.
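To sanity-check the last pointer before downloading anything, you can estimate how much memory a model's weights alone will need at a given precision. The sketch below is a rough lower bound (real runtimes add overhead for the KV cache and activations), and the parameter counts and precision labels are illustrative assumptions:

```python
# Approximate bytes needed per parameter at common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Estimate weight memory in GiB; excludes runtime overhead."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)


# A hypothetical 7B-parameter model: full precision vs. 4-bit quantization.
print(round(weight_memory_gib(7e9, "fp32"), 2))  # ~26.08 GiB
print(round(weight_memory_gib(7e9, "int4"), 2))  # ~3.26 GiB
```

The roughly 8x difference between fp32 and int4 is why quantization is often what makes a large model fit on consumer hardware at all.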
Technical
- Windows
- Free