ds4.c is a specialized local inference engine created by antirez for running DeepSeek V4 Flash models directly on Apple Silicon hardware using Metal acceleration. Unlike general-purpose inference runtimes, the project is intentionally optimized for a specific model family, enabling highly efficient execution and simplified architecture. The engine includes DS4-specific model loading, KV cache management, prompt rendering, and OpenAI-compatible server APIs for local deployment workflows. Built as a native low-level implementation, it focuses on performance, reduced abstraction overhead, and direct integration with Apple GPU acceleration through Metal compute graphs. The project also supports streaming inference behavior and local API serving for integration with external tools and AI applications. Overall, ds4 represents a minimalist high-performance approach to running large language models locally without relying on heavyweight inference frameworks.
Features
- Local DeepSeek V4 Flash inference engine
- Metal-accelerated execution on Apple Silicon
- OpenAI-compatible API server support
- Specialized KV cache and prompt management
- Native lightweight runtime architecture
- Streaming local inference workflows