Clean and efficient FP8 GEMM kernels with fine-grained scaling
Port of Facebook's LLaMA model in C/C++
Implementation of "MobileCLIP" (CVPR 2024)
Official repository for LTX-Video
Pure C inference for the Flux 2 image generation model
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation
Access to Anthropic's safety-first language model APIs
Z80-μLM is a 2-bit quantized language model
MiniMax-M2, a model built for Max coding & agentic workflows