FlashMLA: Efficient Multi-head Latent Attention Kernels
A SOTA open-source image editing model
Fine-tuning ChatGLM-6B with PEFT
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Code release for ConvNeXt V2 model
A latent text-to-image diffusion model
Open agentic coding model optimized for local deployment
4-bit Command A+ model for enterprise agents and multilingual tasks
Frontier-scale 675B multimodal base model for custom AI training
Fast uncensored Gemma model optimized for local chat and coding
Lightweight multimodal translation model for 55 languages
OpenAI’s compact 20B open model for fast, agentic, and local use
OpenAI’s open-weight 120B model optimized for reasoning and tooling
Lightweight on-device model for private AI text redaction
Efficient MoE reasoning model for coding and math workloads
Compact 8B multimodal instruct model optimized for edge deployment
High-performance MoE model with MLA, MTP, and multilingual reasoning
Quantized 675B multimodal instruct model optimized for NVFP4
Small 3B-base multimodal model ideal for custom AI on edge hardware
High-precision 14B multimodal model built for advanced reasoning tasks
Efficient 14B multimodal instruct model with edge deployment and FP8
Frontier-scale 675B multimodal instruct MoE model for enterprise AI
Compact 3B-param multimodal model for efficient on-device reasoning
Versatile 8B-base multimodal LLM, flexible foundation for custom AI
Powerful 14B-base multimodal model — flexible base for fine-tuning