Language modeling in a sentence representation space
Renderer for the harmony response format to be used with gpt-oss
Diversity-driven optimization and large-model reasoning ability
Large Multimodal Models for Video Understanding and Editing
Large-language-model & vision-language-model based on Linear Attention
The official PyTorch implementation of Google's Gemma models
Foundation Models for Time Series
Phi-3.5 for Mac: Locally-run Vision and Language Models
Tiny vision language model
Programmatic access to the AlphaGenome model
Open Source Speech Language Model
Long-form streaming TTS system for multi-speaker dialogue generation
Qwen3-ASR is an open-source series of ASR models
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Multimodal embedding and reranking models built on Qwen3-VL
Z80-μLM is a 2-bit quantized language model
Implementation of "MobileCLIP" CVPR 2024
Video understanding codebase from FAIR for reproducing video models
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
General-purpose image editing model that delivers high-fidelity
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Easy Docker setup for Stable Diffusion with user-friendly UI
Inference script for Oasis 500M
Fast and Universal 3D reconstruction model for versatile tasks