OpenTinker is an RL-as-a-Service infrastructure for foundation models
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Z80-μLM is a 2-bit quantized language model
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" CVPR 2024
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Ling is a MoE LLM provided and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
Ultra-Efficient LLMs on End Device
Long-form streaming TTS system for multi-speaker dialogue generation
General-purpose image editing model that delivers high-fidelity
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Generate Any 3D Scene in Seconds
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
ICLR2024 Spotlight: curation/training code, metadata, distribution
A PyTorch library for implementing flow matching algorithms
PyTorch code and models for the DINOv2 self-supervised learning