Foundation Models for Time Series
A PyTorch library for implementing flow matching algorithms
State-of-the-art TTS model under 25MB
Python inference and LoRA trainer package for the LTX-2 audio–video
Capable of understanding text, audio, vision, video
Official repository for LTX-Video
A Systematic Framework for Interactive World Modeling
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Generate Any 3D Scene in Seconds
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
This repository contains the official implementation of FastVLM
RGBD video generation model conditioned on camera input
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Large Multimodal Models for Video Understanding and Editing
DeepMind model for tracking arbitrary points across videos & robotics
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Inference script for Oasis 500M
Open-weight, large-scale hybrid-attention reasoning model
Implementation of "MobileCLIP" (CVPR 2024)
Sharp Monocular Metric Depth in Less Than a Second
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Research code artifacts for Code World Model (CWM)
Pokee Deep Research Model Open Source Repo
Implementation of the Surya Foundation Model for Heliophysics