State-of-the-art (SoTA) text-to-video pre-trained model
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Phi-3.5 for Mac: Locally-run Vision and Language Models
An Efficient Agentic Model for Computer Use
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Industrial-level controllable zero-shot text-to-speech system
Provides convenient access to the Anthropic REST API from any Python 3
Recovering the Visual Space from Any Views
Python SDK for Claude Agent
A trainable PyTorch reproduction of AlphaFold 3
CodeGeeX2: A More Powerful Multilingual Code Generation Model
PyTorch code and models for the DINOv2 self-supervised learning
tiktoken is a fast BPE tokeniser for use with OpenAI's models
GPT4V-level open-source multi-modal model based on Llama3-8B
MOSS‑TTS Family: open‑source speech and sound generation model
Collection of Gemma 3 variants that are trained for performance
Generating Immersive, Explorable, and Interactive 3D Worlds
Robust Speech Recognition Across Languages, Dialects
Accurate × Fast × Comprehensive
HY-Motion model for 3D character animation generation
Open-source framework for intelligent speech interaction
Repo for SeedVR2 & SeedVR
Renderer for the harmony response format to be used with gpt-oss
Implementation of the Surya Foundation Model for Heliophysics
Project Lyra: Open Generative 3D World Models