Qwen2.5-VL is the multimodal large language model series
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
A SOTA open-source image editing model
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Open-source deep-learning framework
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
A Systematic Framework for Interactive World Modeling
Provides convenient access to the Anthropic REST API from any Python 3
Large Multimodal Models for Video Understanding and Editing
General-purpose image editing model that delivers high-fidelity
Unified Multimodal Understanding and Generation Models
A Powerful Native Multimodal Model for Image Generation
Open-source large language model family from Tencent Hunyuan
Video Object and Interaction Deletion
Python SDK for Claude Agent
Genome modeling and design across all domains of life
Designed for text embedding and ranking tasks
Generating Immersive, Explorable, and Interactive 3D Worlds
MOSS-TTS Family: open-source speech and sound generation models
Foundation model for image generation
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Phi-3.5 for Mac: Locally Run Vision and Language Models
Repo for SeedVR2 & SeedVR
Implementation of the Surya Foundation Model for Heliophysics