Multimodal Diffusion with Representation Alignment
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Advancing Open-source World Models
A trainable PyTorch reproduction of AlphaFold 3
Python inference and LoRA trainer package for the LTX-2 audio–video
DeepSeek Coder: Let the Code Write Itself
LLM-based audio editing model trained with reinforcement learning
Controllable & emotion-expressive zero-shot TTS
A Pragmatic VLA Foundation Model
Industrial-level controllable zero-shot text-to-speech system
PyTorch implementation of JiT
HY-Motion model for 3D character animation generation
Qwen3.6 is the large language model series developed by the Qwen team
Towards self-verifiable mathematical reasoning
Advanced language and coding AI model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A Powerful Native Multimodal Model for Image Generation
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Provides convenient access to the Anthropic REST API from any Python 3
GPT4V-level open-source multimodal model based on Llama3-8B
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Instructions on how to use the Realtime API on microcontrollers
Tooling for the Common Objects In 3D dataset
OpenAI’s open-weight 120B model optimized for reasoning and tooling
Self-evolving AI model for agents, coding, and complex workflows