Official inference repo for FLUX.2 models
Multimodal Diffusion with Representation Alignment
A theoretical reconstruction of the Claude Mythos architecture
Renderer for the harmony response format to be used with gpt-oss
Long-form streaming TTS system for multi-speaker dialogue generation
Qwen3-TTS is an open-source series of TTS models
Controllable & emotion-expressive zero-shot TTS
An experimental version of the DeepSeek model
Industrial-level controllable zero-shot text-to-speech system
Contexts Optical Compression
Easy Docker setup for Stable Diffusion with user-friendly UI
Qwen3-ASR is an open-source series of ASR models
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Provides convenient access to the Anthropic REST API from any Python 3 application
A 0.1B Omni model trained from scratch
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Hunyuan Translation Model Version 1.5
Towards self-verifiable mathematical reasoning
Foundational Models for State-of-the-Art Speech and Text Translation
Official implementation of DreamCraft3D
Towards Real-World Vision-Language Understanding
Dataset of GPT-2 outputs for research in detection, biases, and more
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Code for reproducing key results in the paper
Flexible text-to-text transformer model for multilingual NLP tasks