Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
Contexts Optical Compression
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Python inference and LoRA trainer package for the LTX-2 audio–video
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Let's make video diffusion practical
Qwen3 is the large language model series developed by Qwen team
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Native and Compact Structured Latents for 3D Generation
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Text and image to video generation: CogVideoX and CogVideo
Multimodal Diffusion with Representation Alignment
Open-source multi-speaker long-form text-to-speech model
Personalize Any Characters with a Scalable Diffusion Transformer
Easy Docker setup for Stable Diffusion with user-friendly UI
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
FAIR Sequence Modeling Toolkit 2
MOSS‑TTS Family: open‑source speech and sound generation models
AlphaFold 3 inference pipeline
The Clay Foundation Model - An open-source AI model and interface
Programmatic access to the AlphaGenome model
Long-form streaming TTS system for multi-speaker dialogue generation
Research code artifacts for Code World Model (CWM)
A Family of Open Sourced Music Foundation Models
High-Resolution 3D Assets Generation with Large Scale Diffusion Models