Text-to-image diffusion model for high-quality image generation
CTC-based forced aligner for audio-text pairs in 158 languages
Compact, efficient 360M-parameter text model with fine-tuning support
Powerful 12B-parameter model for top-tier text-to-image creation