Dia-1.6B generates lifelike English dialogue and vocal expressions
JetBrains’ 4B-parameter code model for completions
Metric monocular depth estimation (vision model)
OpenAI’s compact 20B open model for fast, agentic, and local use
Tencent’s 36-language state-of-the-art translation model
CTC-based forced aligner for audio-text alignment in 158 languages
Vision-language-action model for robot control via images and text