Generate blog articles from video or audio
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Management of Yandex Station and other smart home devices
A fast TTS architecture with conditional flow matching
SOTA discrete acoustic codec models with 40/75 tokens per second
Controllable and fast Text-to-Speech for over 7000 languages
One-click deployment (including offline integration package)
Foundational model for human-like, expressive TTS
End-to-end speech processing toolkit
Multi-lingual large voice generation model, providing inference
A TTS model capable of generating ultra-realistic dialogue
Open-source repository for the Pokee Deep Research model
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Volcano Engine Reinforcement Learning for LLMs
LLM-powered fuzzing via OSS-Fuzz
DeepMind model for tracking arbitrary points across videos & robotics
Sharp Monocular Metric Depth in Less Than a Second
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools
Tooling for the Common Objects In 3D dataset
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
PyTorch code and models for VJEPA2 self-supervised learning from video
Language modeling in a sentence representation space
An open-source end-to-end VLM-based GUI Agent
Code for the paper "Language models can explain neurons in language models"