Controllable & emotion-expressive zero-shot TTS
Democratizing Reinforcement Learning for LLMs
Generate blog articles from video or audio
Provider-agnostic, open-source evaluation infrastructure
VITS2 backbone with multilingual-bert
A fast TTS architecture with conditional flow matching
SOTA discrete acoustic codec models with 40/75 tokens per second
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
Self-hosted & open-source anonymous 360 review software
An Efficient, Scalable, Multi-Modality RL Training Framework
Pokee Deep Research Model Open Source Repo
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Stable-diffusion-webui-pixelization
Volcano Engine Reinforcement Learning for LLMs
DeepMind model for tracking arbitrary points across videos & robotics
An alignment auditing agent capable of exploring alignment hypotheses
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Language modeling in a sentence representation space
Inference Llama 2 in one file of pure C
The repository provides code for running inference with SAM 2
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
How to improve NGINX performance, security, and other important things