Generate blog articles from video or audio
When LLM Meets Domain Experts
Open-source unified customization model
Reproduction of Poetiq's record-breaking submission to ARC-AGI-1
Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model
A text-to-speech, speech-to-text and speech-to-speech library
Collections of robotics environments
Unified Multimodal Understanding and Generation Models
AI discovers 520000 stable inorganic crystal structures for research
DeepMind model for tracking arbitrary points across videos & robotics
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools
NVIDIA Federated Learning Application Runtime Environment
An MLOps framework to package, deploy, monitor and manage models
Request recommended movies, TV shows and anime to Jellyseerr/Overseerr
This repo contains the code for a 1D tokenizer and generator
Bailing is a voice dialogue robot similar to GPT-4o
An Open Source text-to-speech system built by inverting Whisper
Reading book source
Interface for OuteTTS models
Plug-and-play library to enable agents to call MCP and UTCP tools
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
GUI Exploration Lab. One of the best GUI agent solutions
Automatically translates the text of a video based on a subtitle file
Building a Secure and Interoperable Future for AI-Driven Payments