Democratizing Reinforcement Learning for LLMs
Generate blog articles from video or audio
When LLM Meets Domain Experts
Open-sourced unified customization model
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Collections of robotics environments
Unified Multimodal Understanding and Generation Models
AI discovers 520,000 stable inorganic crystal structures for research
DeepMind model for tracking arbitrary points across videos & robotics
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools
NVIDIA Federated Learning Application Runtime Environment
Request recommended movies, TV shows and anime through Jellyseerr/Overseerr
This repo contains the code for a 1D tokenizer and generator
Bailing is a voice dialogue robot similar to GPT-4o
An open-source text-to-speech system built by inverting Whisper
Towards Human-Sounding Speech
Reading book source
Interface for OuteTTS models
Plug-and-play library to enable agents to call MCP and UTCP tools
A set of Docker images for training and serving models in TensorFlow
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
GUI Exploration Lab. One of the best GUI agent solutions
Automatically translates the text of a video based on a subtitle file
Building a Secure and Interoperable Future for AI-Driven Payments