Long-form streaming TTS system for multi-speaker dialogue generation
Open-source industrial-grade ASR models
A frontier, first-principles handbook
Ultimate meta-skill for generating best-in-class Claude Code skills
End-to-end pipeline converting generative videos
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Motion-controllable Video Generation via Latent Trajectory Guidance
Persistent context and multi-instance coordination
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP" (CVPR 2024)
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Search all of YouTube from the command line
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
All-in-one WebUI for AI generative image and video creation
Curated list of classic, high-quality computer science books
Operating LLMs in production
Extract schema, statistics and entities from datasets
A collaboration friendly studio for NeRFs
Fast image augmentation library and an easy-to-use wrapper
A library for deep learning end-to-end dialog systems and chatbots
An MCP server for VikingDB storage and search
A library to communicate with ChatGPT, Claude, Copilot, Gemini