OCR model for complex documents with layout-aware structured outputs
Foundation model for image generation
StreamSpeech is a seamless model for offline speech recognition
A list of free LLM inference resources accessible via API
go1pylib is a Python library designed to control the Go1 robot
Multimodal embedding and reranking models built on Qwen3-VL
GenAI Processors is a lightweight Python library
Build Vision Agents quickly with any model or video provider
Python code to reproduce the illustrations from the Machine Learning Book
Python library for scraping and analyzing online news articles easily
Open source terminal session recorder
Foundational model for human-like, expressive TTS
Towards Human-Sounding Speech
Interface for OuteTTS models
Implementation of the AudioLM audio generation model in PyTorch
Code and models for ICML 2024 paper, NExT-GPT
Zero-copy PDF text extraction library written in Zig
Extract audio and video content and organize it into a Markdown note
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Search all of YouTube from the command line
Qwen3-ASR is an open-source series of ASR models
Open-Sora: Democratizing Efficient Video Production for All
Framework for building, orchestrating, and deploying AI agents
Open-source multi-speaker long-form text-to-speech model
Documentation for Google's Gen AI site - including Gemini API & Gemma