Document Image Parsing via Heterogeneous Anchor Prompting
Build Vision Agents quickly with any model or video provider
Official repository for LTX-Video
AI-powered tool for generating, optimizing, and translating subtitles
A Telegram RSS bot that cares about your reading experience
High-resolution models for human tasks
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Get your documents ready for gen AI
An Open Source text-to-speech system built by inverting Whisper
Private AI platform for agents, enterprise search and RAG pipelines
Qwen3-TTS is an open-source series of TTS models
A lightweight text-to-speech model with zero-shot voice cloning
StreamSpeech is a seamless model for offline speech recognition
Instill Core is a full-stack AI infrastructure tool for data
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official PyTorch Implementation
State-of-the-art diffusion models for image and audio generation
Improve human sleep through scientifically
An AI for Music Generation
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
The data structure for multimodal data
Framework for building realtime multimodal voice AI agent apps
The Triton Inference Server provides an optimized cloud
Large Multimodal Models for Video Understanding and Editing
A simple native web interface that uses ChatTTS to synthesize text