Text and image to video generation: CogVideoX and CogVideo
A Rust machine learning framework
A fast TTS architecture with conditional flow matching
Generating Immersive, Explorable, and Interactive 3D Worlds
An Open Source text-to-speech system built by inverting Whisper
A Powerful Native Multimodal Model for Image Generation
High-Quality Voice Cloning TTS for 600+ Languages
MII makes low-latency and high-throughput inference possible
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Research papers on autonomous agents (LLMs), updated daily
Global weather forecasting model using graph neural networks and JAX
Qwen3-Omni is a natively end-to-end, omni-modal LLM
DepGraph: Towards Any Structural Pruning
Flexible Photo Recrafting While Preserving Your Identity
A Universal Customization Method for Single and Multi Conditioning
Towards Human-Level Text-to-Speech through Style Diffusion
A simple, high-quality voice conversion tool focused on ease of use
C++ inference library for multiple SVC/TTS
Virtual AI anchor that combines state-of-the-art technology
Foundation model for image generation
Advanced language and coding AI model
AI discovers 520,000 stable inorganic crystal structures for research
Plug-n-play module turning text-to-image models into animation
Headless browser automation server for AI agents to visit sites
Reference PyTorch implementation and models for DINOv3