Industrial-level controllable zero-shot text-to-speech system
RGBD video generation model conditioned on camera input
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
From Images to High-Fidelity 3D Assets
ChatGPT interface with better UI
AlphaFold 3 inference pipeline
Contexts Optical Compression
Pushing the Limits of Mathematical Reasoning in Open Language Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
FAIR Sequence Modeling Toolkit 2
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Large Multimodal Models for Video Understanding and Editing
Official implementation of Watermark Anything with Localized Messages
Python SDK for Claude Agent
Inference script for Oasis 500M
Real-time behaviour synthesis with MuJoCo, using Predictive Control
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Pokee Deep Research Model Open Source Repo
Code for Mesh R-CNN, ICCV 2019
Open-source framework for intelligent speech interaction
Example Discord bot written in Python that uses the completions API
Suite with Real-ESRGAN, BSRGAN, RealESRNet, IRCNN, GFPGAN & RIFE.
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Let us control diffusion models
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)