Industrial-level controllable zero-shot text-to-speech system
RGBD video generation model conditioned on camera input
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark
From Images to High-Fidelity 3D Assets
ChatGPT interface with better UI
Contexts Optical Compression
Controllable & emotion-expressive zero-shot TTS
Python SDK for Claude Agent
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Pokee Deep Research Model Open Source Repo
Pushing the Limits of Mathematical Reasoning in Open Language Models
Inference script for Oasis 500M
FAIR Sequence Modeling Toolkit 2
Foundational Models for State-of-the-Art Speech and Text Translation
Analyze computation-communication overlap in V3/R1
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Large Multimodal Models for Video Understanding and Editing
Example Discord bot written in Python that uses the completions API
Suite with Real-ESRGAN, BSRGAN, RealESRNet, IRCNN, GFPGAN & RIFE
A CNN model that predicts human joints from RGB images of a person
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Let us control diffusion models
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Locally run an Instruction-Tuned Chat-Style LLM
A collection of high-quality models for the MuJoCo physics engine