A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
High-Fidelity and Controllable Generation of Textured 3D Assets
State-of-the-art (SoTA) text-to-video pre-trained model
Large language model and vision-language model based on linear attention
GLM-4 series: Open Multilingual Multimodal Chat LMs
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline
Fast and Universal 3D reconstruction model for versatile tasks
Official code for Style Aligned Image Generation via Shared Attention
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
FAIR Sequence Modeling Toolkit 2
ICLR2024 Spotlight: curation/training code, metadata, distribution
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
Memory-efficient and performant finetuning of Mistral's models
Diffusion Transformer with Fine-Grained Chinese Understanding
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Pokee Deep Research Model Open Source Repo
Unified Multimodal Understanding and Generation Models
DeepMind model for tracking arbitrary points across videos & robotics
Tooling for the Common Objects In 3D dataset
Code for Mesh R-CNN, ICCV 2019