Video understanding codebase from FAIR for reproducing video models
A state-of-the-art open visual language model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Memory-efficient and performant finetuning of Mistral's models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Official implementation of DreamCraft3D
Towards Real-World Vision-Language Understanding
Release for Improved Denoising Diffusion Probabilistic Models
The ChatGPT Retrieval Plugin lets you easily find personal documents
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Diffusion Transformer with Fine-Grained Chinese Understanding
A Unified Framework for Text-to-3D and Image-to-3D Generation
A Customizable Image-to-Video Model based on HunyuanVideo
Open-source large language model family from Tencent Hunyuan
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
Implementation of the Surya Foundation Model for Heliophysics
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
FlashMLA: Efficient Multi-head Latent Attention Kernels
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Let us control diffusion models