Inference framework for 1-bit LLMs
Large language model and vision-language model based on linear attention
Qwen2.5-VL is a multimodal large language model series
Fast and Universal 3D reconstruction model for versatile tasks
Official code for Style Aligned Image Generation via Shared Attention
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
FAIR Sequence Modeling Toolkit 2
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
Official DeiT repository
Memory-efficient and performant finetuning of Mistral's models
Diffusion Transformer with Fine-Grained Chinese Understanding
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
ChatGPT interface with better UI
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
An AI-powered security review GitHub Action using Claude
Implementation of the Surya Foundation Model for Heliophysics
GLM-4 series: Open Multilingual Multimodal Chat LMs
Unified Multimodal Understanding and Generation Models
DeepMind model for tracking arbitrary points across videos & robotics
Tooling for the Common Objects In 3D dataset
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models