A neural network that transforms a design mock-up into a static website
This repo contains the code for a 1D tokenizer and generator
Visual Studio Code client for Tabnine
Tiny vision language model
Code for running inference and finetuning with the SAM 3 model
LTX-Video Support for ComfyUI
Extensible workflow development framework
Python inference and LoRA trainer package for the LTX-2 audio–video
"Big Model" trains a visual multimodal VLM with 26M parameters
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Guiding Instruction-based Image Editing via Multimodal Large Language
Flexible Photo Recrafting While Preserving Your Identity
Inference script for Oasis 500M
ICLR2024 Spotlight: curation/training code, metadata, distribution
Learning multi-scale deep model correcting over- and under-exposed
Official code for Style Aligned Image Generation via Shared Attention
Code release for the ConvNeXt model
A real-time approach for mapping all human pixels of 2D RGB images
Simulating worlds in a computer
Vision-language-action model for robot control via images and text