A neural network that transforms a design mock-up into a static website
This repo contains the code for a 1D tokenizer and generator
Tiny vision language model
Code for running inference and finetuning with the SAM 3 model
LTX-Video Support for ComfyUI
Self-supervised visual learning using momentum contrast in PyTorch
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Taming Stable Diffusion for Lip Sync
Let's make video diffusion practical
Python inference and LoRA trainer package for the LTX-2 audio–video
"Big Model" trains a visual multimodal VLM with 26M parameters
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
OCR expert VLM powered by Hunyuan's native multimodal architecture
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Elyra extends JupyterLab with an AI-centric approach
Guiding Instruction-based Image Editing via Multimodal Large Language
Flexible Photo Recrafting While Preserving Your Identity
Inference script for Oasis 500M
ICLR2024 Spotlight: curation/training code, metadata, distribution
Python package for AutoML on Tabular Data with Feature Engineering
Official code for Style Aligned Image Generation via Shared Attention
Creation of a Taylorplot for several machine learning models
Code release for ConvNeXt model
GLIDE: a diffusion-based text-conditional image synthesis model
Generative Adversarial Transformers