Taming Stable Diffusion for Lip Sync
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
Suite of reference architectures for building GPU-accelerated vision
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
AI tool that converts GitHub repositories into interactive diagrams
Extension of Google Research’s PaperBanana
A state-of-the-art open visual language model
Open-source evaluation toolkit for large multi-modality models (LMMs)
Multimodal Diffusion with Representation Alignment
All-in-one AI productivity platform with agents, workflows, and IM
A neural network that transforms a design mock-up into a static website
Contexts Optical Compression
Official repo for "Sa2VA: Marrying SAM2 with LLaVA"
Label Studio is a multi-type data labeling and annotation tool
Reference PyTorch implementation and models for DINOv3
Elyra extends JupyterLab with an AI-centric approach
Python inference and LoRA trainer package for the LTX-2 audio–video
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
VMZ: Model Zoo for Video Modeling
Generate audiobooks from e-books
An on-premises, OCR-free unstructured data extraction