This repo contains the code for a 1D tokenizer and generator
Code for running inference and finetuning with the SAM 3 model
LTX-Video Support for ComfyUI
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Unified Multimodal Understanding and Generation Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Visual Studio Code client for Tabnine
A workflow execution platform built on top of the fantastic Cloudflare
Tiny vision language model
This repository contains the official implementation of FastVLM
A neural network that transforms a design mock-up into static websites
Towards Real-World Vision-Language Understanding
SAPIEN Manipulation Skill Framework
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Refer and Ground Anything Anywhere at Any Granularity
Multimodal Diffusion with Representation Alignment
Reference PyTorch implementation and models for DINOv3
Extensible workflow development framework
Python inference and LoRA trainer package for the LTX-2 audio–video
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Workflow and speech recognition app
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
The most powerful Android RPA agent framework
Foundational Models for State-of-the-Art Speech and Text Translation