Parse files for optimal RAG
Recovering the Visual Space from Any Views
LISA: Reasoning Segmentation via Large Language Model
Suite of reference architectures for building GPU-accelerated vision
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Python inference and LoRA trainer package for the LTX-2 audio–video
Unified Multimodal Understanding and Generation Models
Elyra extends JupyterLab with an AI-centric approach
Agent-ready RPA suite with visual workflow automation tools engine
Open-source Python 3 tool for recognizing layouts, tables, and math
Machine learning image inpainting task that removes watermarks
Full-stack AI Red Teaming platform
Official implementation of Watermark Anything with Localized Messages
Reference PyTorch implementation and models for DINOv3
From Addition, Subtraction, Multiplication, and Division to ML
A neural network that transforms a design mock-up into a static website
SAPIEN Manipulation Skill Framework
AI tool that converts GitHub repositories into interactive diagrams
Parallel computing with task scheduling
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
Contexts Optical Compression