Gemma open-weight LLM library, from Google DeepMind
ComfyUI wrapper nodes for HunyuanVideo
Multilingual Document Layout Parsing in a Single Vision-Language Model
An on-premises, OCR-free unstructured data extraction
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Motion-controllable Video Generation via Latent Trajectory Guidance
Multimodal embedding and reranking models built on Qwen3-VL
"Big Model" trains a visual multimodal VLM with 26M parameters
PaddlePaddle End-to-End Development Toolkit
Modular quant framework
Open multimodal web agent built by Ai2
Learning agent trained in a diffusion world model
Fast, powerful, git-native ticket tracking in a single bash script
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Unifying 3D Mesh Generation with Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Flexible Photo Recrafting While Preserving Your Identity
Python package for AutoML on Tabular Data with Feature Engineering
OCR expert VLM powered by Hunyuan's native multimodal architecture
Large-language-model & vision-language-model based on Linear Attention
Chat & pretrained large vision language model
airda (Air Data Agent)
Virtual AI anchor that combines state-of-the-art technology