Handwritten Text Recognition (HTR) system implemented with TensorFlow
Harmonized and Coherent Human Image Animation
A frontier, first-principles handbook
Foundation model for image generation
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Motion-controllable Video Generation via Latent Trajectory Guidance
Let agents classify your bank transactions
Multimodal embedding and reranking models built on Qwen3-VL
Modular quant framework
A theme for Sublime Text 3 by Mattia Astorino
Learning agent trained in a diffusion world model
General-purpose image editing model that delivers high-fidelity
No-code LLM Platform to launch APIs and ETL Pipelines
Fast, powerful, git-native ticket tracking in a single bash script
Inference script for Oasis 500M
TorchMultimodal is a PyTorch library
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Unifying 3D Mesh Generation with Language Models
GitLab automatic code review tool based on large models
Flexible Photo Recrafting While Preserving Your Identity
A command-line utility for taking automated screenshots of websites
Python module that helps you build complex pipelines of batch jobs
OCR expert VLM powered by Hunyuan's native multimodal architecture
Large language model and vision-language model based on linear attention