Open Source OCR Engine
Let's make video diffusion practical
AI video generator optimized for low VRAM and older GPUs
This repo contains the code for a 1D tokenizer and generator
A lightweight vision library for performing large object detection
Sharp Monocular Metric Depth in Less Than a Second
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Fast and efficient unstructured data extraction
MII makes low-latency and high-throughput inference possible
Overcoming Data Limitations for High-Quality Video Diffusion Models
A Customizable Image-to-Video Model based on HunyuanVideo
Visual AI Workflow Builder
Plug-in that makes it easy to generate Stable Diffusion images
Implementation of Dreambooth
TDSFT (Two-Dimensional Segmentation Fusion Tool)
An open-source framework for training large multimodal models
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
Deep learning tool that converts portrait photos into line art
A version of the AI art creation software based on Disco Diffusion
A Unified Toolkit for Deep Learning Based Document Image Analysis
Implementation of Deep Feature Rotation for Multimodal Image
Production-ready punctuation restoration model for the Russian language
Composable GAN framework with an API and user interface
Compute FID scores with PyTorch
Python library for model interpretation/explanations