OCR model for complex documents with layout-aware structured outputs
A general fine-tuning kit geared toward image/video/audio diffusion
InvokeAI is a leading creative engine for Stable Diffusion models
Your Personal AI Assistant; easy to install, deploy on local or cloud
Welcome the Era of One-shot Long-horizon Parsing
OpenRecall is a fully open-source, privacy-first alternative
Implementation of the Surya Foundation Model for Heliophysics
No-code LLM Platform to launch APIs and ETL Pipelines
The open-source tool for building high-quality datasets
Multimodal-Driven Architecture for Customized Video Generation
Implementation of Phenaki Video, which uses MaskGIT
Adversarial Robustness Toolbox (ART) - Python Library for ML security
A Systematic Framework for Interactive World Modeling
Gemma open-weight LLM library, from Google DeepMind
Generating Immersive, Explorable, and Interactive 3D Worlds
Question and Answer based on Anything
Project Lyra: Open Generative 3D World Models
General-purpose image editing model that delivers high-fidelity
A Customizable Image-to-Video Model based on HunyuanVideo
A simple screen parsing tool toward a pure-vision-based GUI agent
RGBD video generation model conditioned on camera input
Multi-source content processor for NotebookLM
Instill Core is a full-stack AI infrastructure tool for data
RF-DETR is a real-time object detection and segmentation
Build multimodal AI applications with a cloud-native stack