An on-premises, OCR-free unstructured data extraction
Multimodal Diffusion with Representation Alignment
Time-lapse Video Generation Models as Metamorphic Simulators
Generate audiobooks from EPUBs, PDFs and text with captions
CLI tool to extract (meta)data from PDF and manipulate PDF files
A modular graph-based Retrieval-Augmented Generation (RAG) system
Framework for building real-time voice and multimodal AI agents
Let's make video diffusion practical
Official MiniMax Model Context Protocol (MCP) server
Create prompt-friendly codebase digests from any Git repository URL
Extract audio and video content and organize it into a Markdown note
Vision utilities for web interaction agents
Deepfakes Software For All
OCR software, free and offline
ComfyUI wrapper nodes for HunyuanVideo
100–200× Acceleration for Video Diffusion Models
AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA"
Image polygonal annotation with Python
Semantic search and workflows for medical/scientific papers
Python scripts for ETL (extract, transform and load) jobs for Ethereum
Structured data extraction and instruction calling with ML, LLM
Build Vision Agents quickly with any model or video provider
Modular AI image and video generation web UI with extensible tools
Persepolis Download Manager is a GUI for aria2