OCR expert VLM powered by Hunyuan's native multimodal architecture
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Label Studio is a multi-type data labeling and annotation tool
Qwen2.5-VL is the multimodal large language model series
Use Microsoft Edge's online text-to-speech service from Python
A feature-rich event management system
A beautiful, powerful, self-hosted ROM manager and player
Dealing with all unstructured data, such as reverse image search
InvokeAI is a leading creative engine for Stable Diffusion models
The data structure for multimodal data
GUI/CLI tool for downloading Xiaohongshu
New modpack with GregTech, Thaumcraft and Witchery
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
🎥 A free & open-source Python tool to remove unwanted objects from videos frame-by-frame using brush masking and AI inpainting (OpenCV + FFmpeg). EXE included.
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
21 Lessons, Get Started Building with Generative AI
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Generating Immersive, Explorable, and Interactive 3D Worlds
Open-source MCP server that gives your coding agent
Inference script for Oasis 500M
Extract audio and video content and organize it into a Markdown note
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
An open-source package that allows video game creators
Less rage, more chill