StarVector is a foundation model for SVG generation
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Structured data extraction and instruction calling with ML and LLMs
An AI personal assistant for your digital brain
Skywork-R1V is an advanced multimodal AI model series
Code and models for the ICML 2024 paper NExT-GPT
An AI-powered file management tool that ensures privacy
Qwen-Image is a powerful image generation foundation model
Visual intelligence for your home.
Qwen2.5-VL is the multimodal large language model series
Enhances Tesseract OCR output using LLMs (local or API)
Data infrastructure providing an approach to multimodal AI workloads
LISA: Reasoning Segmentation via Large Language Model
Build multimodal language agents for fast prototyping and production
Document (PDF, Word, PPTX ...) extraction and parsing API
Open source demo platform where you can easily showcase your AI models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Open source libraries and APIs to build custom preprocessing pipelines
Tensor search for humans
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Gemma open-weight LLM library, from Google DeepMind
Multi-source content processor for NotebookLM
Gracefully face hCaptcha challenges with multimodal LLMs
Data Lake for Deep Learning. Build, manage, and query datasets
Chinese and English multimodal conversational language model