Showing 366 open source projects for "visual"

  • 1
    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Skywork-R1V4

    Skywork-R1V is an advanced multimodal AI model series

    Skywork-R1V is an open-source multimodal reasoning model designed to extend the capabilities of large language models into vision-language tasks that require complex logical reasoning. The project introduces a model architecture that transfers the reasoning abilities of advanced text-based models into visual domains so the system can interpret images and perform multi-step reasoning about them. Instead of retraining both language and vision models from scratch, the framework uses a lightweight visual projection layer that connects a pretrained vision backbone with a reasoning-capable language model. This design allows the model to analyze images while maintaining strong textual reasoning performance, enabling tasks such as solving visual math problems, interpreting scientific diagrams, and answering questions about images.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ArtCraft

    Crafting engine for artists, designers, and filmmakers

    ...The project positions itself as an intentional “crafting engine” for artists, designers, and filmmakers who want deeper control over generative media pipelines. Rather than relying purely on text prompts, ArtCraft emphasizes visual manipulation, compositional control, and iterative refinement so creators can treat AI output more like a malleable creative medium. The application is built with performance and responsiveness in mind, enabling users to move between different creative canvases and asset workflows within a unified interface. It aims to support complex multimedia generation workflows including image, video, and potentially 3D content creation, making it useful for experimental filmmaking and advanced visual design.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. ...
    Downloads: 35 This Week
    Last Update:
    See Project
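The promptable interface described for SAM 3 (text phrases or geometric prompts in; masks, boxes, and scores out) can be illustrated with a toy sketch. Everything below (the class names, the `segment` function, and the fake detections) is a hypothetical illustration of the idea, not SAM 3's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical prompt types mirroring the two families SAM 3 accepts:
# open-vocabulary text phrases, and geometric visual prompts.
@dataclass
class TextPrompt:
    phrase: str                                  # e.g. "red car"

@dataclass
class VisualPrompt:
    points: list = field(default_factory=list)   # [(x, y), ...]
    boxes: list = field(default_factory=list)    # [(x0, y0, x1, y1), ...]

@dataclass
class Instance:
    mask_area: int    # stand-in for a binary mask
    box: tuple
    score: float

def segment(image, prompt):
    """Toy stand-in for a promptable segmenter: returns every instance
    matching the prompt, each with a mask, a box, and a score."""
    # A real model would run detection + segmentation; here we fake two
    # detections for any text phrase and one per visual box.
    if isinstance(prompt, TextPrompt):
        return [Instance(1200, (10, 10, 50, 50), 0.92),
                Instance(800, (60, 20, 90, 70), 0.85)]
    return [Instance(500, b, 0.9) for b in prompt.boxes]

hits = segment(None, TextPrompt("red car"))
print(len(hits), hits[0].score)  # prints: 2 0.92
```

The key point the sketch captures is that a text prompt yields *all* matching instances (exhaustive open-vocabulary segmentation), while a visual prompt yields one instance per geometric cue.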
  • 5
    OpenClaw

    Your own personal AI assistant. Any OS. Any Platform.

    OpenClaw (formerly Clawdbot/Moltbot) is an open-source, self-hosted autonomous AI assistant designed to run on user-controlled hardware and bridge conversational natural language with real-world task execution, effectively acting as a proactive digital assistant rather than a reactive chatbot. It lets you send instructions through familiar messaging platforms like WhatsApp, Telegram, Discord, Slack, Signal, iMessage, and more, and then interprets those instructions to carry out actions such...
    Downloads: 598 This Week
    Last Update:
    See Project
  • 6
    WeChatMsg

    Project aimed at extracting, exporting, and analyzing chat records

    ...It provides tools that read local WeChat database files and allow users to convert chat data into readable formats such as HTML, Word, and CSV, making it possible to inspect conversations outside the mobile app environment. Beyond simple export, the project includes mechanisms for analyzing chat histories and generating annual reports or visual summaries about messaging trends, interaction patterns, and more. The original README communicates a guiding philosophy about owning personal data and using it responsibly to train personalized AI agents or preserve memories. Although the repository has seen periods of inactivity and may not receive frequent updates, its widespread use indicates community interest in preserving chat logs and understanding conversation data outside of the WeChat interface.
    Downloads: 165 This Week
    Last Update:
    See Project
  • 7
    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    ...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.
    Downloads: 11 This Week
    Last Update:
    See Project
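The OCR-based grounding mentioned above boils down to mapping a model's textual target (for example, "click Submit") to on-screen coordinates. A minimal sketch of that step, with made-up OCR results standing in for a real OCR engine's output over a screenshot (this is not the framework's actual code):

```python
# Illustrative sketch: ocr_results would normally come from an OCR
# engine run over a screenshot; the boxes here are invented.
ocr_results = [
    {"text": "File",   "box": (12, 8, 44, 24)},      # (x0, y0, x1, y1)
    {"text": "Submit", "box": (300, 540, 372, 566)},
    {"text": "Cancel", "box": (390, 540, 452, 566)},
]

def ground_target(target, ocr):
    """Return the click point (box center) for the OCR entry whose
    text matches the target, or None if nothing matches."""
    for item in ocr:
        if item["text"].lower() == target.lower():
            x0, y0, x1, y1 = item["box"]
            return ((x0 + x1) // 2, (y0 + y1) // 2)
    return None

print(ground_target("submit", ocr_results))  # prints: (336, 553)
```

Set-of-Mark prompting attacks the same grounding problem from the other direction: candidate regions are labeled with visible marks so the model can refer to them by label instead of by raw coordinates.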
  • 8
    Coze Studio

    An AI agent development platform with all-in-one visual tools

    Coze Studio is ByteDance’s open‑source, visual AI agent development platform. It offers no-code/low-code workflows to build, debug, and deploy conversational agents, integrating prompting, RAG-based knowledge bases, plugin systems, and workflow orchestration. Developed in Go (backend) and React/TypeScript (frontend), it uses a containerized microservices architecture suitable for enterprise deployment.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    LTX-2.3

    Official Python inference and LoRA trainer package

    ...Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. This unified approach allows creators to generate complete multimedia sequences where motion, timing, and sound are aligned automatically. LTX-2 is designed for both research and production workflows and can generate high-resolution video clips with precise control over structure, motion, and camera behavior.
    Downloads: 185 This Week
    Last Update:
    See Project
  • 10
    Inkeep

    Create AI Agents in a No-Code Visual Builder or TypeScript SDK

    Inkeep is an open-source framework for building and deploying AI agent workflows and interactive assistants that operate autonomously across applications, enterprise environments, and customer engagement use cases. It lets developers and non-technical users create, manage, and orchestrate multi-agent systems using both a no-code visual builder and a full TypeScript SDK, giving two ways to define agent behaviors that stay in sync with each other. Agents built with this framework can act as real-time conversational assistants — for example, handling help desk inquiries, providing internal support to teams, or driving in-app experiences — and they can be extended to automate multi-step tasks that interact with external systems like CRMs, knowledge bases, or ticketing systems. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    Qwen3-VL is the latest multimodal large language model series from Alibaba Cloud’s Qwen team, designed to integrate advanced vision and language understanding. It represents a major upgrade in the Qwen lineup, with stronger text generation, deeper visual reasoning, and expanded multimodal comprehension. The model supports dense and Mixture-of-Experts (MoE) architectures, making it scalable from edge devices to cloud deployments, and is available in both instruction-tuned and reasoning-enhanced variants. Qwen3-VL is built for complex tasks such as GUI automation, multimodal coding (converting images or videos into HTML, CSS, JS, or Draw.io diagrams), long-context reasoning with support up to 1M tokens, and comprehensive video understanding. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    Moondream

    Tiny vision language model

    ...It serves as both a playground for the author’s artistic curiosity and a resource for other creative coders interested in generative art techniques. The repository may include shaders, canvas/WebGL code, visual demos, and utilities that demonstrate how mathematical functions or noise patterns can be harnessed for compelling visuals.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Activepieces

    Open Source AI Automation

    Activepieces is an open-source automation tool designed to build workflows that connect different apps and services without requiring extensive programming knowledge. It’s tailored for technical and non-technical users alike, enabling teams to automate repetitive tasks using a visual editor and a large library of pre-built connectors. Activepieces can be self-hosted or used via a cloud deployment, making it flexible for teams of all sizes. It supports integrations with popular services like Slack, Google Sheets, and Discord, and allows users to create custom pieces to suit unique needs. With real-time logs, version history, and scheduling, Activepieces is positioned as a compelling alternative to Zapier for open-source and privacy-conscious users.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 14
    Autonomous Agents

    Autonomous Agents (LLMs) research papers. Updated Daily

    ...These methods allow agents to combine visual and geometric information while maintaining awareness of the spatial relationships between agents and objects.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    AionUi

    Free, local, open-source Cowork for Gemini CLI, Claude Code, Codex

    AionUi is an open-source, cross-platform graphical interface that turns command-line AI tools into a unified coworking desktop for interacting with multiple local AI agents and CLI models like Gemini CLI, Claude Code, Codex, Qwen Code, and others. Instead of forcing users to work in separate terminals for each tool, AionUi automatically detects installed CLI tools and provides a central visual workspace where sessions can run in parallel, contexts are preserved, and conversations are saved locally without sending data to external servers. It enhances productivity by offering smart file management features like batch renaming, automatic organization, and intelligent file classification, thereby reducing manual overhead when working with large datasets or complex document structures. ...
    Downloads: 143 This Week
    Last Update:
    See Project
  • 16
    PySpur

    Visual tool for building, testing, and deploying AI agent workflows

    ...By offering a visual representation of workflows, PySpur makes it easier to debug interactions between components and identify failures in complex pipelines. It supports iterative experimentation, allowing developers to rapidly improve agents without rebuilding systems from scratch. PySpur also enables deployment of finalized workflows after testing, making it suitable for both development and production use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Puck

    Open source visual editor for building React drag-and-drop pages

    Puck is an open source visual editor designed for React applications that enables developers to build customizable drag-and-drop page editing experiences. It allows teams to create their own page builders by defining React components that can be arranged and configured through a visual interface. Puck is component-based and configuration-driven, meaning developers specify how components render and which editable fields control their properties.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual-language model suite, together with its GUI-oriented sibling CogAgent, aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    ...or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. The repository includes evaluation results (e.g. image/text alignment scores, common VL benchmarks), configuration files, and model weights (where permitted). While the internal architecture details are not fully documented publicly, the repo suggests that VL2 introduces enhancements over prior vision-language models (e.g. better scaling, cross-modal attention, more robust alignment) to improve grounding and multimodal understanding.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 20
    Rivet

    Visual AI IDE for building agents with prompt chains and graphs

    Rivet is an open source visual AI programming environment designed to help developers build complex AI agents using a node-based interface and prompt chaining workflows. It provides a desktop application that allows users to visually construct and debug AI logic as interconnected graphs, making it easier to manage sophisticated interactions between language models and external tools.
    Downloads: 0 This Week
    Last Update:
    See Project
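The node-graph execution model such a visual environment builds on can be sketched in a few lines: each node declares its dependencies, and a node runs once all of its inputs have results. The node names and the stand-in `fake_llm` function below are assumptions for illustration, not Rivet's API:

```python
# A tiny prompt-chain graph: plain nodes with declared dependencies.
def fake_llm(prompt):
    """Stand-in for a real language model call."""
    return f"[response to: {prompt}]"

graph = {
    "question": {"deps": [], "fn": lambda: "What is RLVR?"},
    "prompt":   {"deps": ["question"],
                 "fn": lambda q: f"Answer concisely: {q}"},
    "answer":   {"deps": ["prompt"], "fn": fake_llm},
}

def run(graph):
    """Execute each node once all of its dependencies have results."""
    results, pending = {}, dict(graph)
    while pending:
        for name, node in list(pending.items()):
            if all(d in results for d in node["deps"]):
                args = [results[d] for d in node["deps"]]
                results[name] = node["fn"](*args)
                del pending[name]
    return results

out = run(graph)
print(out["answer"])  # prints: [response to: Answer concisely: What is RLVR?]
```

A visual editor adds a canvas on top of exactly this structure: the boxes are nodes, the wires are the `deps` entries, and debugging means inspecting each node's intermediate result.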
  • 21
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    Agentspan

    Durable, Distributed runtime for ALL of your agents

    ...The system is built for durability, meaning tasks can pause for extended periods, including waiting for human approval, and then resume seamlessly. It supports scaling across multiple environments, making it suitable for production-grade agent orchestration. Agentspan includes a local server and visual interface that allow developers to inspect execution flows and debug agent behavior. It also integrates with multiple model providers, enabling flexibility in selecting underlying AI systems. Overall, it provides infrastructure for building resilient, long-running AI agents rather than short-lived scripts.
    Downloads: 4 This Week
    Last Update:
    See Project
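The pause-for-approval-then-resume pattern described above can be sketched with a Python generator acting as the durable task. This illustrates the idea only and is not Agentspan's implementation; a real durable runtime would persist the checkpoint so the task survives restarts:

```python
# A generator as a pausable task: it runs until it needs a human
# decision, yields a checkpoint, and resumes when one arrives.
def refund_task(order_id):
    amount = 42.00                                        # step 1: compute
    approved = yield ("await_approval", order_id, amount)  # pause here
    if approved:                                          # step 2: resume
        return f"refunded {amount} for {order_id}"
    return f"refund for {order_id} rejected"

task = refund_task("order-7")
checkpoint = next(task)            # runs until the approval pause
assert checkpoint[0] == "await_approval"

# ...hours (or days) later, a human approves; resume from the checkpoint.
try:
    task.send(True)
except StopIteration as done:
    print(done.value)              # prints: refunded 42.0 for order-7
```

The point of a durable runtime is that the "hours later" gap is real: the checkpoint is written to storage, the process can die, and `send` happens in a fresh worker.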
  • 23
    OpenWhip

    Optimize interaction with AI coding assistants

    OpenWhip is a desktop utility built as a cross-platform Node.js application that humorously gamifies interaction with AI coding assistants by simulating a “whip” tool to interrupt and motivate them during long or stalled operations. The application runs as a lightweight system tray program and overlays a visual whip animation on the screen when activated, creating an interactive and slightly absurd interface for user engagement. Its core functionality is surprisingly practical beneath the joke: when triggered, it sends a keyboard interrupt signal (Ctrl+C) to halt the current AI process, effectively giving developers a quick way to stop unresponsive or slow-running tasks. ...
    Downloads: 4 This Week
    Last Update:
    See Project
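The practical core described here, interrupting a stalled process with a keyboard interrupt, can be sketched with the standard library. The sleeping child below stands in for an AI assistant's process (POSIX only; Windows signal delivery works differently):

```python
import signal
import subprocess
import sys

# Spawn a long-running child process as a stand-in for a stalled task.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)

# Deliver SIGINT, the signal a terminal sends when you press Ctrl+C.
proc.send_signal(signal.SIGINT)

# The child exits immediately instead of sleeping out its 60 seconds.
returncode = proc.wait()
print("child stopped, exit code:", returncode)
```

The exact exit code depends on how the child handles the interrupt (Python turns SIGINT into `KeyboardInterrupt`), but it is nonzero either way, which is all an "interrupt the stuck task" tool needs.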
  • 24
    Langflow

    Low-code app builder for RAG and multi-agent AI applications

    Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
    Downloads: 32 This Week
    Last Update:
    See Project
  • 25
    CC Workflow Studio

    Accelerate Claude Code/GitHub Copilot

    CC Workflow Studio is a powerful Visual Studio Code extension that accelerates AI-assisted development by providing a visual workflow editor tailored for AI automation and agent orchestration, particularly with tools like Claude Code, GitHub Copilot, OpenAI Codex, and others. The extension lets developers and creators design complex AI workflows using intuitive drag-and-drop canvases or via conversational AI commands, blending graphical editing with natural language refinement. ...
    Downloads: 0 This Week
    Last Update:
    See Project