Showing 366 open source projects for "visual"

  • 1
    FastGPT

    FastGPT is a knowledge-based platform built on LLMs

    FastGPT is a knowledge-based platform built on LLMs. It offers a comprehensive suite of out-of-the-box capabilities, such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
    Downloads: 2 This Week
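    As a sketch of how an application might query such a knowledge base: platforms like FastGPT commonly expose an OpenAI-compatible chat endpoint. The field names below (`chatId`, `messages`) are illustrative assumptions, not taken from FastGPT's documentation.

```python
import json

def build_chat_request(question, chat_id=None, stream=False):
    """Build an OpenAI-style chat completion body for a knowledge-base query.

    The `chatId` field is hypothetical: it stands in for whatever mechanism
    the platform uses to thread follow-up questions into one conversation.
    """
    payload = {
        "chatId": chat_id,
        "stream": stream,
        "messages": [{"role": "user", "content": question}],
    }
    return json.dumps(payload)

body = build_chat_request("What does the refund policy say?", chat_id="demo-1")
print(body)
```

    The same payload shape works for any OpenAI-compatible backend; only the base URL and authentication differ.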
  • 2
    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository includes model weights (or pointers to them), evaluation metrics on standard vision + language benchmarks, and configuration or architecture files. ...
    Downloads: 3 This Week
  • 3
    Agent S

    Agent S: an open agentic framework that uses computers like a human

    ...The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. With optional local code execution, reflection mechanisms, and compositional planning, Agent S provides a scalable and research-driven framework for building advanced computer-use agents.
    Downloads: 5 This Week
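    The grounding step described above can be sketched in miniature: a grounding model (such as UI-TARS) returns a bounding box for a described UI element, and the agent converts it into a concrete click at the box centre. The box lookup here is mocked; a real system would call the model.

```python
# Mocked grounding: map an element description to a screen bounding box.
# In Agent S the box would come from a grounding model, not a dictionary.
def ground(description, boxes):
    return boxes[description]

def to_click(box):
    """Turn a (x1, y1, x2, y2) bounding box into a click at its centre."""
    x1, y1, x2, y2 = box
    return {"action": "click", "x": (x1 + x2) // 2, "y": (y1 + y2) // 2}

boxes = {"Save button": (100, 40, 180, 72)}
print(to_click(ground("Save button", boxes)))  # → {'action': 'click', 'x': 140, 'y': 56}
```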
  • 4
    Qwen3-Omni

    Qwen3-Omni is a natively end-to-end, omni-modal LLM

    ...It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and 10 speech output languages. It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.
    Downloads: 4 This Week
  • 5
    AppAgent

    Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

    AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same way a human user would, making it compatible with a wide variety of mobile applications. ...
    Downloads: 1 This Week
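    An agent like this typically has the LLM emit actions as short call-style strings, which the framework parses and dispatches to the device. The grammar below ("tap(3)", 'swipe(5, "up", "medium")') is a simplification for illustration; AppAgent's actual prompt format may differ.

```python
import re

# Match call-style action strings: a name followed by a parenthesised arg list.
ACTION_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)$")

def parse_action(raw):
    """Parse an LLM-emitted action string into a structured action dict."""
    m = ACTION_RE.match(raw.strip())
    if not m:
        raise ValueError(f"unrecognized action: {raw!r}")
    args = []
    for part in filter(None, (p.strip() for p in m.group("args").split(","))):
        # Numeric args are UI-element ids; quoted args are string parameters.
        args.append(int(part) if part.isdigit() else part.strip('"').strip("'"))
    return {"action": m.group("name"), "args": args}

print(parse_action("tap(3)"))  # → {'action': 'tap', 'args': [3]}
print(parse_action('swipe(5, "up", "medium")'))
```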
  • 6
    Coze Loop

    Next-generation AI Agent Optimization Platform

    ...The project aims to simplify the increasingly complex workflow of building reliable AI agents by offering integrated tools for debugging, evaluation, observability, and optimization. Through its visual playground, developers can test prompts interactively and compare outputs across different language models. The platform also includes automated evaluation capabilities that assess agent performance across multiple quality dimensions such as accuracy and compliance. Its observability layer captures detailed execution traces, enabling teams to understand how inputs, prompts, and tools interact during runtime. ...
    Downloads: 1 This Week
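    The observability layer described above, which records how inputs, prompts, and tools interact at runtime, can be sketched as a tracing decorator. This is an illustrative pattern, not Coze Loop's actual API; the in-memory list stands in for a trace backend.

```python
import functools
import time

TRACES = []  # in-memory trace sink; a real platform would ship these to a backend

def traced(fn):
    """Record each call's inputs, output, and latency as a trace entry."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "fn": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": result,
            "ms": (time.perf_counter() - start) * 1000,
        })
        return result
    return wrapper

@traced
def answer(prompt):
    return f"echo: {prompt}"  # stand-in for a model call

answer("hello")
print(TRACES[0]["fn"], TRACES[0]["output"])
```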
  • 7
    CogView4

    CogView4, CogView3-Plus and CogView3(ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. It emphasizes bilingual usability, making it well-suited for cross-lingual multimodal applications. ...
    Downloads: 1 This Week
  • 8
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding, and long-document interpretation. GLM-4.5V emerged from a training framework that leverages scalable reinforcement learning (with curriculum sampling) to boost performance across tasks ranging from STEM problem solving to long-context reasoning, giving it broad applicability beyond narrow benchmarks. ...
    Downloads: 0 This Week
  • 9
    Sa2VA

    Official repo for "Sa2VA: Marrying SAM2 with LLaVA"

    Sa2VA is a cutting-edge open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It merges the segmentation power of a state-of-the-art video segmentation model (based on SAM‑2) with the vision-language reasoning capabilities of a strong LLM backbone (derived from models like InternVL2.5 / Qwen-VL series), yielding a system that can answer questions about visual content, perform referring segmentation, and maintain temporal consistency across frames in video. ...
    Downloads: 0 This Week
  • 10
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. ...
    Downloads: 12 This Week
  • 11
    what-to-eat

    An AI-based intelligent recipe generation platform

    ...The application combines modern frontend technologies with large language models to create a fully interactive cooking assistant that can produce detailed recipes, nutritional analysis, and even visual representations of dishes. It supports a wide range of cuisines, including traditional Chinese regional styles and international dishes, making it versatile for different cultural preferences. The system goes beyond simple recipe suggestions by including features such as wine pairing recommendations, sauce design, and health scoring, providing a more holistic cooking experience. ...
    Downloads: 0 This Week
  • 12
    AlphaTree

    DNN && GAN && NLP && BIG DATA

    AlphaTree is an educational repository that provides a visual roadmap of deep learning models and related artificial intelligence technologies. The project focuses on explaining the historical development and relationships between major neural network architectures used in modern machine learning. It presents diagrams and documentation describing the evolution of models such as LeNet, AlexNet, VGG, ResNet, DenseNet, and Inception networks.
    Downloads: 0 This Week
  • 13
    Watermark-Removal

    Machine learning image inpainting task that removes watermarks

    Watermark-Removal repository is a machine learning project focused on removing visible watermarks from digital images using deep learning and image inpainting techniques. The system analyzes an image containing a watermark and attempts to reconstruct the underlying visual content so that the watermark is removed while preserving the original appearance of the image. The project uses neural network models inspired by research in contextual attention and gated convolution, which are methods commonly applied to image restoration tasks. Through these techniques, the model learns to identify regions of the image affected by the watermark and generate realistic replacements for the missing visual information. ...
    Downloads: 0 This Week
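    The core idea, identifying the watermarked region and filling it from surrounding content, can be shown with a toy mask-based fill: each masked pixel takes the average of its already-known neighbours. This is a deliberately naive stand-in for the learned gated-convolution inpainting the repository uses.

```python
# Toy mask-based inpainting on a grayscale image stored as lists of lists.
# mask[y][x] == True marks watermark pixels that must be reconstructed.
def inpaint(img, mask, passes=8):
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]  # work on a copy, leave the input intact
    known = [[not mask[y][x] for x in range(w)] for y in range(h)]
    for _ in range(passes):
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                # Average the 4-connected neighbours that are already known.
                vals = [img[y + dy][x + dx]
                        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= y + dy < h and 0 <= x + dx < w and known[y + dy][x + dx]]
                if vals:
                    img[y][x] = sum(vals) / len(vals)
                    known[y][x] = True
    return img

flat = [[100] * 5 for _ in range(5)]  # uniform background
mask = [[False] * 5 for _ in range(5)]
mask[2][2] = True                     # one "watermark" pixel
flat[2][2] = 255
restored = inpaint(flat, mask)
print(restored[2][2])  # → 100.0
```

    A learned model replaces the neighbour average with a network that can synthesize plausible texture, not just smooth values.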
  • 14
    i.am.ai

    Roadmap to becoming an Artificial Intelligence Expert in 2022

    i.am.ai is a structured educational guide that maps out the knowledge areas and technologies required to become an artificial intelligence or machine learning expert. The project presents visual charts that outline multiple career paths such as data scientist, machine learning engineer, and AI specialist, helping learners understand what to study and in what order. It was originally created to train internal employees but was released publicly to support the broader community. The roadmap emphasizes foundational skills like mathematics, programming, and data handling before progressing into deep learning and specialized domains. ...
    Downloads: 0 This Week
  • 15
    Clawra

    Openclaw as your girlfriend

    ...Rather than being a static chatbot tied to a corporate ecosystem, Clawra runs locally or on a private server, giving users full control over the software and data that back her behavior. She is designed not just to answer questions but to maintain a persistent character with memory, backstory, and the ability to present visual outputs like generated selfies through integrated image tools, blending conversational AI with a playful persona. Clawra has captured attention as an experimental project showcasing how far open-source agents can be pushed in creating engaging and personalized interactions, with community interest spiking around her capabilities.
    Downloads: 0 This Week
  • 16
    Plannotator

    Annotate and review coding agent plans visually, share with your team

    Plannotator is an interactive plan review and annotation tool built to support AI coding agents, offering a visual UI for markup, refinement, and team collaboration around agent-generated plans. It allows developers to annotate proposed plans, sketches, and outlines from tools like Claude Code or OpenCode with pen tools, arrows, and highlighting, seamlessly capturing feedback that can be shared across teams or pushed back to agents. Plannotator integrates with diff views so reviewers can annotate changes line-by-line in git diffs, provide structured feedback, and navigate plans visually rather than through raw text alone. ...
    Downloads: 0 This Week
  • 17
    Better Chatbot

    Just a Better Chatbot. Powered by MCP Client & Workflows

    ...Integrates all major LLMs: OpenAI, Anthropic, Google, xAI, Ollama, and more. MCP protocol, web search, JS/Python code execution, data visualization. Custom agents, visual workflows, artifact generation. Realtime voice chat with full MCP tool integration.
    Downloads: 0 This Week
  • 18
    MCPJam

    Postman for MCPs - A tool for testing and debugging MCPs

    Inspector by MCPJam is a visual developer tool—akin to Postman—for testing and debugging MCP servers, with capabilities to simulate and trace tool execution via various transports and LLM integrations.
    Downloads: 4 This Week
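    The messages such a tool lets you craft and trace are JSON-RPC 2.0 requests; the MCP specification defines a tools/call method for invoking a server-side tool. The sketch below builds that wire-format message by hand (it shows the protocol shape, not MCPJam's own API).

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests need unique ids

def tools_call(name, arguments):
    """Build the JSON-RPC 2.0 `tools/call` request an MCP client sends."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

msg = tools_call("get_weather", {"city": "Oslo"})
print(msg)
```

    An inspector's job is then to send such messages over a chosen transport (stdio, HTTP) and display the server's result and any intermediate traces.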
  • 19
    Short Video Factory

    AI tool for automatic batch short video creation and editing

    Short Video Factory is an open source desktop application designed to simplify the creation of short-form videos using AI-driven automation. It enables users to generate product marketing clips and general content videos by combining simple prompt-based input with pre-prepared media assets. Short Video Factory integrates multiple stages of video production, including script generation, voice synthesis, video editing, and subtitle effects, into a single streamlined workflow. By leveraging AI...
    Downloads: 13 This Week
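    The staged workflow described above (script, voice, editing, subtitles) is essentially a pipeline of composable steps. The sketch below shows that structure with stand-in functions; the real application wires AI services into each stage.

```python
# Each stage is a placeholder that tags its input, so the flow is visible.
def generate_script(prompt):
    return f"script({prompt})"

def synthesize_voice(script):
    return f"voice({script})"

def edit_video(voice):
    return f"video({voice})"

def add_subtitles(video):
    return f"final({video})"

def produce(prompt):
    """Run the prompt through every production stage in order."""
    out = prompt
    for stage in (generate_script, synthesize_voice, edit_video, add_subtitles):
        out = stage(out)
    return out

print(produce("new sneaker ad"))  # → final(video(voice(script(new sneaker ad))))
```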
  • 20
    LongCat-Image

    Foundation model for image generation

    LongCat-Image is an open-source foundation model for image generation and editing created by the LongCat team at Meituan, designed to deliver high-quality visual outputs while remaining efficient and accessible for developers and researchers. Rather than relying on massive parameter counts typical of many cutting-edge models, LongCat-Image achieves strong photorealism, stable structure, and accurate bilingual (Chinese and English) text rendering with a more compact ~6-billion parameter architecture, making it competitive with much larger alternatives despite its relatively lean design. ...
    Downloads: 10 This Week
  • 21
    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 0 This Week
  • 22
    Super Magic

    All-in-one AI productivity platform with agents, workflows, and IM

    ...Magic centers around a general-purpose AI agent system called Super Magic, which can autonomously understand tasks, plan actions, execute workflows, and perform error correction. Alongside this, Magic includes a visual workflow engine that enables users to design complex AI processes using a drag-and-drop interface without requiring extensive coding knowledge. It also provides an enterprise-grade instant messaging system that integrates AI conversations with internal communication, allowing teams to collaborate while leveraging intelligent assistants. ...
    Downloads: 1 This Week
  • 23
    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...The frontend of the Label Studio app lives in the frontend/ folder and is written in React JSX. Multi-user labeling with sign-up and login; when you create an annotation, it is tied to your account. Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types, including images, audio, text, HTML, time series, and video.
    Downloads: 18 This Week
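    The configurable label formats are expressed as an XML labeling config. The sketch below builds a bounding-box config programmatically; the View/Image/RectangleLabels tag names follow Label Studio's documented template style, but check the template gallery before relying on the exact attributes.

```python
import xml.etree.ElementTree as ET

# Assemble a Label Studio-style config for image bounding-box annotation.
view = ET.Element("View")
ET.SubElement(view, "Image", name="img", value="$image")
labels = ET.SubElement(view, "RectangleLabels", name="tag", toName="img")
for value in ("Car", "Pedestrian"):
    ET.SubElement(labels, "Label", value=value)

config = ET.tostring(view, encoding="unicode")
print(config)
```

    Pasting such a config into a project defines both the UI shown to annotators and the schema of the exported annotations.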
  • 24
    FramePack

    Lets make video diffusion practical

    FramePack explores compact representations for sequences of image frames, targeting tasks where many near-duplicate frames carry redundant information. The idea is to “pack” frames by detecting shared structure and storing differences efficiently, which can accelerate training or inference on video-like data. By reducing I/O and memory bandwidth, datasets become lighter to load while models still see the essential temporal variation. The repository demonstrates both packing and unpacking...
    Downloads: 9 This Week
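    The packing idea described above, storing shared structure once and only the differences per frame, can be shown with toy delta coding: keep the first frame whole, then record just the pixels that changed in each later frame. This is an illustration of the principle, not the repository's actual format.

```python
def pack(frames):
    """Split a frame sequence into a keyframe plus per-frame sparse deltas."""
    key = frames[0]
    deltas = [[(i, f[i]) for i in range(len(f)) if f[i] != prev[i]]
              for prev, f in zip(frames, frames[1:])]
    return key, deltas

def unpack(key, deltas):
    """Rebuild the full sequence by replaying each delta onto the last frame."""
    frames, cur = [list(key)], list(key)
    for delta in deltas:
        cur = list(cur)
        for i, v in delta:
            cur[i] = v
        frames.append(cur)
    return frames

frames = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 0, 7]]
key, deltas = pack(frames)
assert unpack(key, deltas) == frames  # lossless round-trip
print(sum(len(d) for d in deltas))    # → 2 changed pixels instead of 8 stored
```

    Near-duplicate frames compress to almost nothing, which is exactly the redundancy the description targets.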
  • 25
    Magazine Web PPT

    A Claude Code Skill that turns prompts into magazine-style HTML decks

    Magazine Web PPT is a specialized AI skill set designed to enhance the creation and structuring of PowerPoint presentations. It provides guidance on slide organization, storytelling, and visual design principles tailored for professional presentations. The system helps users transform raw ideas into coherent slide decks with clear messaging and logical flow. It emphasizes effective communication through structured layouts and concise content. The project is particularly useful for business, education, and consulting scenarios where presentation quality is critical. ...
    Downloads: 6 This Week