Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
Large Language Models (LLM)
Search Results

Search Results for "visual python"

x

Sort By:

Relevance

Clear All Filters

OS

Linux 32
Mac 31
Windows 30
More...
BSD 23
ChromeOS 23

Category

Artificial Intelligence 32

License

OSI-Approved Open Source 32

Programming Language

Python 30
TypeScript 2
Unix Shell 2

Showing 32 open source projects for "visual python"

View related business solutions

Large Language Models (LLM) Clear Filters & Widen Search

Stop Storing Third-Party Tokens in Your Database
Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.

Try Auth0 for Free
Go From AI Idea to AI App Fast
One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.

Try Free
1

InternLM-XComposer-2.5

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

InternLM-XComposer is an open-source multimodal AI system designed to generate long-form content that combines text with visual elements such as images and diagrams. The model is built on top of the InternLM language model architecture and extends its capabilities to handle multimodal inputs and outputs. Instead of producing only textual responses, the system can generate visually enriched documents such as illustrated articles, presentations, and educational materials. It incorporates...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
2

InternGPT

Open source demo platform where you can easily showcase your AI models

InternGPT is an open-source multimodal AI framework designed to extend large language models beyond text interactions into visual reasoning and image manipulation tasks. The system integrates conversational AI with computer vision models so users can interact with images, videos, and visual environments through natural language instructions. Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
3

Skywork-R1V4

Skywork-R1V is an advanced multimodal AI model series

Skywork-R1V is an open-source multimodal reasoning model designed to extend the capabilities of large language models into vision-language tasks that require complex logical reasoning. The project introduces a model architecture that transfers the reasoning abilities of advanced text-based models into visual domains so the system can interpret images and perform multi-step reasoning about them. Instead of retraining both language and vision models from scratch, the framework uses a...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
4

LLM Vision

Visual intelligence for your home.

LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process...

Downloads: 4 This Week

Last Update: 1 day ago
See Project
Secure File Transfer for Windows with Cerberus by Redwood
Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.

Try for Free
5

CogVLM

A state-of-the-art open visual language model

CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks. The repo provides multiple ways to run models...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
6

LISA

LISA: Reasoning Segmentation via Large Language Model

LISA is an open-source multimodal AI system designed to enable language models to perform pixel-level reasoning and segmentation tasks on images. The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual...

Downloads: 2 This Week

Last Update: 2026-03-06
See Project
7

StarVector

StarVector is a foundation model for SVG generation

StarVector is a multimodal foundation model designed for generating Scalable Vector Graphics (SVG) from images or textual descriptions. The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into...

Downloads: 1 This Week

Last Update: 2026-03-05
See Project
8

Langflow

Low-code app builder for RAG and multi-agent AI applications

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.

Downloads: 5 This Week

Last Update: 1 day ago
See Project
9

DriveLM

Driving with Graph Visual Question Answering

DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
10

LlamaGen

Autoregressive Model Beats Diffusion

LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
11

ML Ferret

Refer and Ground Anything Anywhere at Any Granularity

Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo...

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
12

UFO³

Weaving the Digital Agent Galaxy

UFO is an open-source framework developed by Microsoft for building intelligent agents that automate interactions with graphical user interfaces on the Windows operating system. The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be...

Downloads: 1 This Week

Last Update: 2026-05-15
See Project
13

CogView4

CogView4, CogView3-Plus and CogView3(ECCV 2024)

CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets,...

Downloads: 1 This Week

Last Update: 2 days ago
See Project
14

VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs)

VLMEvalKit is an open-source evaluation toolkit designed for benchmarking large vision-language models that combine visual understanding with natural language reasoning. The toolkit provides a unified framework that allows researchers and developers to evaluate multimodal models across a wide range of datasets and standardized benchmarks with minimal setup. Instead of requiring complex data preparation pipelines or multiple repositories for each benchmark, the system enables evaluation...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
15

Gemma

Gemma open-weight LLM library, from Google DeepMind

Gemma, developed by Google DeepMind, is a family of open-weights large language models (LLMs) built upon the research and technology behind Gemini. This repository provides the official implementation of the Gemma PyPI package, a JAX-based library that enables users to load, interact with, and fine-tune Gemma models. The framework supports both text and multi-modal input, allowing natural language conversations that incorporate visual content such as images. It includes APIs for...

Downloads: 5 This Week

Last Update: 2026-05-20
See Project
16

InternVL

A Pioneering Open-Source Alternative to GPT-4o

InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
17

PaperBanana

Extension of Google Research’s PaperBanana

PaperBanana is an open-source agentic framework designed to automatically generate publication-quality academic diagrams and statistical plots directly from text descriptions. The project focuses on helping researchers, educators, and data scientists transform conceptual descriptions of figures into structured visual outputs suitable for research papers, presentations, and technical reports. Instead of manually designing charts or diagrams using traditional visualization tools, users can...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
18

AppAgent

Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
19

VisualGLM-6B

Chinese and English multimodal conversational language model

VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
20

Context Engineering

A frontier, first-principles handbook

Context Engineering is a comprehensive, open-source project serving as a first-principles handbook for the emerging discipline of context design and optimization in AI. Moving beyond traditional prompt engineering, this repository defines and explores how to craft and provide complete context payloads — not just single prompts — to large language models so they can perform tasks more reliably and intelligently. It takes inspiration from thought leaders like Andrej Karpathy and bridges theory...

Downloads: 4 This Week

Last Update: 2026-02-27
See Project
21

AI-Codereview-Gitlab

GitLab automatic code review tool based on large models

AI-Codereview-Gitlab is an open-source automation tool that integrates large language models into the GitLab development workflow to perform automated code reviews. The system monitors GitLab repositories and analyzes commits or merge requests using AI models to identify potential issues, coding mistakes, and quality improvements before the code is merged. By leveraging multiple large language model providers—including OpenAI, DeepSeek, ZhipuAI, or local models through Ollama—the platform...

Downloads: 12 This Week

Last Update: 2026-05-19
See Project
22

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 1 This Week

Last Update: 2025-03-13
See Project
23

Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM

Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...

Downloads: 2 This Week

Last Update: 2026-04-23
See Project
24

Flock

Flock is a workflow-based low-code platform for building chatbots

Flock is a workflow-based low-code platform designed for building AI applications such as chatbots, retrieval-augmented generation systems, and multi-agent workflows. The platform uses a visual workflow architecture where different nodes represent processing steps such as input processing, model inference, retrieval operations, and tool execution. Developers can connect these nodes to create complex pipelines that orchestrate multiple language models and external services. Built on...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
25

Paper2Slides

From Paper to Presentation in One Click

Paper2Slides is an automation tool that converts research papers, reports, and other documents into polished slide decks and posters with minimal manual effort. It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file...

Downloads: 2 This Week

Last Update: 2026-05-20
See Project

Previous
You're on page 1
2
Next

Related Searches

langflow

phi

tiny app builder

n8n

khoj

Related Categories

Artificial Intelligence

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise