Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence Software
Search Results

Search Results for "visual python" - Page 2

x

Sort By:

Relevance

Clear All Filters

OS

BSD 96
ChromeOS 96
Linux 96
More...
Mac 96
Windows 96

Category

Artificial Intelligence 96
Business 1
Multimedia 1
Software Development 1

License

OSI-Approved Open Source 86
Creative Commons Attribution License 1

Translations

English 2
Russian 1
Ukrainian 1

Programming Language

Python 90
TypeScript 4
Unix Shell 3
Java 2
More...
JavaScript 2
C++ 1
Lisp 1
PHP 1

Status

Pre-Alpha 1
Beta 1

96 projects for "visual python" with 2 filters applied:

Artificial Intelligence BSD Clear Filters & Widen Search

Train ML Models With SQL You Already Know
BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.

Try Free
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
1

GitDiagram

AI tool that converts GitHub repositories into interactive diagrams

GitDiagram is an open source web application designed to help developers quickly understand the structure and architecture of GitHub repositories by automatically generating interactive diagrams. It analyzes repository metadata such as the file tree and project documentation to build a visual representation of how different components of a project relate to one another. It uses an AI-powered pipeline to interpret repository structure and transform that information into system design diagrams...

Downloads: 3 This Week

Last Update: 6 days ago
See Project
2

ComfyUI-3D-Pack

An extensive node suite that enables ComfyUI to process 3D inputs

ComfyUI-3D-Pack is an extension package for the ComfyUI visual AI workflow environment that enables users to generate and manipulate 3D assets using advanced machine learning techniques. ComfyUI itself is a node-based interface for designing and executing generative AI pipelines, and this extension expands its capabilities by introducing nodes specifically designed for working with three-dimensional data. The package allows the platform to process inputs such as meshes and UV textures and...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
3

DriveLM

Driving with Graph Visual Question Answering

DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
4

LlamaGen

Autoregressive Model Beats Diffusion

LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
5

FastVLM

This repository contains the official implementation of FastVLM

FastVLM is an efficiency-focused vision-language modeling stack that introduces FastViTHD, a hybrid vision encoder engineered to emit fewer visual tokens and slash encoding time, especially for high-resolution images. Instead of elaborate pruning stages, the design trades off resolution and token count through input scaling, simplifying the pipeline while maintaining strong accuracy. Reported results highlight dramatic speedups in time-to-first-token and competitive quality versus...

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
6

ML Ferret

Refer and Ground Anything Anywhere at Any Granularity

Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo...

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
7

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional...

Downloads: 1 This Week

Last Update: 2025-09-28
See Project
8

GLM-4.5V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...

Downloads: 1 This Week

Last Update: 2026-05-16
See Project
9

Super Magic

All-in-one AI productivity platform with agents, workflows, and IM

Magic is an open source all-in-one AI productivity platform designed to help organizations build, deploy, and scale AI-driven applications efficiently. It is not a single tool but a complete product ecosystem composed of multiple integrated systems that work together to enhance productivity across different business scenarios. Magic centers around a general-purpose AI agent system called Super Magic, which can autonomously understand tasks, plan actions, execute workflows, and perform error...

Downloads: 2 This Week

Last Update: 14 hours ago
See Project
Ship Agents Faster
Transform your applications and workflows into powerful agentic systems at global scale.

Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.

Get Started Free
10

VOID

Video Object and Interaction Deletion

VOID is an advanced AI video processing system developed by Netflix that focuses on removing objects from videos while preserving the physical and visual realism of the surrounding environment. Unlike traditional inpainting methods that only erase pixels or simple artifacts, VOID models the full interaction dynamics between objects and their environment, including shadows, reflections, and even physical consequences such as movement or balance changes. Built on top of transformer-based...

Downloads: 0 This Week

Last Update: 2026-05-04
See Project
11

machine-learning-refined

Master the fundamentals of machine learning, deep learning

machine-learning-refined is an educational repository designed to help students and practitioners understand machine learning algorithms through intuitive explanations and interactive examples. The project accompanies a series of textbooks and teaching materials that focus on making machine learning concepts accessible through visual demonstrations and simple code implementations. Instead of presenting algorithms purely through mathematical derivations, the repository emphasizes geometric...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
12

VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs)

VLMEvalKit is an open-source evaluation toolkit designed for benchmarking large vision-language models that combine visual understanding with natural language reasoning. The toolkit provides a unified framework that allows researchers and developers to evaluate multimodal models across a wide range of datasets and standardized benchmarks with minimal setup. Instead of requiring complex data preparation pipelines or multiple repositories for each benchmark, the system enables evaluation...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
13

firerpa LAMDA

The most powerful Android RPA agent framework

lamda is an Android RPA agent framework that provides visual remote desktop control and automation at scale, geared toward testing, automation validation, and device management. It exposes a clean UI to monitor and interact with connected devices and includes tooling to script actions reliably across apps and OS versions. The project emphasizes low-friction setup and powerful control primitives so teams can move from interactive validation to repeatable automation. A public wiki, releases,...

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
14

LongCat-Image

Foundation model for image generation

LongCat-Image is an open-source foundation model for image generation and editing created by the LongCat team at Meituan, designed to deliver high-quality visual outputs while remaining efficient and accessible for developers and researchers. Rather than relying on massive parameter counts typical of many cutting-edge models, LongCat-Image achieves strong photorealism, stable structure, and accurate bilingual (Chinese and English) text rendering with a more compact ~6-billion parameter...

Downloads: 7 This Week

Last Update: 2026-05-09
See Project
15

Sa2VA

Official Repo For "Sa2VA: Marrying SAM2 with LLaVA

Sa2VA is a cutting-edge open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It merges the segmentation power of a state-of-the-art video segmentation model (based on SAM‑2) with the vision-language reasoning capabilities of a strong LLM backbone (derived from models like InternVL2.5 / Qwen-VL series), yielding a system that can answer questions about...

Downloads: 0 This Week

Last Update: 2025-12-02
See Project
16

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 2 This Week

Last Update: 2025-03-13
See Project
17

PaperBanana

Extension of Google Research’s PaperBanana

PaperBanana is an open-source agentic framework designed to automatically generate publication-quality academic diagrams and statistical plots directly from text descriptions. The project focuses on helping researchers, educators, and data scientists transform conceptual descriptions of figures into structured visual outputs suitable for research papers, presentations, and technical reports. Instead of manually designing charts or diagrams using traditional visualization tools, users can...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
18

AppAgent

Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
19

GLM-4.6V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...

Downloads: 0 This Week

Last Update: 2026-05-16
See Project
20

InternVL

A Pioneering Open-Source Alternative to GPT-4o

InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
21

OSWorld

Benchmarking Multimodal Agents for Open-Ended Tasks

OSWorld is an open-source synthetic world environment designed for embodied AI research and multi-agent learning. It provides a richly simulated 3D world where multiple agents can interact, perform tasks, and learn complex behaviors. OSWorld emphasizes multi-modal interaction, enabling agents to process visual, auditory, and symbolic data for grounded learning in a simulated world.

Downloads: 0 This Week

Last Update: 2025-03-13
See Project
22

FireRed-Image-Edit

General-purpose image editing model that delivers high-fidelity

FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong...

Downloads: 4 This Week

Last Update: 2026-04-03
See Project
23

SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow

SimpleHTR is an open-source implementation of a handwriting text recognition system based on deep learning techniques. The project focuses on converting images of handwritten text into machine-readable digital text using neural networks. The system uses a combination of convolutional neural networks and recurrent neural networks to extract visual features and model sequential character patterns in handwriting. It also employs connectionist temporal classification (CTC) to align predicted...

Downloads: 2 This Week

Last Update: 2026-03-12
See Project
24

Context Engineering

A frontier, first-principles handbook

Context Engineering is a comprehensive, open-source project serving as a first-principles handbook for the emerging discipline of context design and optimization in AI. Moving beyond traditional prompt engineering, this repository defines and explores how to craft and provide complete context payloads — not just single prompts — to large language models so they can perform tasks more reliably and intelligently. It takes inspiration from thought leaders like Andrej Karpathy and bridges theory...

Downloads: 2 This Week

Last Update: 2026-02-27
See Project
25

Wan Move

Motion-controllable Video Generation via Latent Trajectory Guidance

Wan Move is an open-source research codebase for motion-controllable video generation that focuses on enabling fine-grained control of motion within generative video models. It is designed to guide the temporal evolution of visual content by leveraging latent trajectory guidance, allowing users to manipulate how objects move over time without modifying the underlying generative architecture. By representing motion information as dense point trajectories and integrating them into the latent...

Downloads: 2 This Week

Last Update: 2026-01-30
See Project

Previous
1
You're on page 2
3
4
Next

Related Searches

phi

Related Categories

Artificial Intelligence

Business

Multimedia

Software Development

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise