Search Results for "visual python" - Page 4

238 projects for "visual python" with 1 filter applied: BSD

1

VGGT-Ω

[CVPR 2026 Oral] VGGT Omega

VGGT-Omega is a Facebook Research computer vision project for feed-forward camera and depth reconstruction. It takes images as input and predicts camera parameters, depth maps, confidence values, and related scene tokens. The project is associated with 3D understanding workflows where models infer scene geometry without a traditional multi-stage reconstruction pipeline. It includes pretrained model variants with different resolutions and text-alignment capabilities, though checkpoint access...

Downloads: 0 This Week

Last Update: 2 days ago
See Project
2

Book6_First-Course-in-Data-Science

From Addition, Subtraction, Multiplication, and Division to ML

Book6_First-Course-in-Data-Science is an open-source educational project that serves as part of the “Iris Book” series focused on teaching data science and machine learning concepts through a combination of mathematics, programming, and visualization. The repository contains draft chapters, supporting Python code, and visual materials designed to guide readers from basic mathematical operations toward practical machine learning understanding. The goal of the project is to make complex topics such as statistics, algorithms, and data analysis more accessible to learners by breaking concepts into clear explanations supported by code examples and diagrams. ...

Downloads: 0 This Week

Last Update: 2026-05-01
See Project
3

Flock

Flock is a workflow-based low-code platform for building chatbots

Flock is a workflow-based low-code platform designed for building AI applications such as chatbots, retrieval-augmented generation systems, and multi-agent workflows. The platform uses a visual workflow architecture where different nodes represent processing steps such as input processing, model inference, retrieval operations, and tool execution. Developers can connect these nodes to create complex pipelines that orchestrate multiple language models and external services. Built on...

Downloads: 1 This Week

Last Update: 8 hours ago
See Project
4

Paper2Slides

From Paper to Presentation in One Click

Paper2Slides is an automation tool that converts research papers, reports, and other documents into polished slide decks and posters with minimal manual effort. It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file...

Downloads: 2 This Week

Last Update: 2026-05-20
See Project
5

Nexent

Zero-code platform for building AI agents from natural language input

Nexent is an open source platform designed to enable users to create intelligent agents using natural language instead of traditional programming or visual orchestration tools. It focuses on a zero-code approach, allowing users to define workflows and agent behavior purely through language prompts, significantly lowering the barrier to entry for AI development. Built on the MCP ecosystem, Nexent integrates a wide range of tools, models, and data sources into a unified environment for agent...

Downloads: 1 This Week

Last Update: 2026-05-18
See Project
6

PyTorch3D

PyTorch3D is FAIR's library of reusable components for deep learning

PyTorch3D is a comprehensive library for 3D deep learning that brings differentiable rendering, geometric operations, and 3D data structures into the PyTorch ecosystem. It’s designed to make it easy to build and train neural networks that work directly with 3D data such as meshes, point clouds, and implicit surfaces. The library provides fast GPU-accelerated implementations of rendering pipelines, transformations, rasterization, and lighting—making it possible to compute gradients through...

Downloads: 1 This Week

Last Update: 2025-11-27
See Project
7

dots.ocr

Multilingual Document Layout Parsing in a Single Vision-Language Model

dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks. The model is designed to recognize virtually any human script, making it highly effective...

Downloads: 0 This Week

Last Update: 2026-03-24
See Project
8

docext

An on-premises, OCR-free unstructured data extraction

docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
9

MEDIUM_NoteBook

Repository containing notebooks of my posts on Medium

...The project is useful for learners who want to explore machine learning concepts interactively using Python and common data science libraries.

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
10

SteadyDancer

Harmonized and Coherent Human Image Animation

SteadyDancer is a research-oriented human image animation system: given a reference image of a person and a driving motion sequence, it generates a video in which that person performs the motion. As the project's tagline indicates, the emphasis is on harmonized, temporally coherent output, so the animated subject stays consistent with the reference image while following the intended movement...

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
11

Grounded-Segment-Anything

Marrying Grounding DINO with Segment Anything & Stable Diffusion

Grounded-Segment-Anything is a research-oriented project that combines powerful open-set object detection with pixel-level segmentation and subsequent creative workflows, effectively enabling detection, segmentation, and high-level vision tasks guided by free-form text prompts. The core idea behind the project is to pair Grounding DINO — a zero-shot object detector that can locate objects described by natural language — with Segment Anything Model (SAM), which can produce detailed masks for...

Downloads: 0 This Week

Last Update: 2026-02-03
See Project
12

Tally

Let agents classify your bank transactions

Tally is an open-source, AI-assisted tool designed to automate the classification of personal financial transactions, helping users turn raw bank data into meaningful categories without manual tagging. At its core, Tally pairs a local rule engine with large language models so that an AI assistant (like Claude Code, Copilot, or any CLI agent) interprets, suggests, and categorizes expenses, savings, subscriptions, and income events based on your own rules and behavior. It generates...

Downloads: 0 This Week

Last Update: 2026-01-30
See Project
13

Qwen3-VL-Embedding

Multimodal embedding and reranking models built on Qwen3-VL

Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a...

Downloads: 0 This Week

Last Update: 2026-04-08
See Project
14

MiniMind-V

"Big Model" trains a visual multimodal VLM with 26M parameters

MiniMind-V is an experimental open-source project that aims to train a very small multimodal vision–language model (VLM) from scratch with extremely low compute and cost, making research and experimentation accessible to more people. The repository showcases training workflows and code designed to produce a 26-million parameter model—including both image and text capabilities—using minimal resources in very little time, reflecting a trend toward democratizing AI research. MiniMind-V combines...

Downloads: 0 This Week

Last Update: 2026-01-21
See Project
15

LLaMA-Mesh

Unifying 3D Mesh Generation with Language Models

LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual...

Downloads: 1 This Week

Last Update: 2026-03-09
See Project
16

MolmoWeb

Open multimodal web agent built by Ai2

MolmoWeb is an open-source multimodal web agent designed to autonomously navigate and interact with web browsers using vision-language models, representing a significant step toward fully agentic AI systems that can operate in real-world digital environments. The system takes natural language instructions and translates them into sequences of browser actions such as clicking, typing, scrolling, and navigating, effectively performing tasks on behalf of the user. Unlike traditional automation...

Downloads: 0 This Week

Last Update: 2026-04-10
See Project
17

Diffusion for World Modeling

Learning agent trained in a diffusion world model

Diffusion for World Modeling is an experimental reinforcement learning system that trains intelligent agents inside a simulated environment generated by a diffusion-based world model. The project introduces the idea of using diffusion models, commonly used for image generation, to simulate the dynamics of an environment and predict future states based on previous observations and actions. Instead of interacting directly with a real environment, the reinforcement learning agent learns within...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
18

Oasis

Inference script for Oasis 500M

Open-Oasis provides inference code and released weights for Oasis 500M, an interactive world model that generates gameplay frames conditioned on user keyboard input. Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time. The project focuses on enabling action-conditional frame generation so developers can experiment with interactive, model-generated...

Downloads: 0 This Week

Last Update: 2026-01-06
See Project
19

Multimodal

TorchMultimodal is a PyTorch library

This project, also known as TorchMultimodal, is a PyTorch library for building, training, and experimenting with multimodal, multi-task models at scale. The library provides modular building blocks such as encoders, fusion modules, loss functions, and transformations that support combining modalities (vision, text, audio, etc.) in unified architectures. It includes a collection of ready model classes—like ALBEF, CLIP, BLIP-2, COCA, FLAVA, MDETR, and Omnivore—that serve as reference...

Downloads: 0 This Week

Last Update: 2026-01-12
See Project
20

MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution

MetaCLIP is a research codebase for curating the training data of CLIP-style models. Rather than changing the model architecture, it focuses on how image-text pairs are selected: candidate pairs are matched against a list of metadata entries and then balanced so that the most frequent entries do not dominate the training distribution. The repository provides the curation and training code together with the metadata and the resulting data distribution...

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
21

VGGT

[CVPR 2025 Best Paper Award] VGGT

VGGT is a transformer-based framework aimed at unifying classic visual geometry tasks—such as depth estimation, camera pose recovery, point tracking, and correspondence—under a single model. Rather than training separate networks per task, it shares an encoder and leverages geometric heads/decoders to infer structure and motion from images or short clips. The design emphasizes consistent geometric reasoning: outputs from one head (e.g., correspondences or tracks) reinforce others (e.g., pose...

Downloads: 0 This Week

Last Update: 2026-05-19
See Project
22

shot-scraper

A command-line utility for taking automated screenshots of websites

shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. Beyond...

Downloads: 1 This Week

Last Update: 2026-02-01
See Project
23

hCaptcha Challenger

Gracefully face hCaptcha challenge with multimodal llms

hCaptcha Challenger is an open-source automation framework designed to solve hCaptcha verification challenges using computer vision models and multimodal reasoning techniques. The project integrates machine learning models capable of analyzing visual captcha tasks and identifying the correct responses required to pass the verification process. Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
24

Map-Anything

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Map-Anything is a universal, feed-forward transformer for metric 3D reconstruction that predicts a scene’s geometry and camera parameters directly from visual inputs. Instead of stitching together many task-specific models, it uses a single architecture that supports a wide range of 3D tasks—multi-image structure-from-motion, multi-view stereo, monocular metric depth, registration, depth completion, and more. The model flexibly accepts different input combinations (images, intrinsics, poses,...

Downloads: 0 This Week

Last Update: 2026-03-23
See Project
25

InfiniteYou

Flexible Photo Recrafting While Preserving Your Identity

InfiniteYou is an open-source image-generation and “identity-preserving image editing / generation” framework from ByteDance, designed to generate high-fidelity images that preserve a subject’s identity while allowing flexible editing or re-creation according to textual prompts. Using an architecture built around diffusion transformers (DiTs), InfiniteYou introduces a component called InfuseNet that injects identity features derived from reference images into the generation process — via...

Downloads: 0 This Week

Last Update: 2025-12-02
See Project

© 2026 Slashdot Media. All Rights Reserved.
