Search Results for "audio gui interface" - Page 2

Showing 56 open source projects for "audio gui interface" (filters: Artificial Intelligence, Python)

1

labelme Image Polygonal Annotation

Image polygonal annotation with Python

Labelme is a graphical image annotation tool written in Python that uses Qt for its graphical interface. Its features include image annotation with polygons, rectangles, circles, lines, and points; image-level flag annotation for classification and cleaning; video annotation; GUI customization (predefined labels and flags, auto-saving, label validation, etc.); and export of VOC-format datasets for semantic and instance segmentation.

Downloads: 12 This Week

Last Update: 2 days ago
2

Pixeltable

Data Infrastructure providing an approach to multimodal AI workloads

...Developers define data transformations and AI operations using computed columns on tables, allowing pipelines to evolve incrementally as new data or models are added. The framework supports multimodal content including images, video, text, and audio, enabling applications such as retrieval-augmented generation systems, semantic search, and multimedia analytics.

Downloads: 2 This Week

Last Update: 6 days ago
3

StreamSpeech

StreamSpeech is a seamless model for offline and simultaneous speech recognition, translation, and synthesis

StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received. The model supports eight tasks: offline ASR, speech-to-text translation, speech-to-speech...

Downloads: 0 This Week

Last Update: 2025-11-28
4

h2oGPT

Private chat with a local GPT over documents, images, video, etc.

h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including Ollama and...

Downloads: 0 This Week

Last Update: 2025-02-22
5

npcpy

The AI toolkit for the AI developer

npcpy is a Python-based agent framework and command-line toolkit (the NPC Shell) that lets developers build, test, and integrate AI agents into their workflows, with both command-line and GUI interfaces (the latter via NPC Studio). npcpy is the core library of the NPC Toolkit, providing natural language processing pipelines and agent tooling, and serves as a flexible framework for building state-of-the-art applications and conducting novel research with LLMs. The structure of npcpy also...

Downloads: 4 This Week

Last Update: 1 day ago
6

UFO³

Weaving the Digital Agent Galaxy

UFO is an open-source framework developed by Microsoft for building intelligent agents that automate interactions with graphical user interfaces on the Windows operating system. The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be...

Downloads: 0 This Week

Last Update: 2026-03-04
7

AnyTool

AnyTool: Universal Tool-Use Layer for AI Agents

AnyTool is an open-source universal tool-use layer for AI agents that addresses the critical problem of how autonomous agents reliably interact with external tools and environments. Rather than having each agent handle tool invocation logic on its own, AnyTool provides a standardized interface and orchestrator that intelligently selects and manages tools, reduces context overhead, and improves execution reliability across diverse capabilities like web APIs, local commands, and GUI automation. It uses progressive filtering and adaptive orchestration to ensure the right tools are retrieved efficiently and work cohesively with agents of varying complexity, scaling to thousands of tools with self-optimizing behavior. ...

Downloads: 0 This Week

Last Update: 2026-02-28
8

OmAgent

Build multimodal language agents for fast prototyping and production

OmAgent is an open-source Python framework designed to simplify the development of multimodal language agents that can reason, plan, and interact with different types of data sources. The framework provides abstractions and infrastructure for building AI agents that operate on text, images, video, and audio while maintaining a relatively simple interface for developers. Instead of forcing developers to implement complex orchestration logic manually, the system manages task scheduling, worker coordination, and node optimization behind the scenes. Its architecture uses a graph-based workflow engine where tasks are represented as nodes in a directed workflow, enabling modular composition of complex reasoning pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-05
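OmAgent's description mentions a graph-based workflow engine in which tasks are nodes in a directed workflow. The minimal sketch below illustrates only that general idea and is not OmAgent's API. It assumes each task is a plain Python function that receives the results of its upstream tasks, and it uses the standard library's graphlib.TopologicalSorter (Python 3.9+) to decide execution order. The node names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

# Hypothetical task signature: receives {upstream_name: upstream_result}
Task = Callable[[Dict[str, object]], object]


def run_workflow(tasks: Dict[str, Task], deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Run every task only after all of the tasks it depends on have finished.

    tasks: node name -> function
    deps:  node name -> names of upstream nodes (each must also be a key of tasks)
    TopologicalSorter raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {name: set(deps.get(name, [])) for name in tasks}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {d: results[d] for d in deps.get(name, [])}
        results[name] = tasks[name](upstream)
    return results


# Usage with three hypothetical nodes: two independent inputs feed one consumer.
tasks: Dict[str, Task] = {
    "transcribe": lambda up: "hello world",   # stand-in for an audio-to-text node
    "caption": lambda up: "a cat on a sofa",  # stand-in for an image-to-text node
    "answer": lambda up: f"{up['transcribe']} | {up['caption']}",
}
deps = {"answer": ["transcribe", "caption"]}
print(run_workflow(tasks, deps)["answer"])  # hello world | a cat on a sofa
```

Because independent nodes ("transcribe" and "caption") have no edge between them, a real engine could run them concurrently; this sketch simply runs them one after another in a valid order.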
9

Matcha-TTS

A fast TTS architecture with conditional flow matching

Matcha-TTS is a non-autoregressive neural text-to-speech architecture that uses conditional flow matching to generate speech quickly while maintaining natural quality. It models speech as an ODE-based generative process, and conditional flow matching lets it reach high-quality audio in only a few synthesis steps, which greatly reduces latency compared to score-matching diffusion approaches. The model is fully probabilistic, so it can generate diverse realizations of the same text while still sounding stable and intelligible. The repository provides an end-to-end TTS pipeline: a PyTorch/Lightning training stack, configuration files, pre-trained checkpoints, a command-line interface, and a Gradio app for interactive testing. ...

Downloads: 0 This Week

Last Update: 2025-11-28
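The Matcha-TTS description credits conditional flow matching with reaching high-quality audio in only a few synthesis steps of an ODE-based generative process. The toy sketch below shows the intuition under strong simplifying assumptions: a single known target vector instead of a data distribution, straight-line (optimal-transport-style) probability paths, and a closed-form velocity field in place of a trained network. None of these names come from the Matcha-TTS codebase.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]


def euler_sample(velocity: VelocityField, x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays strictly below 1, so the field below never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_path_velocity(target: Vector) -> VelocityField:
    """Closed-form stand-in for a trained velocity network (hypothetical).

    Along the straight path x_t = (1 - t) * x0 + t * target, the velocity
    (target - x) / (1 - t) equals the constant target - x0, so every Euler
    step stays exactly on the path.
    """
    def v(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
    return v


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # sample from a Gaussian prior
target = [0.2, -0.1, 0.5, 0.0]                     # hypothetical feature vector
for steps in (1, 2, 4):
    out = euler_sample(straight_path_velocity(target), noise, steps)
    err = max(abs(o - t) for o, t in zip(out, target))
    print(steps, err)  # only floating-point rounding error remains
```

In a trained model the learned field is only approximately straight, so a few steps give a close approximation rather than an exact one; the sketch just shows why straighter paths need fewer solver steps.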
10

Agent S

Agent S: an open agentic framework that uses computers like a human

Agent S is an open-source agentic framework designed to enable autonomous computer use through an Agent-Computer Interface (ACI). Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines...

Downloads: 7 This Week

Last Update: 2025-12-16
11

WhatsApp MCP Server

WhatsApp MCP server enabling AI access to chats and messaging

...It supports both sending and receiving messages, including various media types such as images, audio, videos, and documents. It integrates with AI applications like Claude through MCP, enabling conversational automation and contextual message retrieval.

Downloads: 1 This Week

Last Update: 2026-03-17
12

CC2.TV / CC2 - Audio and TV Database

Meta-database application for the audio and TV broadcasts of CC2.TV

This program provides a meta-database application for the audio and video broadcasts of CC2.TV on GNU/Linux systems. It lets you search, manage, and play the extensive content of the CC2.TV audiocast and videocast. The goal is to make the more than 3,000 audiocast topics and more than 1,000 videocast topics, which focus on computing, technology, and social issues, conveniently accessible. For full functionality,...

Downloads: 0 This Week

Last Update: 2025-11-17
13

Deface GUI - Face Anonymization Tool

Face anonymization tool with a graphical user interface

This application is a face anonymization tool with a graphical user interface, built on the Deface engine. It is cross-platform (Linux and Windows). Note: to use it on Windows, first install Python, then install deface with "pip install deface" if it is not already installed.

1 Review

Downloads: 20 This Week

Last Update: 2025-10-13
14

ollama_manager_gui

A graphical manager for Ollama and the LLMs it runs

This app helps you install Ollama and LLMs through its GUI. On launch it checks whether Ollama is installed and, if not, directs you to the Ollama website to download it. The app has been heavily upgraded: it now works properly on Linux and adds progress bars, among many other improvements. It can launch an LLM by clicking its link, run multiple LLMs in separate windows, and remove an installed LLM. There is a confirmation...

Downloads: 2 This Week

Last Update: 2025-08-14
15

MuJoCo MPC

Real-time behaviour synthesis with MuJoCo, using Predictive Control

MuJoCo MPC (MJPC) is an advanced interactive framework for real-time model predictive control (MPC) built on top of the MuJoCo physics engine, developed by Google DeepMind. It allows researchers and roboticists to design, visualize, and execute complex control tasks for simulated or real robotic systems. MJPC integrates a high-performance GUI and multiple predictive control algorithms, including iLQG, gradient descent, and Predictive Sampling — a competitive, derivative-free method that...

Downloads: 0 This Week

Last Update: 2025-10-09
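The MuJoCo MPC description names Predictive Sampling as a derivative-free planner. The sketch below illustrates the general sampling-based MPC idea on a toy 1-D point mass rather than a MuJoCo model: perturb a nominal control sequence with Gaussian noise, roll out each candidate, keep the cheapest, then apply only the first control and replan (receding horizon). The dynamics, cost weights, sample count, and noise scale are hypothetical choices, and a plain per-step control list is used for simplicity; MJPC's own implementation and control parameterization may differ.

```python
import random
from typing import List, Tuple

DT = 0.1  # hypothetical time step (seconds)


def rollout_cost(x: float, v: float, controls: List[float], target: float) -> float:
    """Simulate a 1-D point mass (double integrator) and sum a quadratic cost."""
    cost = 0.0
    for u in controls:
        v += u * DT  # the control is an acceleration
        x += v * DT
        cost += (x - target) ** 2 + 0.01 * u ** 2  # tracking error + control effort
    return cost


def predictive_sampling(x: float, v: float, nominal: List[float], target: float,
                        rng: random.Random, num_samples: int = 16,
                        sigma: float = 0.5) -> Tuple[List[float], float]:
    """One planning iteration: sample noisy copies of the plan and keep the best.

    The unperturbed nominal plan is always a candidate, so the returned cost is
    never worse than the nominal plan's cost.
    """
    best, best_cost = list(nominal), rollout_cost(x, v, nominal, target)
    for _ in range(num_samples):
        candidate = [u + rng.gauss(0.0, sigma) for u in nominal]
        cost = rollout_cost(x, v, candidate, target)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


def run_mpc(steps: int = 50, horizon: int = 20, target: float = 1.0, seed: int = 0) -> float:
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    plan = [0.0] * horizon
    for _ in range(steps):
        plan, _ = predictive_sampling(x, v, plan, target, rng)
        u = plan[0]
        v += u * DT  # apply only the first control to the simulated system
        x += v * DT
        plan = plan[1:] + [plan[-1]]  # shift the plan to warm-start the next iteration
    return x


print(run_mpc())  # final position of the point mass after 50 control steps
```

Keeping the nominal plan among the candidates means each planning iteration can only keep or lower the predicted cost, which is what makes this simple scheme stable to warm-start from one control step to the next.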
16

vocal-separate

An extremely simple tool for separating vocals and background music

...Users can drag and drop an audio or video file onto the interface to begin separation, choosing between two, four, or five stems, which allows isolating specific components like vocals, bass, drums, or piano depending on the chosen model. After processing, the tool outputs separate WAV files for each extracted stem, making it easy to export and use in audio editing or remix software.

Downloads: 6 This Week

Last Update: 2026-02-17
17

Whisper Batch Transcriber

Unlimited, private and free Speech-To-Text program

...(I did this because compiling to an .exe made it slower.) I made it as easy as possible for a layperson to use, so despite its crude looks, it offers an experience as good as a GUI application. Enjoy freedom!

Downloads: 19 This Week

Last Update: 2025-07-16
18

DragGAN

Official Code for DragGAN (SIGGRAPH 2023)

...It combines feature-based motion supervision with a robust point-tracking mechanism to ensure accurate edits during user interaction. DragGAN has gained attention for making complex image edits, such as pose changes or shape adjustments, accessible through an intuitive interface. The repository provides code and GUI tooling that allow researchers and advanced users to experiment with this next-generation controllable image manipulation technique.

Downloads: 0 This Week

Last Update: 2026-02-24
19

Audio Webui

A web UI for various audio-related neural networks

Audio Webui is a Gradio-based web user interface that unifies a wide range of audio-related neural networks under a single, accessible front end. It is designed as an “all-in-one” environment where users can experiment with text-to-speech, voice cloning, generative music, and other neural audio models without writing boilerplate code.

Downloads: 0 This Week

Last Update: 2025-11-28
20

StoryTeller

Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

...Given a prompt as the opening line of a story, GPT writes the rest of the plot, Stable Diffusion draws an image for each sentence, and a TTS model narrates each line, resulting in a fully animated video of a short story, complete with audio and visuals. The final video is saved as /out/out.mp4, alongside the intermediate images, audio files, and subtitles. For more advanced use cases, you can also interface with StoryTeller directly from Python code. To develop locally, install the dev dependencies and the pre-commit hooks, which automatically run linting and code quality checks before each commit.

Downloads: 1 This Week

Last Update: 2023-08-22
21

Txt-2-Mp3 6.3 Mark 2 [I.S.A]

Txt-2-Mp3 6.3 Mark 2 [Improved.Simplified.Alternative]

Txt2Mp3 is a desktop application developed with Python 3.6.8 and additional libraries. It converts text into audio (.mp3) files using the gTTS (Google Text-to-Speech) library. It is compatible with Windows only.

Downloads: 1 This Week

Last Update: 2023-06-07
22

Img2Txt

Img2Txt - Extract Text From Images using AI

Important: if you share this program, please include the official download link. What is Img2Txt? Img2Txt is a Python-based application, packaged with PyInstaller, that uses pytesseract, a Python wrapper for the Tesseract optical character recognition (OCR) engine, to extract text from images and convert it into plain text. The application features a simple, modern, user-friendly interface built with customtkinter, allowing users to easily process images and obtain the text...

1 Review

Downloads: 2 This Week

Last Update: 2023-08-15
23

Riffusion

Real-time music generation with AI using Stable Diffusion techniques

...Riffusion (hobby) serves as the core implementation for audio and image processing, providing essential building blocks for generating music from text prompts. It includes both developer-oriented tools and user-facing components such as a command-line interface and an interactive Streamlit application for experimentation. Additionally, it can run as a Flask server to expose model inference through an API, enabling integration with other applications or services.

Downloads: 1 This Week

Last Update: 2026-03-18
24

AI Atelier

AI art creation software based on Disco Diffusion

Based on Disco Diffusion, we have developed "AI Atelier", AI art creation software available in Chinese and English. It offers both text-to-image models (Disco Diffusion and VQGAN+CLIP) and text-to-text models (GPT-J-6B and GPT-NeoX-20B) as options. The license requires making the complete source code of licensed works and modifications, including larger works that use a licensed work, available under the same license. Copyright and license notices must be preserved. When a modified version is used to provide a...

Downloads: 0 This Week

Last Update: 2023-03-23
25

Mocking Bird

Clone a voice in 5 seconds to generate arbitrary speech in real-time

...The codebase is implemented in Python (with PyTorch) and includes modules for encoder, synthesizer, vocoder, preprocessing, and inference, as well as demo scripts and a web-server interface for easier experimentation or deployment. MockingBird supports both using pretrained models and training your own synthesizer (with custom datasets), giving flexibility for voice-cloning or custom-voice synthesis depending on your needs.

1 Review

Downloads: 2 This Week

Last Update: 2023-03-23

© 2026 Slashdot Media. All Rights Reserved.
