Showing 56 open source projects for "audio gui interface"

  • 1
    labelme Image Polygonal Annotation

    Image polygonal annotation with Python

    Labelme is a graphical image annotation tool written in Python, with a Qt-based interface. It supports image annotation with polygons, rectangles, circles, lines, and points; image flag annotation for classification and cleaning; video annotation; GUI customization (predefined labels and flags, auto-saving, label validation, etc.); and export of VOC-format datasets for semantic and instance segmentation.
    Downloads: 12 This Week
  • 2
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    ...Developers define data transformations and AI operations using computed columns on tables, allowing pipelines to evolve incrementally as new data or models are added. The framework supports multimodal content including images, video, text, and audio, enabling applications such as retrieval-augmented generation systems, semantic search, and multimedia analytics.
    Downloads: 2 This Week
  • 3
    StreamSpeech

    StreamSpeech is a seamless model for offline speech recognition

    StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received. The model supports eight tasks: offline ASR, speech-to-text translation, speech-to-speech...
    Downloads: 0 This Week
  • 4
    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 0 This Week
  • 5
    npcpy

    The AI toolkit for the AI developer

    npcpy is a Python-based agent framework and command-line toolkit (the NPC Shell) for developers to build, test, and integrate AI agents into their workflows, including both command-line and GUI interfaces via NPC Studio. Welcome to npcpy, the core library of the NPC Toolkit that supercharges natural language processing pipelines and agent tooling. npcpy is a flexible framework for building state-of-the-art applications and conducting novel research with LLMs. The structure of npcpy also...
    Downloads: 4 This Week
  • 6
    UFO³

    Weaving the Digital Agent Galaxy

    UFO is an open-source framework developed by Microsoft for building intelligent agents that automate interactions with graphical user interfaces on the Windows operating system. The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be...
    Downloads: 0 This Week
  • 7
    AnyTool

    AnyTool: Universal Tool-Use Layer for AI Agents

    AnyTool is an open-source universal tool-use layer for AI agents that addresses the critical problem of how autonomous agents reliably interact with external tools and environments. Rather than having each agent handle tool invocation logic on its own, AnyTool provides a standardized interface and orchestrator that intelligently selects and manages tools, reduces context overhead, and improves execution reliability across diverse capabilities like web APIs, local commands, and GUI automation. It uses progressive filtering and adaptive orchestration to ensure the right tools are retrieved efficiently and work cohesively with agents of varying complexity, scaling to thousands of tools with self-optimizing behavior. ...
    Downloads: 0 This Week
  • 8
    OmAgent

    Build multimodal language agents for fast prototype and production

    OmAgent is an open-source Python framework designed to simplify the development of multimodal language agents that can reason, plan, and interact with different types of data sources. The framework provides abstractions and infrastructure for building AI agents that operate on text, images, video, and audio while maintaining a relatively simple interface for developers. Instead of forcing developers to implement complex orchestration logic manually, the system manages task scheduling, worker coordination, and node optimization behind the scenes. Its architecture uses a graph-based workflow engine where tasks are represented as nodes in a directed workflow, enabling modular composition of complex reasoning pipelines. ...
    Downloads: 0 This Week
  • 9
    Matcha-TTS

    A fast TTS architecture with conditional flow matching

    Matcha-TTS is a non-autoregressive neural text-to-speech architecture that uses conditional flow matching to generate speech quickly while maintaining natural quality. It models speech as an ODE-based generative process, and conditional flow matching lets it reach high-quality audio in only a few synthesis steps, which greatly reduces latency compared to score-matching diffusion approaches. The model is fully probabilistic, so it can generate diverse realizations of the same text while still sounding stable and intelligible. The repository provides an end-to-end TTS pipeline: a PyTorch/Lightning training stack, configuration files, pre-trained checkpoints, a command-line interface, and a Gradio app for interactive testing. ...
    Downloads: 0 This Week
  • 10
    Agent S

    Agent S: an open agentic framework that uses computers like a human

    Agent S is an open-source agentic framework designed to enable autonomous computer use through an Agent-Computer Interface (ACI). Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines...
    Downloads: 7 This Week
  • 11
    WhatsApp MCP Server

    WhatsApp MCP server enabling AI access to chats and messaging

    ...It supports both sending and receiving messages, including various media types such as images, audio, videos, and documents. It integrates with AI applications like Claude through MCP, enabling conversational automation and contextual message retrieval.
    Downloads: 1 This Week
  • 12
    CC2.TV / CC2 - Audio- und TV-Datenbank

    Meta-database application for the audio and TV broadcasts of CC2.TV

    This program provides a meta-database application for the audio and video broadcasts of CC2.TV on GNU/Linux systems. It lets you search, manage, and play the extensive content of the CC2.TV audiocasts and videocasts. The goal is to make the more than 3000 audiocast topics and more than 1000 videocast topics, which focus on computing, technology, and social aspects, conveniently accessible. For full functionality,...
    Downloads: 0 This Week
  • 13
    Deface GUI - Face Anonymization Tool

    Graphical User Interface Face Anonymization Tool

    This application is a professional tool with a graphical user interface that anonymizes faces using the Deface Engine. It is cross-platform (Linux and Windows). NOTE: To use it on Windows, first install Python; then, if necessary, run “pip install deface”.
    Downloads: 20 This Week
  • 14
    ollama_manager_gui

    A graphical manager for ollama that can manage your LLMs

    This app helps you install ollama and LLMs through its GUI. On launch it checks for ollama and, if it is missing, takes you to the ollama site to download it. The app has been heavily upgraded and now also works properly on Linux. It now has progress bars and many other improvements. It can launch an LLM when you click its link, and it can launch multiple LLMs in separate windows. It can also remove an installed LLM. There is a confirmation...
    Downloads: 2 This Week
  • 15
    MuJoCo MPC

    Real-time behaviour synthesis with MuJoCo, using Predictive Control

    MuJoCo MPC (MJPC) is an advanced interactive framework for real-time model predictive control (MPC) built on top of the MuJoCo physics engine, developed by Google DeepMind. It allows researchers and roboticists to design, visualize, and execute complex control tasks for simulated or real robotic systems. MJPC integrates a high-performance GUI and multiple predictive control algorithms, including iLQG, gradient descent, and Predictive Sampling — a competitive, derivative-free method that...
    Downloads: 0 This Week
  • 16
    vocal-separate

    An extremely simple tool for separating vocals and background music

    ...Users can drag and drop an audio or video file onto the interface to begin separation, choosing between two, four, or five stems, which allows isolating specific components like vocals, bass, drums, or piano depending on the chosen model. After processing, the tool outputs separate WAV files for each extracted stem, making it easy to export and use in audio editing or remix software.
    Downloads: 6 This Week
  • 17
    Whisper Batch Transcriber

    Unlimited, private and free Speech-To-Text program

    ...(I did this because compiling to exe made it slower) - I made it as easy as possible for a layperson to use, so despite its crude looks, it offers as good an experience as a GUI application. Enjoy freedom!
    Downloads: 19 This Week
  • 18
    DragGAN

    Official Code for DragGAN (SIGGRAPH 2023)

    ...It combines feature-based motion supervision with a robust point-tracking mechanism to ensure accurate edits during user interaction. DragGAN has gained attention for making complex image edits, such as pose changes or shape adjustments, accessible through an intuitive interface. The repository provides code and GUI tooling that allow researchers and advanced users to experiment with this next-generation controllable image manipulation technique.
    Downloads: 0 This Week
  • 19
    Audio Webui

    A webui for different audio related Neural Networks

    Audio Webui is a Gradio-based web user interface that unifies a wide range of audio-related neural networks under a single, accessible front end. It is designed as an “all-in-one” environment where users can experiment with text-to-speech, voice cloning, generative music, and other neural audio models without writing boilerplate code.
    Downloads: 0 This Week
  • 20
    StoryTeller

    Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

    ...Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals. To develop locally, install dev dependencies and install pre-commit hooks. This will automatically trigger linting and code quality checks before each commit. The final video will be saved as /out/out.mp4, alongside other intermediate images, audio files, and subtitles. For more advanced use cases, you can also directly interface with Story Teller in Python code.
    Downloads: 1 This Week
  • 21
    Txt-2-Mp3 6.3 Mark 2 [I.S.A]

    Txt-2-Mp3 6.3 Mark 2 [Improved.Simplified.Alternative]

    'Txt2Mp3' is a desktop application developed with Python 3.6.8 and add-on libraries. It converts text into audio (.mp3) files using the gTTS (Google Text-to-Speech) library. Compatible with Windows only.
    Downloads: 1 This Week
  • 22
    Img2Txt

    Img2Txt - Extract Text From Images using AI

    Important: if you share this program, please include the official download link. What is Img2Txt? Img2Txt is a Python-based application packaged using PyInstaller that utilizes the power of pytesseract, an AI-powered optical character recognition (OCR) library, to extract text from images and convert it into plain text. The application features a simple and modern user-friendly interface created using customtkinter, allowing users to easily process images and obtain the text...
    Downloads: 2 This Week
  • 23
    Riffusion

    Real-time AI music generation using Stable Diffusion techniques

    ...Riffusion (hobby) serves as the core implementation for audio and image processing, providing essential building blocks for generating music from text prompts. It includes both developer-oriented tools and user-facing components such as a command-line interface and an interactive Streamlit application for experimentation. Additionally, it can run as a Flask server to expose model inference through an API, enabling integration with other applications or services.
    Downloads: 1 This Week
  • 24
    AI Atelier

    A Chinese & English version of AI art creation software, based on Disco Diffusion

    Based on Disco Diffusion, we have developed a Chinese & English version of the AI art creation software "AI Atelier". We offer both Text-To-Image models (Disco Diffusion and VQGAN+CLIP) and Text-To-Text models (GPT-J-6B and GPT-NEOX-20B) as options. The license requires making available the complete source code of licensed works and modifications, including larger works using a licensed work, under the same license. Copyright and license notices must be preserved. When a modified version is used to provide a...
    Downloads: 0 This Week
  • 25
    Mocking Bird

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    ...The codebase is implemented in Python (with PyTorch) and includes modules for encoder, synthesizer, vocoder, preprocessing, and inference, as well as demo scripts and a web-server interface for easier experimentation or deployment. MockingBird supports both using pretrained models and training your own synthesizer (with custom datasets), giving flexibility for voice-cloning or custom-voice synthesis depending on your needs.
    Downloads: 2 This Week