Showing 24 open source projects for "audio development"

View related business solutions
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    AudioCraft

    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 2
    ComfyUI

    ComfyUI

    The most powerful and modular diffusion model GUI, api and backend

    The most powerful and modular diffusion model is GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart-based interface. We are a team dedicated to iterating and improving ComfyUI, supporting the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation. Open source AI models will win in the long run against closed models and we are only at the beginning. Our core mission...
    Downloads: 165 This Week
    Last Update:
    See Project
  • 3
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Groq Python

    Groq Python

    The official Python Library for the Groq API

    ...For those building AI-based tooling, automation scripts, or ML-backed backends, groq-python abstracts away HTTP request plumbing and exposes a clean API, accelerating development and reducing boilerplate.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    Pixeltable

    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    ...Developers define data transformations and AI operations using computed columns on tables, allowing pipelines to evolve incrementally as new data or models are added. The framework supports multimodal content including images, video, text, and audio, enabling applications such as retrieval-augmented generation systems, semantic search, and multimedia analytics.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    DocsGPT

    DocsGPT

    Private AI platform for agents, enterprise search and RAG pipelines

    DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models. Works with Qdrant, MongoDB, and Elasticsearch and more. Deploy via Docker or Kubernetes with full data sovereignty. Build...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Spring AI Alibaba Examples

    Spring AI Alibaba Examples

    Spring AI Alibaba examples for building and testing AI apps

    ...It is designed to help developers understand core concepts, explore practical implementations, and follow best practices when building AI-powered systems using the Spring ecosystem. Each module focuses on a specific use case such as chat, image processing, audio handling, graph workflows, and retrieval-augmented generation. The examples highlight how to integrate AI models, manage prompts, handle memory, and build multi-model or multi-agent workflows. Developers can explore individual project folders for detailed instructions and implementation guidance. Spring AI Alibaba Examples also supports experimentation through playground modules and encourages contributions to expand real-world AI use cases and improve development practices.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    DocArray

    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    txtai

    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    OmAgent

    OmAgent

    Build multimodal language agents for fast prototype and production

    OmAgent is an open-source Python framework designed to simplify the development of multimodal language agents that can reason, plan, and interact with different types of data sources. The framework provides abstractions and infrastructure for building AI agents that operate on text, images, video, and audio while maintaining a relatively simple interface for developers. Instead of forcing developers to implement complex orchestration logic manually, the system manages task scheduling, worker coordination, and node optimization behind the scenes. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, sci-kit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Jina-Serve

    Jina-Serve

    Build multimodal AI applications with cloud-native stack

    Jina Serve is an open-source framework designed for building, deploying, and scaling AI services and machine learning pipelines in production environments. The framework allows developers to create microservices that expose machine learning models through APIs that communicate using protocols such as HTTP, gRPC, and WebSockets. It is built with a cloud-native architecture that supports deployment on local machines, containerized environments, or large orchestration platforms such as...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    MiniMax-MCP

    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Acontext

    Acontext

    Context data platform for building observable, self-learning AI agents

    Acontext is a cloud-native context data platform designed to support the development and operation of advanced AI agents. It provides a unified system to store and manage contexts, multimodal messages, artifacts, and task workflows, enabling developers to engineer context effectively for their agent products. The platform observes agent tasks and user feedback in real time, offering robust observability into workflows and helping teams understand how agents perform over time. Acontext also...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Txt-2-Mp3  6.3 Mark 2 [I.S.A]

    Txt-2-Mp3 6.3 Mark 2 [I.S.A]

    Txt-2-Mp3 6.3 Mark 2 [Improved.Simplified.Alternative]

    'Txt2Mp3' an desktop application developed using python 3.6.8 and other add-on libaries. Can convert texts into audio (.mp3) files using gTTS (Google Text-to-speech) api module library. Compatible only for windows OS.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Tensor2Tensor

    Tensor2Tensor

    Library of deep learning models and datasets

    Deep Learning (DL) has enabled the rapid advancement of many useful technologies, such as machine translation, speech recognition and object detection. In the research community, one can find code open-sourced by the authors to help in replicating their results and further advancing deep learning. However, most of these DL systems use unique setups that require significant engineering effort and may only work for a specific problem or architecture, making it hard to run new experiments and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    GoodByeCatpcha

    GoodByeCatpcha

    Solver ReCaptcha v2 Free

    An async Python library to automate solving ReCAPTCHA v2 by images/audio using Mozilla's DeepSpeech, PocketSphinx, Microsoft Azure’s, Google Speech and Amazon's Transcribe Speech-to-Text API. Also image recognition to detect the object suggested in the captcha. Built with Pyppeteer for Chrome automation framework and similarities to Puppeteer, PyDub for easily converting MP3 files into WAV, aiohttp for async minimalistic web-server, and Python’s built-in AsyncIO for convenience.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Defox text to speech and downloader

    Defox text to speech and downloader

    Written or imported text offline read or online download.

    This software design to convert text to speech and download the converted speech. Description : • Installation setup with two languages (English, French) • Two areas called text reading and speech downloading • Many languages supported to download center Note 1: I'm a student yet and I'm not in the software designing industry. Therefore maybe I haven't software making skills. I'm worried about that. ! Note 2 : When you double click on the software maybe it will get some seconds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation and echo cancellation algorithms. The Millennium ASR provides C++ and python libraries for automatic speech recognition. The Millennium ASR implements a weighted finite state transducer (WFST) decoder, training and adaptation methods. These toolkits are meant for facilitating research and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    InproTK

    InproTK

    An Incremental Spoken Dialogue Processing Toolkit

    InproTK is an Incremental Spoken Dialogue Processing Toolkit, that is, a toolkit to help you build dialogue systems that listen and talk incrementally, allowing for advanced interactional behaviour. Please see our Wiki for more information: http://sourceforge.net/p/inprotk/wiki/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition. full installation and usage instructions given at http://sourceforge.net/p/rnnl/wiki/Home/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Speect
    Speect is a multilingual TTS system. It offers a full text-to-speech system with various API's, as well as an environment for research and development of TTS systems and voices. It is written in ANSI C and uses a plug-in mechanism for extensions. Speect also includes an extensive set of Python bindings for quick implementation of new ideas, these bindings are derived from SWIG interface files and can easily be extended for other languages supported by SWIG. Speect is free and open...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    uListen is a TTS(Text To Speech) application. It can TALK you the web pages, chm files, pdf files and word files and plain text files.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB