Showing 22 open source projects for "configuration"

View related business solutions
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    EPUB to Audiobook Converter

    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    ...The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. A recent addition is a Gradio-based WebUI, which wraps all configuration options in a graphical interface for users who prefer not to work with the command line. The tool offers advanced options such as controlling chapter ranges, handling paragraph detection via newline modes, removing endnote markers, and using regex-based search-and-replace files to tweak pronunciations. It can be run directly with Python or via Docker.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 2
    Applio

    Applio

    A simple, high-quality voice conversion tool focused on ease of use

    Applio is a high-quality voice conversion toolkit designed to make modern RVC/VITS-based voice cloning accessible to non-experts. It focuses strongly on ease of use: installation scripts for Windows, Linux, and macOS set up dependencies and then launch a browser-based Gradio interface. Within that interface, users can train and run voice conversion models for tasks like singing conversion, speech-to-speech transformation, and voice cloning. The project is structured to be flexible through...
    Downloads: 113 This Week
    Last Update:
    See Project
  • 3
    kokoro-onnx

    kokoro-onnx

    TTS with kokoro and onnx runtime

    ...The project ships prebuilt model files and a simple example script, so you can go from installation to producing an audio.wav file in just a few steps. It supports multiple languages and voices, with a curated voice list and configuration via a VOICES file hosted alongside the models. The package is distributed on PyPI, meaning you can integrate it directly into applications or scripts using standard Python tooling. It also recommends pairing with an external G2P package to improve pronunciation quality, especially for more complex languages or names, and is licensed under permissive MIT and Apache-style licenses.
    Downloads: 137 This Week
    Last Update:
    See Project
  • 4
    AI Runner

    AI Runner

    Offline inference engine for art, real-time voice conversations

    ...At the core of its LLM stack is a mode-based architecture with specialized “modes” such as Author, Code, Research, QA and General, and a workflow manager that automatically routes user requests to the right agent based on the task. The project has a strong focus on developer ergonomics, with thorough development guidelines, environment configuration using .env variables, and a clear structure for tests, tools and agents.
    Downloads: 12 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    ...It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. On the client side, you can set the language, whether to translate into English, model size, voice activity detection, and output recording behavior.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 6
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    MiniMax-MCP

    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    ...The server is written in Python and distributed under the MIT license, with a pyproject.toml and uv-based workflow that makes installation and execution reproducible. Configuration is handled through JSON files that tell MCP clients how to launch the server (typically via uvx minimax-mcp) and which environment variables to use for the API key, host, and output directory. The README carefully explains region-specific API hosts for global and mainland users to avoid invalid-key errors, and documents both local stdio transport and SSE-based network transport modes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    NVIDIA NeMo Framework

    NVIDIA NeMo Framework

    Scalable generative AI framework built for researchers and developers

    ...It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure. NeMo 2.0 introduces a Python-based configuration system, replacing YAML with more flexible, programmable configs that can be versioned and composed for different experiments. The framework builds on PyTorch Lightning–style modular abstractions, so training scripts are composed from reusable components for data loading, models, optimizers, and schedulers, which simplifies experimentation and adaptation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    IMS Toucan

    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    ...The toolkit focuses on being fast and controllable while not requiring huge amounts of compute, making it practical for research labs and smaller teams. It includes complete pipelines for preprocessing datasets, training models, and running inference, plus a storage configuration system to manage where models and caches are stored. IMS-Toucan ships with several ready-to-run scripts, including GUIs for interactive demos, prosody override tools, zero-shot language embedding injection, and text-to-audio file generation. Pretrained models are automatically downloaded when needed, and there is an online demo instance hosted on GPU that anyone can try.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    TTS WebUI

    TTS WebUI

    A single Gradio + React WebUI with extensions for ACE-Step

    TTS-WebUI is a unified Gradio + React web interface that brings together a large ecosystem of text-to-speech, voice conversion, and audio generation models under a single UI. It supports a wide range of models such as Bark, MusicGen, Tortoise, RVC, StyleTTS2, ParlerTTS, CosyVoice, XTTSv2, Stable Audio, SeamlessM4T, and many others, exposing them as interchangeable backends for speech and music synthesis. The project provides an installer that sets up Conda, Python environments, and all...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Lingvo

    Lingvo

    Framework for building neural networks

    ...Lingvo includes reference models and configurations for domains like machine translation, automatic speech recognition, language modeling, image understanding, and 3D object detection. Centralized hyperparameter configuration files allow researchers to share exact experiment setups so others can retrain and compare results reliably.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Matcha-TTS

    Matcha-TTS

    A fast TTS architecture with conditional flow matching

    ...The model is fully probabilistic, so it can generate diverse realizations of the same text while still sounding stable and intelligible. The repository provides an end-to-end TTS pipeline: a PyTorch/Lightning training stack, configuration files, pre-trained checkpoints, a command-line interface, and a Gradio app for interactive testing. Users can train on standard datasets like LJSpeech or plug in their own corpora, with helper tools for computing dataset statistics, extracting phoneme durations, and running multi-GPU training.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, VLLM and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Amphion

    Amphion

    Toolkit for audio, music, and speech generation

    ...The toolkit is organized with example experiments (“egs”) and visualization demos that guide users through training, evaluation, and inspection of models. Built on the broader OpenMMLab ecosystem, Amphion follows modular design patterns and configuration systems similar to other OpenMMLab projects, easing adoption for users who are already familiar with that stack.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    StyleTTS 2

    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    ...StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. The repository includes training scripts, configuration files, and pre-trained auxiliary modules such as a text aligner, pitch extractor, and PL-BERT-based linguistic encoder.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Bert-VITS2

    Bert-VITS2

    VITS2 backbone with multilingual-bert

    ...The core idea is to use BERT-style contextual embeddings for text encoding while relying on a refined VITS2 architecture for acoustic generation and vocoding. The repository includes everything needed to train, fine-tune, and run the model, from configuration files to preprocessing scripts, spectrogram utilities, and training entrypoints for multi-GPU and multi-node setups. It provides emotional modeling through “emo embeddings,” allowing voices to be conditioned on different affective states during synthesis. Releases include optimizations for Japanese and English alignment, expanded training data, spec caching and pre-generation tools, as well as ONNX export for more lightweight inference deployments.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ekho

    ekho

    Chinese text-to-speech engine

    ekho is a project with relatively sparse documentation, but from the repository it appears to be a small-scale tool for audio processing and playback, possibly with features for speech synthesis or manipulation. The repo includes scripts and configuration files suggesting interactions with media/audio handling libraries. Because of limited README detail, it seems targeted at users comfortable reading and modifying code, rather than end users expecting polished UIs. The code structure implies that Ekho may support hooking into audio input/output streams, perhaps for tasks like audio capture, playback, transformation, or simple voice-based operations. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Audio Webui

    Audio Webui

    A webui for different audio related Neural Networks

    Audio Webui is a Gradio-based web user interface that unifies a wide range of audio-related neural networks under a single, accessible front end. It is designed as an “all-in-one” environment where users can experiment with text-to-speech, voice cloning, generative music, and other neural audio models without writing boilerplate code. The project supports multiple back-end models and toolchains (such as Bark, RVC, AudioLDM, Audiocraft, and other text-to-audio or voice-cloning tools),...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    edge-TTS-record

    edge-TTS-record

    Tool that can record speech synthesis

    ...Users can type or paste text, preview the speech, and then trigger the recorder; the system automatically captures the audio output from the browser and writes it to a WAV file. The tool includes a small GUI (built with Aardio) and aims to be plug-and-play — after downloading the .exe you can immediately start using it without deep configuration. It is cloud-based in the sense that it relies on Edge’s online TTS service, so internet connection is required; but once recorded, the audio is local.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    XZVoice

    XZVoice

    Free and open source text-to-speech software

    Text-to-speech software developed by Electron + vue + ElementUI + js. The high-fidelity and flexible configuration of speech synthesis products opens up the closed loop of human-computer interaction and enables applications to sound realistically. A variety of timbres are available, and functions such as adjusting speech rate, intonation, and volume are provided. Technically, multi-level rhythmic pauses are taken into account to achieve the purpose of natural synthesizing rhythm, and comprehensively use acoustic parameters and linguistic parameters to establish multiple automatic prediction models based on deep learning. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    HiFi-GAN

    HiFi-GAN

    Generative Adversarial Networks for Efficient and High Fidelity Speech

    ...In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    OpenSeq2Seq

    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    ...Mixed-precision support (float16) is optimized for NVIDIA Volta and Turing GPUs, allowing significant speedups and memory savings without sacrificing model quality. The project comes with configuration-driven training scripts, documentation, and examples that demonstrate how to set up pipelines for tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB