Showing 14 open source projects for "audio generation"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 1
    Podcastfy.ai

    Podcastfy.ai

    Transforming Multimodal Content into Captivating Multilingual Audio

    Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, youtube videos as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling customization and scale.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    AudioCraft

    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    LiveAvatar

    LiveAvatar

    Streaming Real-time Audio-Driven Avatar Generation

    LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. The project co-designs algorithms and system optimizations, such as block-wise autoregressive processing and fast sampling strategies, to deliver real-time frame rates (e.g., ~45 FPS on appropriate GPU clusters) while handling non-stop generation without quality degradation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Speakr

    Speakr

    Speakr is a personal, self-hosted web application

    ...It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications. Behind the scenes, Speakr leverages modern TTS engines and streaming audio technologies to deliver smooth and responsive speech generation without noticeable delay. The project is built with extensibility in mind, enabling developers to add custom voices, integrate additional languages, and tailor the backend for different hardware or cloud environments. It also supports saving generated audio as downloadable files so users can reuse the speech outputs in other projects, presentations, or media content.
    Downloads: 2 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    SoulSync

    SoulSync

    Automated Music Discovery and Collection Manager

    SoulSync is an intelligent music discovery and automation platform designed to bridge streaming services with self-hosted media libraries, enabling users to automatically grow and maintain curated music collections. The system continuously monitors selected artists and detects new releases, then generates personalized playlists such as Release Radar and Discovery Weekly using its built-in recommendation logic. It can automatically download missing tracks from multiple sources including...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Audiomentations

    Audiomentations

    A Python library for audio data augmentation

    A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated in training pipelines in e.g. Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Moshi

    Moshi

    A speech-text foundation model for real time dialogue

    ...At inference, the stream from the user is taken from the audio input, and the one for Moshi is sampled from the model's output. Along these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation. A small Depth Transformer models inter codebook dependencies for a given time step, while a large, 7B parameter Temporal Transformer models the temporal dependencies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    ...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    HuMidi

    HuMidi

    Play MIDI like a human in ROBLOX with automatic sustain pedals.

    HuMidi is a universal piano auto player for ROBLOX. It plays even the most generic MIDI file with great depth, thanks to the humanization and automatic sustain pedal generation algorithms. The pedals are generated with a thorough analysis of the MIDI data, adding more depth to your performance that no other MIDI players could! Personally have been tested in: - Visual Pianos - Starving Pianists - Digital Piano - Piano's Got Talent This tool is universal, as long as the piano in-game...
    Downloads: 128 This Week
    Last Update:
    See Project
  • Powerful App Monitoring Without Surprise Bills Icon
    Powerful App Monitoring Without Surprise Bills

    AppSignal starts at $23/month with all features included. No overages, no hidden fees. 30-day free trial.

    Tired of monitoring tools that punish you for scaling? AppSignal offers transparent, predictable pricing with every feature unlocked on every plan. Track errors, monitor performance, detect anomalies, and manage logs across Ruby, Python, Node.js, and more. Trusted by developers since 2012 with free dev-to-dev support. No credit card required to start your 30-day trial.
    Try AppSignal Free
  • 10
    audio-diffusion-pytorch

    audio-diffusion-pytorch

    Audio generation using diffusion models, in PyTorch

    A fully featured audio diffusion library, for PyTorch. Includes models for unconditional audio generation, text-conditional audio generation, diffusion autoencoding, upsampling, and vocoding. The provided models are waveform-based, however, the U-Net (built using a-unet), DiffusionModel, diffusion method, and diffusion samplers are both generic to any dimension and highly customizable to work on other formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    PyVcon

    PyVcon

    A stylish Video Converter written in Python

    PyVcon is a Python video converter using PyQt as its primary GUI Toolkit and because of this, PyVcon has a very sleek user friendly interface. Using ffmpeg for video conversion, PyVcon has great performance in speed and converts any kind of video into mp4, mkv, wmv, avi, 3gp, m4a, mp3 and wma formats. Also included, is MediaInfo who PyVcon partly depends for video metadata generation. PyVcon is also generally known to be of outstanding value, possessing a highly flexible video quality...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Canorus is the next generation music score editor. It is the official successor of NoteEdit. It uses Qt4, has scripting capibilities and uses modern development standards like eclipse, patterns and test units as base.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 13
    XMMS2
    Second generation of XMMS music player.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    SuperWillow is a Music Generation program. Artists have many influences which they have accumulated over the years by listening to countless pieces of music, this principle is reflected in SuperWillow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB