Showing 1212 open source projects for "audio-share,"

View related business solutions
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Bili23 Downloader

    Bili23 Downloader

    Cross platform GUI tool for downloading videos from Bilibili sites

    ...It can parse different types of links such as standard video pages, short links, and collection or activity pages to automatically retrieve downloadable media. It also allows users to choose video resolution, audio quality, and encoding format based on the available sources. Additional features include downloading subtitles, comments, metadata, and artwork associated with videos.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    Anime Player

    Anime Player

    Video player for improving quality of hand-drawn images

    ...This program is a video player written in the Python programming language using the PySimpleGUI graphical user interface library, an mpv media player, and the Anime4K scaling algorithm . Anime Player is designed to play video and audio files and includes functions such as opening files, URLs and folders, setting image scaling parameters using the Anime4K algorithm, creating an mpv config for watching videos using the Anime4K algorithm on Android, viewing help and information about tuning the algorithm. The player also has support for frame interpolation using SVP. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 3
    YuE

    YuE

    Open source AI model for generating full songs from lyrics prompts

    ...It focuses on transforming text inputs such as lyrics and genre prompts into complete musical compositions that include both vocal and instrumental tracks. Unlike many shorter audio generators, the model is capable of producing songs that last several minutes while maintaining coherent musical structure and alignment with the provided lyrics. YuE introduces a family of models built on large language model architectures that process music generation as a sequence prediction task. YuE also incorporates techniques such as track-decoupled prediction and progressive conditioning to help manage complex audio signals and maintain consistency throughout long compositions. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    EPUB to Audiobook Converter

    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    EPUB to Audiobook Converter is a tool designed to convert EPUB ebooks into chaptered audiobooks, optimized specifically for Audiobookshelf servers. It reads each chapter from an EPUB file, generates audio using a chosen text-to-speech backend, and outputs separate MP3 files with chapter titles preserved as metadata to make navigation easier. The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.
    Downloads: 50 This Week
    Last Update:
    See Project
  • 6
    Mercury

    Mercury

    Convert Python notebook to web app and share with non-technical users

    ...We provide Mercury Pro with additional features, dedicated support and friendly commercial license. Mercury is a perfect tool to convert Python notebook to interactive web application and share with non-programmers. You define interactive widgets for your notebook with the YAML header. Your users can change the widgets values, execute the notebook and save result (as PDF or html file). You can hide your code to not scare your (non-coding) collaborators. Easily deploy to any server. Mercury is dual-licensed. Looking for dedicated support, a commercial-friendly license, and more features? ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    ...It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    MARS5

    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    ...The model is built to handle prosodically challenging content such as sports commentary, anime dialogue, and other high-energy or highly varied speech patterns with realistic rhythm and intonation. To control speaker identity, MARS5 uses a short reference audio clip, typically between 2 and 12 seconds, from which it learns the voice characteristics. It supports two main inference modes: shallow clone, which is faster and only needs the reference audio, and deep clone, which additionally uses the transcript of the reference audio to increase similarity and naturalness at the cost of more computation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    ...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    HunyuanCustom

    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity reinforcement and modality-specific condition injection. Text-image fusion module based on LLaVA for improved multimodal understanding. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    ComfyUI

    ComfyUI

    The most powerful and modular diffusion model GUI, api and backend

    The most powerful and modular diffusion model is GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart-based interface. We are a team dedicated to iterating and improving ComfyUI, supporting the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation. Open source AI models will win in the long run against closed models and we are only at the beginning. Our core mission...
    Downloads: 180 This Week
    Last Update:
    See Project
  • 12
    AI-Media2Doc

    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Pipecat

    Pipecat

    Framework for building real-time voice and multimodal AI agents

    ...Developers can create a wide range of interactive systems including voice assistants, customer service agents, interactive storytelling applications, and multimodal interfaces that combine voice, video, images, and text. Its modular architecture allows components to be composed into pipelines that process audio, text, and video streams in real time.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    CCXT

    CCXT

    JavaScript/TypeScript/Python/C#/PHP cryptocurrency trading API

    The ccxt library is a collection of available crypto exchanges or exchange classes. Each class implements the public and private API for a particular crypto exchange. All exchanges are derived from the base Exchange class and share a set of common methods. To access a particular exchange from ccxt library you need to create an instance of corresponding exchange class. Supported exchanges are updated frequently and new exchanges are added regularly.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 15
    yami

    yami

    An open-source music player with simple UI

    Yami is a lightweight, open-source music player built in Python. It focuses on simplicity and ease of use, providing an intuitive user interface (UI) for users to manage and play their music. Whether you're playing local files or downloading from online sources using spotdl, Yami offers a seamless experience. This project is designed for users who want a minimalistic, cross-platform music player with the ability to integrate external sources like Spotify/YouTube Music.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    OSCAL

    OSCAL

    Open Security Controls Assessment Language (OSCAL)

    NIST is developing the Open Security Controls Assessment Language (OSCAL), a set of hierarchical, XML-, JSON-, and YAML-based formats that provide a standardized representation of information pertaining to the publication, implementation, and assessment of security controls. OSCAL is being developed through a collaborative approach with the public. Public contributions to this project are welcome. With this effort, we are stressing the agile development of a set of minimal formats that are...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Weblate

    Weblate

    Web based localization tool with tight version control integration

    ...This way translators can work on translations the entire time, instead of working through huge amounts of new text just prior to release. Copylefted; use, see, modify and share at will, and with everyone. All translators are properly credited in the version control system. Customizable quality checks helps improve translation quality.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Determined

    Determined

    Determined, deep learning training platform

    ...Interpret your experiment results using the Determined UI and TensorBoard, and reproduce experiments with artifact tracking. Deploy your model using Determined's built-in model registry. Easily share on-premise or cloud GPUs with your team. Determined’s cluster scheduling offers first-class support for deep learning and seamless spot instance support. Check out examples of how you can use Determined to train popular deep learning models at scale.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Generative AI

    Generative AI

    Sample code and notebooks for Generative AI on Google Cloud

    Generative AI is a comprehensive collection of code samples, notebooks, and demo applications designed to help developers build generative-AI workflows on the Vertex AI platform. It spans multiple modalities—text, image, audio, search (RAG/grounding) and more—showing how to integrate foundation models like the Gemini family into cloud projects. The README emphasises getting started with prompts, datasets, environments and sample apps, making it ideal for both experimentation and production-ready usage. The repository architecture is organised into folders like gemini/, search/, vision/, audio/, and rag-grounding/, which helps developers locate use cases by modality. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    notebooklm-py

    notebooklm-py

    Unofficial Python API and agentic skill for Google NotebookLM

    ...The project covers notebook management, source ingestion, conversational querying, research workflows, and sharing controls, while also enabling the generation of a wide range of study and media artifacts. These outputs include audio overviews, videos, slide decks, infographics, quizzes, flashcards, reports, data tables, and mind maps, with configurable formats and export options.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 22
    Datasets

    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    ...It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 24
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 25
    ChatTTS_colab

    ChatTTS_colab

    One-click deployment (including offline integration package)

    ...It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. It has first-class support for long-form audio generation, making it suitable for audiobooks, podcasts, or long narration tasks. The project also implements multi-speaker or role-based reading, letting users assign different voices to different characters in a script and even use a large language model to generate that script in one step.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB