Showing 744 open source projects for "sound%20detection"

View related business solutions
  • Find Hidden Risks in Windows Task Scheduler Icon
    Find Hidden Risks in Windows Task Scheduler

    Free diagnostic script reveals configuration issues, error patterns, and security risks. Instant HTML report.

    Windows Task Scheduler might be hiding critical failures. Download the free JAMS diagnostic tool to uncover problems before they impact production—get a color-coded risk report with clear remediation steps in minutes.
    Download Free Tool
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    spotDL

    spotDL

    Download your Spotify playlists and songs along with album art

    spotDL finds songs from Spotify playlists on YouTube and downloads them - along with album art, lyrics and metadata.
    Downloads: 131 This Week
    Last Update:
    See Project
  • 2
    yami

    yami

    An open-source music player with simple UI

    Yami is a lightweight, open-source music player built in Python. It focuses on simplicity and ease of use, providing an intuitive user interface (UI) for users to manage and play their music. Whether you're playing local files or downloading from online sources using spotdl, Yami offers a seamless experience. This project is designed for users who want a minimalistic, cross-platform music player with the ability to integrate external sources like Spotify/YouTube Music.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 3
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 4
    You-Get

    You-Get

    Dumb downloader that scrapes the web

    You-Get is a small command-line utility for downloading media (video, audio and images) from the Web when there are no other means to do so. It can download video and audio files from such popular web sites as YouTube, Twitter, Niconico, Vimeo, Flickr, Instagram and a whole lot more. You-Get is a great option for when you want to enjoy your favorite videos, audio or images from the internet without having to open any web browsers or get interrupted by ads. It’s also a good choice for...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 6
    Audiomentations

    Audiomentations

    A Python library for audio data augmentation

    ...Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products. Mix in another sound, e.g. a background noise. Useful if your original sound is clean and you want to simulate an environment where background noise is present. A folder of (background noise) sounds to be mixed in must be specified. These sounds should ideally be at least as long as the input sounds to be transformed. Otherwise, the background sound will be repeated, which may sound unnatural. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Quod Libet

    Quod Libet

    Music player and music library manager for Linux, Windows, and macOS

    Quod Libet is a cross-platform audio/music management program. It provides many ways to view your local library, and supports streaming audio and feeds (podcasts, etc). It has extremely flexible metadata editing and searching capabilities. With over 90 plugins included, you can extend and integrate with almost anything, or write your own! Ex Falso is a bare-bones tag editor with the same editing interface as Quod Libet. Quod Libet is a GTK+-based audio player written in Python, using the...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    FeelUOwn

    FeelUOwn

    Trying to be a robust, user-friendly and hackable music player

    FeelUOwn is a user-friendly, and hackable music player.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Moshi

    Moshi

    A speech-text foundation model for real time dialogue

    Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, in a fully streaming manner (latency of 80ms, the frame size), yet performs better than existing, non-streaming, codecs like SpeechTokenizer (50 Hz, 4kbps), or SemantiCodec (50 Hz, 1.3kbps). Moshi models two streams of audio: one corresponds to Moshi, and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Total Network Visibility for Network Engineers and IT Managers Icon
    Total Network Visibility for Network Engineers and IT Managers

    Network monitoring and troubleshooting is hard. TotalView makes it easy.

    This means every device on your network, and every interface on every device is automatically analyzed for performance, errors, QoS, and configuration.
    Learn More
  • 10
    Librosa

    Librosa

    Python library for audio and music analysis

    Librosa is a powerful Python library for analyzing and processing audio and music signals. Built on top of NumPy, SciPy, and matplotlib, it provides a wide range of tools for feature extraction, time-series manipulation, audio display, and music information retrieval. Whether you're building machine learning models for audio classification or visualizing spectrograms, Librosa is a go-to library for researchers and developers working in audio signal processing.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    AudioCraft

    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    ...It uses a novel model setup that combines continuous acoustic features with discrete semantic tokens to richly capture sound and meaning across speech, music, and environmental audio.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Tauon

    Tauon

    The music player of today

    Tauon is a modern, streamlined music player app that's packed with features! An emphasis on playlists and drag-and-drop importing puts you in control of your music library. Faded volume control, 24-bit FLAC support, and gapless playback provide the ultimate listening experience. Excellent CUE sheet support, an original smart playlist system, and network playback from koel or Airsonic servers. Last.fm, Listenbrainz, and Maloja scribbling. MPRIS2 support for desktop integration. Tauon is a...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    Podcastfy.ai

    Podcastfy.ai

    Transforming Multimodal Content into Captivating Multilingual Audio

    Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, youtube videos as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    NovaSR

    NovaSR

    A lightning fast audio upsampler

    NovaSR is an extremely lightweight and high-performance audio upsampling model that transforms low-quality 16 kHz audio into clearer, high-fidelity 48 kHz audio with remarkable speed and efficiency. At only about 50 KB in size, the model is orders of magnitude smaller than typical audio super-resolution networks, yet it achieves high quality and realtime performance thanks to its compact architecture and efficient convolutional design. NovaSR is especially valuable for post-processing tasks...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    MoviePy

    MoviePy

    Video editing with Python

    MoviePy is a Python module for video editing, which can be used for basic operations (like cuts, concatenations, title insertions), video compositing (a.k.a. non-linear editing), video processing, or to create advanced effects. It can read and write the most common video formats, including GIF. MoviePy is an open source software originally written by Zulko and released under the MIT licence. It works on Windows, Mac, and Linux, with Python 2 or Python 3. The code is hosted on Github, where...
    Downloads: 36 This Week
    Last Update:
    See Project
  • 18
    LiveAvatar

    LiveAvatar

    Streaming Real-time Audio-Driven Avatar Generation

    LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. The project co-designs algorithms and system...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Ren'Py

    Ren'Py

    The Ren'Py Visual Novel Engine

    Ren’Py is a free and open-source visual novel engine that makes telling interactive stories easy by combining narrative scripting with multimedia support for images, sound, and music in a way that’s accessible to both beginners and experienced developers. Its scripting language is designed to be readable and intuitive, allowing writers and creators to define scenes, dialogues, branching choices, character expressions, and events without deep programming knowledge, while still offering the ability to embed Python for advanced control and custom gameplay features. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 20
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    TerraGov Marine Corps

    TerraGov Marine Corps

    TGMC: TerraGov Marine Corps, a SS13 mod

    TerraGov Marine Corps (TGMC) is an open source multiplayer game built on the BYOND engine, forked from the Space Station 13 (SS13) codebase. It is a tactical, role-playing game that pits groups of human marines against alien forces in large-scale, cooperative and competitive scenarios. The project focuses heavily on teamwork, coordination, and immersive gameplay, providing players with different roles such as engineers, medics, or combat marines to ensure strategic variety. TGMC offers a...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 22
    clone-voice

    clone-voice

    A sound cloning tool with a web interface, using your voice

    Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 23
    The Arcade Library

    The Arcade Library

    Easy to use Python library for creating 2D arcade games

    Arcade is an easy-to-use Python library for creating 2D video games. It provides a modern and straightforward API, enabling developers to craft engaging games and graphical applications efficiently. Arcade supports rendering shapes, handling user input, and managing game physics, making it suitable for both beginners and experienced developers.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24

    sound_source_id

    sound_source_id is a Python program for sound localization experiments

    sound_source_id is a Python program for sound localization experiments. Further info can be found on the project webpage: https://samcarcagno.altervista.org/sound_source_id/sound_source_id.html
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    OSWorld

    OSWorld

    Benchmarking Multimodal Agents for Open-Ended Tasks

    OSWorld is an open-source synthetic world environment designed for embodied AI research and multi-agent learning. It provides a richly simulated 3D world where multiple agents can interact, perform tasks, and learn complex behaviors. OSWorld emphasizes multi-modal interaction, enabling agents to process visual, auditory, and symbolic data for grounded learning in a simulated world.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next