Showing 34 open source projects for "sound%20detection"

View related business solutions
  • Find Hidden Risks in Windows Task Scheduler Icon
    Find Hidden Risks in Windows Task Scheduler

    Free diagnostic script reveals configuration issues, error patterns, and security risks. Instant HTML report.

    Windows Task Scheduler might be hiding critical failures. Download the free JAMS diagnostic tool to uncover problems before they impact production—get a color-coded risk report with clear remediation steps in minutes.
    Download Free Tool
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Audiomentations

    Audiomentations

    A Python library for audio data augmentation

    ...Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products. Mix in another sound, e.g. a background noise. Useful if your original sound is clean and you want to simulate an environment where background noise is present. A folder of (background noise) sounds to be mixed in must be specified. These sounds should ideally be at least as long as the input sounds to be transformed. Otherwise, the background sound will be repeated, which may sound unnatural. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    AudioCraft

    AudioCraft

    Audiocraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    ...It uses a novel model setup that combines continuous acoustic features with discrete semantic tokens to richly capture sound and meaning across speech, music, and environmental audio.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    clone-voice

    clone-voice

    A sound cloning tool with a web interface, using your voice

    Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 8
    OSWorld

    OSWorld

    Benchmarking Multimodal Agents for Open-Ended Tasks

    OSWorld is an open-source synthetic world environment designed for embodied AI research and multi-agent learning. It provides a richly simulated 3D world where multiple agents can interact, perform tasks, and learn complex behaviors. OSWorld emphasizes multi-modal interaction, enabling agents to process visual, auditory, and symbolic data for grounded learning in a simulated world.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    AI Chatbot Framework

    AI Chatbot Framework

    Python chatbot framework with Natural Language Understanding

    Building a chatbot can sound daunting, but it’s totally doable. AI Chatbot Framework is an AI powered conversational dialog interface built in Python. With this tool, it’s easy to create Natural Language conversational scenarios with no coding efforts whatsoever. The smooth UI makes it effortless to create and train conversations to the bot and it continuously gets smarter as it learns from conversations it has with people.
    Downloads: 1 This Week
    Last Update:
    See Project
  • DAT Freight and Analytics - DAT Icon
    DAT Freight and Analytics - DAT

    DAT Freight and Analytics operates DAT One truckload freight marketplace

    DAT Freight & Analytics operates DAT One, North America’s largest truckload freight marketplace; DAT iQ, the industry’s leading freight data analytics service; and Trucker Tools, the leader in load visibility. Shippers, transportation brokers, carriers, news organizations, and industry analysts rely on DAT for market trends and data insights, informed by nearly 700,000 daily load posts and a database exceeding $1 trillion in freight market transactions. Founded in 1978, DAT is a business unit of Roper Technologies (Nasdaq: ROP), a constituent of the Nasdaq 100, S&P 500, and Fortune 1000. Headquartered in Beaverton, Ore., DAT continues to set the standard for innovation in the trucking and logistics industry.
    Learn More
  • 10
    Qwen-Audio

    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio. It uses multi-task training over many different audio tasks (30+), and achieves strong multi-benchmarks performance...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    SleepFM-Clinical

    SleepFM-Clinical

    Improve human sleep through scientifically

    ...Rather than simply playing static white noise or ambient tracks, it uses a closed-loop, frequency-modulated framework that responds to user-specific sleep patterns and physiological signals to tailor sound in ways that can enhance sleep onset and depth. The clinical release includes additional features for controlled experimentation, such as logging capabilities, adjustable parameter sets, and protocols suitable for sleep studies and therapeutic settings. It also integrates tools for clinicians to configure sessions, annotate events, and potentially link with biofeedback data, enabling a more nuanced understanding of sound’s effect on sleep architecture over time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    ImageBind

    ImageBind

    ImageBind One Embedding Space to Bind Them All

    ...Instead of aligning each pair independently, ImageBind uses image data as the central binding modality, aligning all other modalities to it so they can interoperate zero-shot. This creates a unified embedding space where representations from any modality can be compared or retrieved against any other (e.g., matching sound to text or depth to image). The model is trained using large-scale contrastive learning, leveraging diverse datasets from natural images, videos, audio clips, and sensor data. Once trained, it can perform cross-modal retrieval, zero-shot classification, and multimodal composition without additional fine-tuning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    OpenaiBot

    OpenaiBot

    Refractoring ChatBot+LLM, Gpt-3.5-turbo, ChatGPT Bot/Voice Assistant

    If you don't have the instant messaging platform you need or you want to develop a new application, you are welcome to contribute to this repository. You can develop a new Controller by using Event.py. Compatibility with multiple LLMs and integration with GPT and third-party systems is handled by our llm-kira project on GitHub. It can accurately limit billing, with limits and ID binding. Supports asynchronous operations and can handle multiple requests simultaneously. Allows for private and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    SPPAS

    SPPAS

    SPPAS - the automatic annotation and analyses of speech

    ...Available for free, with open source code, there is simply no other package for linguists to simple use in the automatic annotations of speech, the analyses of any kind of annotated data and the conversion of annotated files. SPPAS is able to produce automatically speech annotations from a recorded speech sound and its orthographic transcription. SPPAS is helpful for the analysis of any annotated data: estimate statistical distributions, make requests, manage files, visualize annotations. SPPAS offers a file converter from/to a wide range of formats: xra, TextGrid, eaf, trs... <https://sppas.org>
    Downloads: 11 This Week
    Last Update:
    See Project
  • 15
    CC2.TV / CC2 - Audio- und TV-Datenbank

    CC2.TV / CC2 - Audio- und TV-Datenbank

    Meta-Datenbank-Anwendung für die Audio- und TV-Sendungen des CC2.TV

    Dieses Programm stellt eine Meta-Datenbank-Anwendung für die Audio- und Video-Sendungen des CC2.TV für GNU/Linux Systeme zur Verfügung. Es ermöglicht das Durchsuchen, Verwalten und Abspielen der umfangreichen Inhalte des CC2.TV-Audiocasts und -Videocasts. Ziel ist es, die über 3000 Audiocast-Themen und über 1000 Videocast-Themen, die sich auf Computerthemen, Technik und gesellschaftliche Aspekte konzentrieren, komfortabel zugänglich zu machen. Für die volle Funktionalität,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Amiga Memories

    Amiga Memories

    A walk along memory lane

    Amiga Memories is a project (started & released in 2013) that aims to make video programmes that can be published on the internet. The images and sound produced by Amiga Memories are 100% automatically generated. The generator itself is implemented in Squirrel, the 3D rendering is done on GameStart 3D. An Amiga Memories video is mostly based on a narrative. The purpose of the script is to define the spoken and written content. The spoken text will be read by a voice synthesizer (Text To Speech or TTS), the written text is simply drawn on the image as subtitles. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    DeepSpeech

    DeepSpeech

    Open source embedded speech-to-text engine

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 18

    FastoCloud PRO

    IPTV/NVR/CCTV/Video cloud https://fastocloud.com

    IPTV/Video cloud Features: Cross-platform (Linux, MacOSX, FreeBSD, Raspbian/Armbian) GPU/CPU Encode/Decode/Post Processing Stream statistics CCTV Adaptive hls streams Load balancing Temporary urls HLS push EPG scanning Subtitles to text conversions AD insertion Logo overlay Video effects Relays Timeshifts Catchups Playlists Restream/Transcode from online streaming services like Youtube, Twitch ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Defox text to speech and downloader

    Defox text to speech and downloader

    Written or imported text offline read or online download.

    This software design to convert text to speech and download the converted speech. Description : • Installation setup with two languages (English, French) • Two areas called text reading and speech downloading • Many languages supported to download center Note 1: I'm a student yet and I'm not in the software designing industry. Therefore maybe I haven't software making skills. I'm worried about that. ! Note 2 : When you double click on the software maybe it will get some seconds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    FM2TXT

    FM2TXT

    RtlSdr listen to radio, recognize audio, and writes text file log

    Just log your favorite FM station speech to a text file using rtl-sdr dongle and speech recognition. Cross-platform tool. Follow the README on the download page for Windows installation. https://sourceforge.net/projects/fm2txt-rtlsdr/files/ If you prefer GitHub source, not SF: https://github.com/randaller/fm2txt For those, who want to recognize from soundcard, not from rtl-sdr (this allows to transcribe NFM etc): https://github.com/randaller/souncard2txt
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation and echo cancellation algorithms. The Millennium ASR provides C++ and python libraries for automatic speech recognition. The Millennium ASR implements a weighted finite state transducer (WFST) decoder, training and adaptation methods. These toolkits are meant for facilitating research and...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22

    Steel TTS

    A cross-platform wrapper for common text-to-speech engines in Python

    Steel is a cross-platform package for using common text-to-speech (speech synthesis) engines in Python. Steel currently supports the following TTS software: - Microsoft Speech API 5 (SAPI5) - eSpeak - NS Speech Synthesis - FreeTTS Documentation: http://sourceforge.net/p/steeltts/wiki/ Bug Tracker: http://sourceforge.net/p/steeltts/tickets/ If you are interested in contributing to the Steel TTS codebase, or would like to make a feature-request, please contact the lead...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    A python module that provides algorithms for advanced search - basically all you need to build a search engine.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    A collection of software made by Milos Rancic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    InproTK

    InproTK

    An Incremental Spoken Dialogue Processing Toolkit

    InproTK is an Incremental Spoken Dialogue Processing Toolkit, that is, a toolkit to help you build dialogue systems that listen and talk incrementally, allowing for advanced interactional behaviour. Please see our Wiki for more information: http://sourceforge.net/p/inprotk/wiki/
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next