Search Results for "content based audio retrievel"

Showing 176 open source projects for "content based audio retrievel"

1

Step-Audio-EditX

LLM-based Reinforcement Learning audio edit model

Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level...

Downloads: 0 This Week

Last Update: 2026-04-09
See Project
2

Step-Audio 2

Multi-modal large language model designed for audio understanding

Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. ...

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
3

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, and YouTube videos, as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources, enabling customization and scale.

Downloads: 1 This Week

Last Update: 2024-11-16
See Project
4

BlogWizard

Generate blog articles from video or audio

BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. ...

Downloads: 0 This Week

Last Update: 2025-12-19
See Project
5

Twspace-dl

A python module to download twitter spaces

Twspace-dl is a Python-based tool designed to download audio content from Twitter Spaces, enabling users to archive live or recorded sessions locally. It works by extracting streaming URLs and processing them with FFmpeg to generate downloadable audio files. The tool supports both command-line and graphical interfaces, making it accessible to different types of users.

Downloads: 5 This Week

Last Update: 5 days ago
See Project
6

FineTune

FineTune, a macOS menu bar app to control volume for each app

...Through a clean, minimal interface accessible from the menu bar, FineTune lets users isolate and balance application volumes, assign specific outputs (like headphones versus speakers), and tweak equalization to enhance or tailor audio based on content or personal preference. Its integration into the OS workflow means that these adjustments persist across sessions and respect the user’s choices without requiring constant interaction with deeper system settings.

Downloads: 62 This Week

Last Update: 2026-04-23
See Project
7

Unrud Video Downloader

Download videos from websites like YouTube and many others

Video Downloader is a desktop application designed to simplify the process of downloading videos from various online platforms through a user-friendly graphical interface. Built on top of yt-dlp, it abstracts the complexity of command-line tools and provides an accessible way for users to retrieve video and audio content. The application supports a wide range of features, including downloading entire playlists, handling private or password-protected content, and automatically selecting optimal formats based on user preferences. It also allows users to convert videos into audio files such as MP3, making it useful for media extraction workflows. ...

Downloads: 13 This Week

Last Update: 2026-04-09
See Project
8

AudioMuse-AI

AudioMuse-AI is an Open Source Dockerized environment

AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity. By analyzing the underlying audio content rather than relying on external metadata services, the system can organize large personal...

Downloads: 12 This Week

Last Update: 6 days ago
See Project
9

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. ...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
10

Markdownify MCP Server

Convert files and web content into clean, usable Markdown easily

Markdownify MCP is a Model Context Protocol server that converts many types of files and web content into clean Markdown. It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server locally, then extend functionality by modifying its TypeScript-based tools and server logic. ...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
11

Navidrome

Your Personal Streaming Service

...Navidrome also implements the Subsonic API, making it compatible with many third-party players and apps across different platforms. It automatically monitors and indexes your library for new content and supports on-the-fly transcoding to adapt audio streams to different network conditions.

Downloads: 11 This Week

Last Update: 2026-04-12
See Project
12

Claude Code Video Vision

Give Claude the ability to watch and understand videos

...The system dynamically adapts how much data it extracts based on the user’s query, adjusting frame rate, resolution, and time windows to optimize both performance and token efficiency. It supports multiple backends for audio processing, including local and cloud-based options, enabling flexible deployment depending on privacy or performance requirements.

Downloads: 3 This Week

Last Update: 6 days ago
See Project
13

AI YouTube Shorts Generator

A python tool that uses GPT-4, FFmpeg, and OpenCV

AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...

Downloads: 13 This Week

Last Update: 3 days ago
See Project
14

Canvas LMS

The open LMS by Instructure, Inc.

Canvas LMS is a full-featured learning management system designed for K–12, higher-ed, and professional training, with a strong emphasis on usability and openness. Instructors build courses from modular content—pages, assignments, discussions, quizzes—and organize them into learning paths with prerequisites and due dates. Rich grading tools like SpeedGrader streamline assessment with rubrics, inline annotations, and audio/video feedback, while the gradebook supports weighting, outcomes, and late/missing policies. A robust API, standards like LTI/IMS Common Cartridge, and SIS integrations make it straightforward to connect Canvas with publisher content, analytics tools, proctoring, and institutional systems. ...

Downloads: 58 This Week

Last Update: 3 days ago
See Project
15

Bili23 Downloader

Cross platform GUI tool for downloading videos from Bilibili sites

...It can parse different types of links such as standard video pages, short links, and collection or activity pages to automatically retrieve downloadable media. It also allows users to choose video resolution, audio quality, and encoding format based on the available sources. Additional features include downloading subtitles, comments, metadata, and artwork associated with videos.

Downloads: 8 This Week

Last Update: 2026-04-07
See Project
16

WhisperJAV

Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

...WhisperJAV introduces a specialized pipeline that separates text generation from timestamp alignment, allowing the system to generate transcripts and then align them with audio using forced alignment techniques. The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.

Downloads: 22 This Week

Last Update: 3 days ago
See Project
17

LatentSync

Taming Stable Diffusion for Lip Sync

LatentSync is an open-source framework from ByteDance that produces high-quality lip-synchronization for video by using an audio-conditioned latent diffusion model, bypassing traditional intermediate motion representations. In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net...

Downloads: 6 This Week

Last Update: 2025-12-02
See Project
18

CloudReader

A NetEase Cloud Music-based UI

An open-source reading app with a NetEase Cloud Music-based UI, developed with the WanAndroid API and following Google Material Design. Built with Kotlin, a NetEase Cloud Music UI, Retrofit2 + RxJava2 + Room + MVVM-databinding, and the WanAndroid API. NetEase Cloud Music was officially released on April 23, 2013. It is an online music product that focuses on discovery and sharing, with a strong social component. I believe that everyone who has used it will know that the experience it gives is excellent. The...

Downloads: 0 This Week

Last Update: 2025-03-31
See Project
19

SoniTranslate

Synchronized Translation for Videos

SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. ...

Downloads: 55 This Week

Last Update: 2025-11-28
See Project
20

Project NEWM

Repo for Android and iOS mobile apps utilizing a KMM

NEWM Mobile is a mobile application designed as part of the NEWM ecosystem, which focuses on decentralized music distribution and ownership. The app provides a user interface for interacting with the platform, allowing users to explore, manage, and engage with music assets in a decentralized environment. It integrates with blockchain-based systems to enable ownership, licensing, and monetization of music content. The application is built for mobile platforms, emphasizing usability and...

Downloads: 2 This Week

Last Update: 2026-04-21
See Project
21

NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT

NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different sensory formats and generate responses in different media types. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
22

MusicFree

Plug-in, customized, ad-free free music player

The MusicFree project is an open-source, plugin-based music player designed for mobile platforms such as Android and HarmonyOS, emphasizing flexibility, customization, and privacy. Unlike traditional music apps, it does not include built-in audio sources but instead relies entirely on plugins to fetch and manage music content. This modular architecture allows users to integrate multiple sources and extend functionality without modifying the core application.

Downloads: 2 This Week

Last Update: 2026-04-17
See Project
23

ReClip

Download videos from almost any website

ReClip is a lightweight, self-hosted media downloader that provides a simple web-based interface for downloading videos and audio from a wide range of online platforms. Built around the yt-dlp engine, it supports over a thousand websites, including major platforms like YouTube, TikTok, and Instagram, allowing users to retrieve media content in various formats. The application emphasizes simplicity and minimalism, featuring a clean interface built with plain HTML, CSS, and JavaScript without requiring complex build steps or frameworks. ...

Downloads: 56 This Week

Last Update: 2026-04-09
See Project
24

YouTube Playlist Downloader

A tool to download whole playlists, channels or single videos

YoutubePlaylistDownloader is a desktop-based utility designed to simplify the process of downloading entire YouTube playlists with minimal user interaction. The tool allows users to input a playlist URL and automatically retrieve all associated videos, handling the sequence and download process in a structured way. It supports multiple output formats and quality settings, enabling users to choose between audio or video downloads depending on their needs.

Downloads: 28 This Week

Last Update: 2026-03-18
See Project
25

Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model

Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools.

Downloads: 8 This Week

Last Update: 2026-03-18
See Project