Search Results for "dvd-audio" - Page 15

Showing 1086 open source projects for "dvd-audio"

1

Tensorflow Transformers

State of the art faster Transformer with Tensorflow 2.0

...These models can be applied to text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages; to images, for tasks like image classification, object detection, and segmentation; and to audio, for tasks like speech recognition and audio classification. Features include faster autoregressive decoding, TFLite support, simple creation of TFRecords, auto-batching of tf.data.Dataset or tf.ragged tensors, dictionary-based inputs and outputs, multiple mask modes (causal, user-defined, prefix), and tensorflow-text tokenizer support. Supports GPU, TPU, and a multi-GPU trainer with wandb, multiple callbacks, and automatic TensorBoard logging.

Downloads: 0 This Week

Last Update: 2023-03-23
See Project
2

NWT - Pytorch (wip)

Implementation of NWT, audio-to-video generation, in Pytorch

Implementation of NWT, audio-to-video generation, in Pytorch. The paper proposes a new discrete latent representation named Memcodes, which can be succinctly described as a type of multi-head hard-attention to learned memory (codebook) key/values. The authors claim that this approach needs fewer codes and smaller codebook dimensions to achieve better reconstructions.

Downloads: 0 This Week

Last Update: 2023-03-22
See Project
3

AugLy

A data augmentations library for audio, image, text, and video

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-library. These sub-libraries include both function-based and class-based transforms, composition operators, and have the option to provide metadata about the transform applied, including its intensity. AugLy is a great library to utilize for augmenting your data in model training, or to evaluate the robustness gaps of your model! ...

Downloads: 0 This Week

Last Update: 2022-03-29
See Project
4

Piano transcription

Task of transcribing piano recordings into MIDI files

Piano transcription is an open-source high-resolution piano transcription system by ByteDance that converts raw audio recordings of piano performance into symbolic MIDI files — detecting note onsets, offsets, pitch, velocity, and even pedal usage. The system is implemented in Python (PyTorch) and is capable of accurate transcription of polyphonic piano recordings, even with complex passages and pedal techniques, making it suitable for classical piano music.

Downloads: 4 This Week

Last Update: 2025-12-02
See Project
5

AutoSub

A CLI script to generate subtitle files (SRT/VTT/TXT) for any video

AutoSub is a Python-based tool designed to automatically generate subtitles for video or audio content using speech recognition technology. It processes media files by extracting audio, transcribing spoken content, and generating subtitle files in standard formats. The tool supports multiple languages and can integrate with translation systems to produce subtitles in different languages. It is designed for automation, allowing batch processing of multiple media files. ...

Downloads: 11 This Week

Last Update: 2026-04-28
See Project
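
The AutoSub entry above mentions generating subtitle files in standard formats such as SRT. As a rough illustration of that last step only, here is a minimal sketch of turning timed transcript segments into SRT text. The function names and the `(start, end, text)` tuple layout are assumptions made for this example and are not taken from AutoSub's code.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as an SRT document.

    Hypothetical helper for illustration; each cue is a 1-based index,
    a timing line, the cue text, and a blank separator line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")]))
```

A real pipeline would obtain the segments from a speech recognizer; only the formatting step is shown here.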
6

Music Source Separation

Separate audio recordings into individual sources

...The repository provides training scripts (e.g. using datasets such as MUSDB18), preprocessing steps (audio-to-HDF5 packing, indexing), evaluation pipelines, and inference scripts to perform separation on arbitrary audio files. This makes the project useful both for researchers in music information retrieval / audio machine learning and for hobbyists or practitioners who want to experiment with remixing, karaoke, or audio editing.

Downloads: 10 This Week

Last Update: 2025-12-02
See Project
7

youtube-dl

Download videos from YouTube (and more sites)

...It is released to the public domain, which means you can modify it, redistribute it or use it however you like. youtube-dl is a powerful, open-source command-line program designed to facilitate the downloading of videos and audio from popular video streaming websites. Widely recognized for its versatility, youtube-dl supports a vast array of platforms beyond YouTube, including Vimeo, Dailymotion, and many others. The tool provides users with advanced customization options, such as selecting specific video formats, extracting audio, bypassing geographic restrictions, and downloading entire playlists or channels.

1 Review

Downloads: 68 This Week

Last Update: 2024-12-30
See Project
8

StreamTuner2 ♪♬#

Internet radio directory browser

Streamtuner2 is an internet radio station and video browser. It simply lists stations in categories from different directories and launches your preferred media apps for playback. It's built in Python now, but retains UI similarity with the original StreamTuner 0.99.

6 Reviews

Downloads: 44 This Week

Last Update: 2022-02-22
See Project
9

18k-youtube-download

❤️ 18k-youtube-download with python and kivy Dev.Wk-18k

18k-youtube-download is a simple project that combines a Kivy GUI with downloading music from YouTube using the youtube_dl package.

Downloads: 0 This Week

Last Update: 2022-02-25
See Project
10

SVoice (Speech Voice Separation)

We provide a PyTorch implementation of the paper Voice Separation

...The repository includes all necessary scripts for training, dataset preparation, distributed training, evaluation, and audio separation.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
11

VoiceFixer

General Speech Restoration

...Unlike many single-purpose noise reduction tools, VoiceFixer targets a “general speech restoration” problem (GSR), capable of handling multiple types of distortions at once, which makes it suitable for old recordings, phone-call audio, amateur voice recordings, or archival media. Evaluations show that VoiceFixer significantly improves both objective and subjective audio quality compared to baseline speech-enhancement methods.

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
12

Spleeter

Deezer source separation library including pretrained models

...It makes it easy to train music source separation models (assuming you have a dataset of isolated sources), and provides already trained state of the art models for performing various flavours of separation. 2 stems and 4 stems models have state of the art performances on the musdb dataset. Spleeter is also very fast as it can perform separation of audio files to 4 stems 100x faster than real-time when run on a GPU. We designed Spleeter so you can use it straight from command line as well as directly in your own development pipeline as a Python library. It can be installed with Conda, with pip or be used with Docker.

1 Review

Downloads: 75 This Week

Last Update: 2021-09-03
See Project
13

Mocking Bird

Clone a voice in 5 seconds to generate arbitrary speech in real-time

MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English. ...

1 Review

Downloads: 2 This Week

Last Update: 2023-03-23
See Project
14

pytube

A lightweight, dependency-free Python library

Pytube is a lightweight, dependency-free Python library that enables downloading YouTube videos and audio streams with minimal setup. It supports video resolution selection, progressive or adaptive streams, and caption downloads. Pytube is ideal for automation scripts, archiving tools, and media applications that need to interface with YouTube content programmatically.

Downloads: 7 This Week

Last Update: 2025-07-01
See Project
15

Telegram WebRTC (VoIP)

Voice chats, private incoming and outgoing calls in Telegram

Telegram WebRTC (VoIP) is a Python and C++ library that enables real-time voice and video communication features for Telegram bots and clients. It provides an interface for joining, managing, and streaming audio or video in Telegram group calls and voice chats. The library is built on top of low-level communication protocols, ensuring efficient handling of real-time media streams. It supports integration with FFmpeg and other tools for processing audio and video before transmission. tgcalls allows developers to create bots that can play music, stream content, or interact with live voice channels programmatically. ...

Downloads: 0 This Week

Last Update: 2026-05-01
See Project
16

pydatascope

Software oscilloscope using Python and tkinter

Software oscilloscope using Python and tkinter. Supports multiple sources: socket, file, audio, USB. Displays data by samples, time or frequency. Scales the input automatically or manually.

1 Review

Downloads: 0 This Week

Last Update: 2021-09-25
See Project
17

VidCutter

A modern yet simple multi-platform video cutter and joiner

A modern, simple to use, constantly evolving and hella fast MEDIA CUTTER + JOINER w/ frame-accurate SmartCut technology, chapter support, media stream selection for audio + subtitle channels and blackdetect video filter support to automatically detect scene changes or skip commercials in digital TV recordings. Chapter support allows scene chapter names to be included in final media metadata. NOTE: results will only work in media players that support chapters. Flatpak release includes the latest stable versions of FFmpeg, libmpv, MediaInfo, and PyQt5 running on the KDE platform runtime.

Downloads: 15 This Week

Last Update: 2024-06-29
See Project
18

Denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

...The implementation includes data augmentation techniques applied to the raw waveforms (e.g. noise mixing, reverberation) to improve model robustness and generalization to diverse noise types. The project supports both offline denoising (batch inference) and live audio processing (e.g. via loopback audio interfaces), making it practical for real-time use in calls or recording. The codebase includes training and evaluation scripts, configuration management via Hydra, and pretrained models on standard noise datasets.

Downloads: 2 This Week

Last Update: 2025-10-07
See Project
19

DeepSpeech

Open source embedded speech-to-text engine

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the...

Downloads: 4 This Week

Last Update: 2021-04-08
See Project
20

Pydub

Manipulate audio with a simple and easy high level interface

Manipulate audio with a simple and easy high level interface. You can pass an optional bitrate argument to export using any syntax ffmpeg supports. Any further arguments supported by ffmpeg can be passed as a list in a 'parameters' argument, with switch first, argument second. Note that no validation takes place on these parameters, and you may be limited by what your particular build of ffmpeg/avlib supports.

Downloads: 2 This Week

Last Update: 2021-10-08
See Project
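
The Pydub entry above describes an optional bitrate argument plus a 'parameters' list passed through to ffmpeg, "with switch first, argument second". The sketch below only illustrates that ordering with a hypothetical helper, `build_export_args`. It is not pydub's implementation, and the use of the `-b:a` switch for the bitrate is an assumption made for this example.

```python
def build_export_args(bitrate=None, parameters=None):
    """Assemble extra encoder arguments as a flat list.

    Hypothetical helper for illustration only. The bitrate is assumed to
    map to ffmpeg's "-b:a" switch; entries in `parameters` are appended
    unchanged, mirroring the "no validation" note in the description.
    """
    args = []
    if bitrate is not None:
        args += ["-b:a", str(bitrate)]
    if parameters is not None:
        # Each switch is followed by its argument, e.g. ["-ac", "2"].
        args += list(parameters)
    return args


if __name__ == "__main__":
    print(build_export_args("192k", ["-ac", "2"]))
    # With pydub installed, the corresponding call would look roughly like:
    #   AudioSegment.from_file("in.wav").export(
    #       "out.mp3", format="mp3", bitrate="192k", parameters=["-ac", "2"])
```

Because nothing is validated, an unsupported switch would only surface as an error from the ffmpeg build actually in use.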
21

Savify

Download Spotify songs to mp3 with full metadata and cover art

Savify is a command-line tool designed to download and archive music from Spotify by leveraging YouTube as the audio source while preserving Spotify metadata. It allows users to input playlists, albums, or individual tracks and automatically retrieves matching audio files with proper tagging. The tool integrates FFmpeg and yt-dlp to handle downloading, conversion, and formatting into common audio formats such as MP3. It enriches files with metadata including artist, album, cover art, and track information to maintain organized music libraries. Savify supports batch downloading, enabling users to process entire playlists efficiently. ...

Downloads: 4 This Week

Last Update: 2026-04-24
See Project
22

video-to-ascii

It is a simple python package to play videos in the terminal

...It processes each frame of a video, maps pixel brightness and color values to characters, and renders them in real time within terminal constraints. The tool adapts video resolution to match terminal dimensions, ensuring a coherent and readable output despite limited character space. It can optionally include audio playback when additional dependencies such as FFmpeg and PortAudio are installed. The project supports multiple rendering strategies and allows exporting ASCII output to files for sharing or reuse. It also includes color approximation using ANSI palettes to enhance visual fidelity within terminal limitations. Designed as both a creative and technical project, it demonstrates how video data can be transformed into text-based representations.

Downloads: 0 This Week

Last Update: 2026-04-24
See Project
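
The video-to-ascii entry above describes mapping pixel brightness to characters. A minimal sketch of that idea follows. The character ramp, the Rec. 601 luma weights, and the function names are assumptions chosen for illustration and are not taken from the package itself.

```python
# Characters ordered from darkest to brightest; an illustrative choice.
ASCII_RAMP = " .:-=+*#%@"


def brightness(r, g, b):
    """Approximate perceived brightness (0..255) using Rec. 601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def pixel_to_char(r, g, b, ramp=ASCII_RAMP):
    """Pick the ramp character whose position matches the pixel brightness."""
    level = brightness(r, g, b) / 255.0
    index = min(int(level * len(ramp)), len(ramp) - 1)
    return ramp[index]


def frame_to_ascii(frame):
    """Convert a frame (rows of (r, g, b) tuples) into lines of text."""
    return "\n".join(
        "".join(pixel_to_char(*pixel) for pixel in row) for row in frame
    )


if __name__ == "__main__":
    demo = [[(0, 0, 0), (128, 128, 128), (255, 255, 255)]]
    print(repr(frame_to_ascii(demo)))
```

Resizing frames to the terminal size and adding ANSI colors, both mentioned in the description, are left out of this sketch.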
23

HiFi-GAN

Generative Adversarial Networks for Efficient and High Fidelity Speech

...The model targets a sweet spot between sample quality and generation speed, outperforming many previous GAN vocoders while being far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
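
To put the HiFi-GAN speed figures above in concrete terms, the following small sketch converts a "times faster than real time" factor into samples generated per second and into wall-clock time for a clip. The helper names are hypothetical, and the 22.05 kHz, ~168× and 13× values are simply the ones quoted in the description.

```python
def samples_per_second(sample_rate_hz, speedup):
    """Audio samples generated per wall-clock second at a given speedup."""
    return sample_rate_hz * speedup


def synthesis_seconds(audio_seconds, speedup):
    """Wall-clock time needed to synthesize a clip of the given duration."""
    return audio_seconds / speedup


if __name__ == "__main__":
    sample_rate = 22_050  # Hz, as quoted for the LJSpeech experiments
    for label, speedup in (("GPU, ~168x", 168), ("CPU, 13x", 13)):
        rate = samples_per_second(sample_rate, speedup)
        seconds = synthesis_seconds(60, speedup)
        print(f"{label}: {rate} samples/s, {seconds:.2f} s per minute of audio")
```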
24

OpenDAFF

Directional Audio File Format

OpenDAFF is a free, open-source software package for directional audio data - like the directivity of microphones, speakers, as well as head-related transfer functions (HRTFs)

Downloads: 4 This Week

Last Update: 2021-01-08
See Project
25

Youtube Video Downloader

Youtube Video Downloader is an open source GUI tool

Youtube Video Downloader is an open source GUI tool for downloading YouTube videos. It is developed with Python, Qt, and the Pytube library, and is a multi-threaded application. A "Best Available" option downloads the video in the highest available quality. Videos can also be downloaded in 720p, 480p, 360p, etc.

2 Reviews

Downloads: 4 This Week

Last Update: 2021-01-06
See Project