Showing 10 open source projects for "transcribing"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    WhisperJAV

    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces a specialized pipeline that separates text generation from timestamp alignment, allowing the system to generate transcripts and then align them with audio using forced alignment techniques. ...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 2
    whisper-timestamped

    whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps

    ...Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This repository proposes an implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models. Besides, a confidence score is assigned to each word and each segment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Style-Bert-VITS2

    Style-Bert-VITS2

    Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

    Style-Bert-VITS2 is a text-to-speech system based on Bert-VITS2 that focuses on highly controllable voice styles and emotional expression. It takes the original Bert-VITS2 v2.1 and its Japanese-Extra variant and extends them so you can control emotion and speaking style with fine-grained intensity, not just choose a generic tone. The project targets both power users and beginners: Windows users without Git or Python can install and run it using bundled .bat scripts, while advanced users can...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    WhisperSpeech

    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 5
    A2M — Audio to MIDI

    A2M — Audio to MIDI

    A2M is a desktop app that converts AUDIO TO MIDI in one click.

    A2M (Audio To MIDI) is a simple desktop tool for transcribing local audio files into MIDI files with one click. It is designed primarily for piano recording transcription, and works best on solo piano recordings. Using A2M is straightforward: Select an audio file, click Convert, and the application generates a MIDI file automatically in your Downloads/A2M folder. All processing is done locally on your device, no uploads, no accounts, and no telemetry.
    Leader badge
    Downloads: 77 This Week
    Last Update:
    See Project
  • 6
    Piano transcription

    Piano transcription

    Task of transcribing piano recordings into MIDI files

    Piano transcription is an open-source high-resolution piano transcription system by ByteDance that converts raw audio recordings of piano performance into symbolic MIDI files — detecting note onsets, offsets, pitch, velocity, and even pedal usage. The system is implemented in Python (PyTorch) and is capable of accurate transcription of polyphonic piano recordings, even with complex passages and pedal techniques, making it suitable for classical piano music. By using this transcription tool,...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7
    AutoSub

    AutoSub

    A CLI script to generate subtitle files (SRT/VTT/TXT) for any video

    AutoSub is a Python-based tool designed to automatically generate subtitles for video or audio content using speech recognition technology. It processes media files by extracting audio, transcribing spoken content, and generating subtitle files in standard formats. The tool supports multiple languages and can integrate with translation systems to produce subtitles in different languages. It is designed for automation, allowing batch processing of multiple media files. AutoSub leverages FFmpeg for media handling and integrates with speech recognition engines for transcription. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    plone product for uploading, transcribing, indexing and translating of audio files
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Keystroke enables the transcription and logging of continuous media such as audio or video. It features media control (pause/play/seek) through keys to increase efficiency while transcribing and/or logging.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10
    GUIDE (GUgyeol Integrated Decipherment Environment) is a set of tools for browsing historical document images, transcribing, deciphering and translating it. Developed for studying old buddhist codices, it would be serviceable for studying any papyri.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo