Search Results for "audio to text converter"

Showing 1004 open source projects for "audio to text converter"

  • 1
    audio-diffusion-pytorch

    Audio generation using diffusion models, in PyTorch

    A fully featured audio diffusion library for PyTorch. Includes models for unconditional audio generation, text-conditional audio generation, diffusion autoencoding, upsampling, and vocoding. The provided models are waveform-based; however, the U-Net (built using a-unet), DiffusionModel, diffusion method, and diffusion samplers are both generic to any dimension and highly customizable to work on other formats. Note: no pre-trained models are provided here, this library is meant for research...
    Downloads: 0 This Week
  • 2
    Text Generation Web UI

    A gradio web UI for running Large Language Models like LLaMA

    A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very...
    Downloads: 4 This Week
  • 3
    Evernote to Markdown converter

    Convert Evernote .enex files to Markdown

    Evernote2md is a CLI tool to convert Evernote notes exported in *.enex format to a directory with markdown files.
    Downloads: 3 This Week
  • 4
    xrdp

    An open source RDP server

    xrdp provides a graphical login to remote machines using RDP (Microsoft Remote Desktop Protocol). xrdp accepts connections from a variety of RDP clients: FreeRDP, rdesktop, NeutrinoRDP and Microsoft Remote Desktop Client (for Windows, macOS, iOS and Android). As Windows-to-Windows Remote Desktop can, xrdp supports not only graphics remoting but also two-way clipboard transfer (text, bitmap, file), audio redirection, drive redirection (mount local client drives on a remote machine). Connect...
    Downloads: 65 This Week
  • 5
    Media Converter
    Media Converter is a plugin-based video and audio converter. It uses FFmpeg as its engine, which allows it to convert to many formats.
    Downloads: 136 This Week
  • 6
    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented...
    Downloads: 44 This Week
  • 7
    fre:ac

    The fre:ac audio converter project

    fre:ac is a free audio converter and CD ripper with support for various popular formats and encoders. It converts freely between MP3, M4A/AAC, FLAC, WMA, Opus, Ogg Vorbis, Speex, Monkey's Audio (APE), WavPack, WAV and other formats. With fre:ac you easily rip your audio CDs to MP3 or M4A files for use with your hardware player or convert files that do not play with other audio software. You can even convert whole music libraries retaining the folder and filename structure. The integrated CD...
    Downloads: 17 This Week
  • 8
    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems. VALL...
    Downloads: 31 This Week
  • 9
    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using pip...
    Downloads: 22 This Week
  • 10
    Frescobaldi

    LilyPond sheet music text editor

    Frescobaldi is a free and open source LilyPond sheet music text editor. Designed to be powerful yet lightweight and easy-to-use, Frescobaldi offers great functionality and a host of useful features such as music view with advanced two-way Point & Click, Midi capturing to enter music, a Snippet Manager and many more. Frescobaldi is named after Girolamo Frescobaldi (1583-1643), an Italian composer of keyboard music in the late Renaissance and early Baroque period.
    Downloads: 18 This Week
  • 11
    Coqui TTS

    A deep learning toolkit for Text-to-Speech, battle-tested in research

    TTS is a library for advanced Text-to-Speech generation. It's built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality. TTS comes with pre-trained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings...
    Downloads: 20 This Week
  • 12
    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locally

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.
    Downloads: 12 This Week
  • 13
    Basic Pitch

    A lightweight audio-to-MIDI converter with pitch bend detection

    Basic Pitch is a Python library for Automatic Music Transcription (AMT), using a lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy-to-use, pip install-able and npm install-able via its sibling repo. Basic Pitch may be simple, but it's far from "basic"! basic-pitch is efficient and easy to use, and its multi pitch support, its ability to generalize across instruments, and its note accuracy compete with much larger and more resource-hungry AMT systems...
    Downloads: 11 This Week
  • 14
    Nextcloud Server

    A safe home for all your data

    Nextcloud server is a free and open source server software that allows you to store all of your data in a server of your choosing. With Nextcloud you can easily access and store data in the data center you trust, sync data among various devices, and share your data for collaboration purposes. It offers the best security in the self hosted file sync and share world, and is expandable with hundreds of apps.
    Downloads: 15 This Week
  • 15
    JUCE

    JUCE is an open-source cross-platform C++ application framework

    JUCE is an open-source cross-platform C++ application framework for creating high-quality desktop and mobile applications, including VST, VST3, AU, AUv3, RTAS and AAX audio plug-ins. JUCE can be easily integrated with existing projects via CMake, or can be used as a project generation tool via the Projucer, which supports exporting projects for Xcode (macOS and iOS), Visual Studio, Android Studio, Code::Blocks and Linux Makefiles as well as containing a source code editor. JUCE projects can...
    Downloads: 10 This Week
  • 16
    StoryTeller

    Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

    A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS). Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals. To develop locally, install dev dependencies and install pre-commit hooks. This will automatically trigger linting and code quality checks before each...
    Downloads: 8 This Week
  • 17
    Asciidoctor PDF

    Asciidoctor PDF: A native PDF converter for AsciiDoc

    A fast text processor & publishing toolchain for converting AsciiDoc to HTML5, DocBook & more. Asciidoctor is a fast, open source, Ruby-based text processor for parsing AsciiDoc® into a document model and converting it to output formats such as HTML 5, DocBook 5, manual pages, PDF, EPUB 3, and other formats. Asciidoctor also has an ecosystem of extensions, converters, build plugins, and tools to help you author and publish content written in AsciiDoc.
    Downloads: 6 This Week
  • 18
    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects in images; bounding boxes, polygons, circles, and keypoints are supported. Partition an image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...
    Downloads: 5 This Week
  • 19
    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 3 This Week
  • 20
    Git Large File Storage

    Git extension for versioning large files

    An open source Git extension for versioning large files. Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. Download and install the Git command line extension. Once downloaded and installed, set up Git LFS for your user account. In each Git repository where you want to use Git LFS, select the file types you'd like Git LFS...
    Downloads: 5 This Week
  • 21
    Chorus

    The first editor for Spigot configurations

    The first editor for Spigot configurations. Lightweight and ergonomic. Chorus provides internal SFTP and FTP clients that let you connect to your server and remotely edit the files you need, all in one place. Chorus is made to save you time. Interactive and high-fidelity previews let you see how your plugins will look in game. No more jumping back and forth! Chorus comes with an awesome rich-text editor to easily create colored and formatted strings. Insert items, effects, entities...
    Downloads: 3 This Week
  • 22
    Aspia

    Remote desktop and file transfer tool

    Free open-source application for real-time desktop remote control and file transfer. With Aspia, you can create your own NAT traversal infrastructure (using Router and Relay servers) with connection by ID or use direct connections. Aspia supports many features, among them detailed information about the system, a task manager, audio, and text chat. It is safe: all transmitted data is encrypted. Add computers for quick connection, and create computer groups. Encryption of address books...
    Downloads: 4 This Week
  • 23
    HospitalRun website

    With Jekyll 3 it was necessary to switch from GitHub Pages to Netlify. hospitalrun.io is made with Jekyll, a simple, blog-aware, static site generator. It takes a template directory containing raw text files in various formats, runs it through a converter (like Markdown) and our Liquid renderer, and spits out a complete, ready-to-publish static website suitable for serving on Netlify.
    Downloads: 2 This Week
  • 24
    OpenAI Web Application

    A web application that allows users to interact with OpenAI's models

    A web application that allows users to interact with OpenAI's models through a simple and user-friendly interface. This app is for demo purposes, to test the OpenAI API, and may contain issues/bugs. User-friendly interface for making requests to the OpenAI API. Responses are displayed in a chat-like format. Select Models (Davinci, Codex, DALL·E, Whisper) based on your needs. Create AI Images (DALL·E). Audio-Text Transcribe (Whisper). Highlight code syntax. Type in the input field and press enter...
    Downloads: 3 This Week
  • 25
    Intelligent Java

    Integrate with the latest language models, image generation and speech

    ... results without tuning. Generate text; Cohere allows you to generate a language model to suit your specific needs. Generate audio from text; Access DeepMind’s speech models. The only dependency is GSON, which must be added manually when using the IntelliJava jar. However, if you imported this repo through Maven, it will handle the dependencies.
    Downloads: 2 This Week