Showing 61 open source projects for "ebook-speaker"

View related business solutions
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • 1
    MARS5

    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    ...The model is built to handle prosodically challenging content such as sports commentary, anime dialogue, and other high-energy or highly varied speech patterns with realistic rhythm and intonation. To control speaker identity, MARS5 uses a short reference audio clip, typically between 2 and 12 seconds, from which it learns the voice characteristics. It supports two main inference modes: shallow clone, which is faster and only needs the reference audio, and deep clone, which additionally uses the transcript of the reference audio to increase similarity and naturalness at the cost of more computation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    StyleTTS 2

    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    ...The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide generation toward more natural and coherent utterances. StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. The repository includes training scripts, configuration files, and pre-trained auxiliary modules such as a text aligner, pitch extractor, and PL-BERT-based linguistic encoder.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    SpeeD ReaD ("Speedy Read-y")

    SpeeD ReaD ("Speedy Read-y")

    SpeeD ReaD is a little program to help you read faster.

    SpeeD ReaD helps you to read faster and more efficiently. By minimizing subvocalization and saccades, you can process and comprehend the text you read much faster than with normal reading. First, subvocalization is the natural tendency for all of us to "hear" the words in our brains as we read. Think of it as reading out loud inside your head. But our minds do not need us to sound out the words we read - even inside our heads - in order to understand them. The words can be processed in a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Coqui TTS

    Coqui TTS

    A deep learning toolkit for Text-to-Speech, battle-tested in research

    ...TTS comes with pre-trained models, tools for measuring dataset quality and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings efficiently. Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN) Fast and efficient model training. Detailed training logs on the terminal and Tensorboard. Support for Multi-speaker TTS. Efficient, flexible, and lightweight but feature complete Trainer API. ...
    Downloads: 55 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    footswitch2

    footswitch2

    Audio Transcription software for Linux (Vlc) with a foot pedal

    Footswitch 2 is a media player for transcribers on Linux. Written in python and using the python bindings for VLC it allows a transcriber to control the audio or video with a USB footpedal, and includes a set of macros that integrate into LibreOffice. This allows the transcriber to control the media player from within Libreoffice as well, making it useful for those who do not yet own a footpedal/footswitch. Control of the media player from LibreOffice can be via Hotkeys or an integrated...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    vits_chinese

    vits_chinese

    Best practice TTS based on BERT and VITS

    vits_chinese is an implementation of the VITS end-to-end text-to-speech (TTS) architecture tailored for Chinese (and possibly multilingual) speech synthesis. VITS is a model combining variational autoencoders (VAEs), normalizing flows, adversarial learning, and a stochastic duration predictor — a design that enables generation of natural, expressive speech, capturing variations in rhythm and prosody. By customizing or porting VITS for Chinese, this project aims to produce high-quality TTS...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    VALL-E X

    VALL-E X

    Open source implementation of Microsoft's VALL-E X zero-shot TTS model

    VALL-E-X is an open-source implementation of Microsoft’s VALL-E X zero-shot text-to-speech model, focused on multilingual, cross-lingual voice cloning. It is capable of synthesizing speech in English, Chinese, and Japanese from text while mimicking the voice characteristics of a speaker given only a short 3–10 second prompt. The model attempts to match not just timbre, but also tone, pitch, emotion, and prosody of the reference audio, resulting in highly personalized output. VALL-E-X supports zero-shot cross-lingual synthesis, meaning a monolingual speaker’s voice can be used to speak other languages without additional training. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    wukong-robot

    wukong-robot

    Chinese voice dialogue robot/smart speaker project

    wukong-robot is a Chinese voice assistant / smart speaker project built to let makers and hackers design highly customizable voice-controlled devices. It combines wake-word detection, automatic speech recognition, natural language understanding, and text-to-speech into a single framework aimed at the Chinese-speaking ecosystem. The project is positioned as a simple, flexible, and elegant platform that can run on devices like Raspberry Pi and other Linux-based boards, making it suitable for DIY smart speakers and home-automation hubs. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    lora-svc

    lora-svc

    Singing voice change based on whisper, lora for singing voice clone

    singing voice change based on whisper, and lora for singing voice clone. You will feel the beauty of the code from this project. Uni-SVC main branch is for singing voice clone based on whisper with speaker encoder and speaker adapter. Uni-SVC main target is to develop lora for SVC. With lora, maybe clone a singer just need 10 stence after 10 minutes train. Each singer is a plug-in of the base model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 10
    VALL-E

    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    ...VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity. In addition, we find VALL-E could preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    footswitch3

    footswitch3

    Audio Transcription software for Linux (Gstreamer) with a foot pedal

    Footswitch 3 is a media player for transcribers on Linux. Written in python using the python bindings for Gstreamer it allows a transcriber to control the audio or video with a foot pedal, and includes a set of macros that integrate into LibreOffice. This allows the transcriber to control the media player from within Libreoffice as well, making it useful for those who do not yet own a foot pedal/foot switch. Control of the media player from LibreOffice can be via Hotkeys or an integrated...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    VoiceSmith

    VoiceSmith

    [WIP] VoiceSmith makes training text to speech models easy

    ...It fine-tunes a pretty solid text to speech pipeline based on a modified version of DelightfulTTS and UnivNet on your dataset. Both models were pretrained on a proprietary 5000 speaker dataset. It also provides some tools for dataset preprocessing like automatic text normalization. Windows (only CPU supported currently) or any Linux based operating system. If you want to run this on macOS you have to follow the steps in build from source in order to create the installer. This is untested since I don't currently own a Mac. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    SVoice (Speech Voice Separation)

    SVoice (Speech Voice Separation)

    We provide a PyTorch implementation of the paper Voice Separation

    SVoice is a PyTorch-based implementation of Facebook Research’s study on speaker voice separation as described in the paper “Voice Separation with an Unknown Number of Multiple Speakers.” This project presents a deep learning framework capable of separating mixed audio sequences where several people speak simultaneously, without prior knowledge of how many speakers are present. The model employs gated neural networks with recurrent processing blocks that disentangle voices over multiple computational steps, while maintaining speaker consistency across output channels. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    VITS

    VITS

    Conditional Variational Autoencoder with Adversarial Learning

    ...This architecture enables parallel generation (fast inference) while achieving speech quality that rivals or surpasses many two-stage systems. The repository provides training and inference pipelines for common datasets such as LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. It also includes monotonic alignment search code and g2p preprocessing, which are crucial components for aligning text and speech in an end-to-end setup.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Multilingual Speech Synthesis

    Multilingual Speech Synthesis

    An implementation of Tacotron 2 that supports multilingual experiments

    ...It presents a model combining ideas from Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning, End-to-End Code-Switched TTS with Mix of Monolingual Recordings, and Contextual Parameter Generation for Universal Neural Machine Translation. We provide data for comparison of three multilingual text-to-speech models. The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second has separate encoders for each language.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    GROWbox Supervisor System (GROWSS)

    GROWbox Supervisor System (GROWSS)

    Automated Plant Environment Growing System using Raspberry Pi

    ...Hi & low values are also saved. The LEDs on the case & the mobile application indicate if there is a high/low temp alarm, hi/low humidity alarm, soil moisture alarm, or smoke alarm. A speaker (buzzer) is activated on the case if there is a smoke alarm. 2 other LEDs indicate if either the exhaust fan is on or if the humidifier is on.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Deepvoice3_pytorch

    Deepvoice3_pytorch

    PyTorch implementation of convolutional neural networks

    An open source implementation of Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DC-TTS

    DC-TTS

    TensorFlow Implementation of DC-TTS: yet another text-to-speech model

    ...Training scripts, data loaders, and hyperparameter configurations are provided to reproduce results on several datasets, including LJ Speech for English, a Korean single-speaker dataset, and audiobook data from Nick Offerman and Kate Winslet.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Lip Reading

    Lip Reading

    Cross Audio-Visual Recognition using 3D Architectures

    ...Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information. The essential problem is to find the correspondence between the audio and visual streams, which is the goal of this work. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation and echo cancellation algorithms. The Millennium ASR provides C++ and python libraries for automatic speech recognition. The Millennium ASR implements a weighted finite state transducer (WFST) decoder, training and adaptation methods. These toolkits are meant for facilitating research and development of automatic distant speech recognition.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    calibre
    calibre - Ebook management
    Downloads: 173 This Week
    Last Update:
    See Project
  • 22
    Tools for accessing and converting various ebook file formats
    Leader badge
    Downloads: 90 This Week
    Last Update:
    See Project
  • 23

    avimmir

    (audio, video, image) Multimedia Multimodal Information Retrieval

    audio classification; speaker segmentation; speaker clustering; speaker recognition; spoken document retrieval; image retrieval; video retrieval; etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    Fidus Writer

    The all in one solution for collaborative academic writing.

    ...The editor focuses on the content rather than the layout, so that with the same text, you can later on publish it in multiple ways: On a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts that are adequate for the medium of choice.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Booktype

    Booktype

    Open source platform to write and publish print and digital books

    Booktype makes it easier for people and organisations to collate, organise, edit and publish books. Delivering frictionlessly to print, lulu.com, and almost any ereader, Booktype facilitates collaborative production processes. No more lost manuscripts, overwritten Word files, awkward wikis or cumbersome CMSes.
    Downloads: 4 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB