Showing 19 open source projects for "work"

View related business solutions
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    Voicebox

    Voicebox

    The open-source voice synthesis studio powered by Qwen3-TTS

    ...A standout capability is its multi-track timeline editor and supporting audio tools (like trimming and conversation mixing), which let creators compose multi-voice scenes instead of generating single clips in isolation. It is API-first, meaning you can use it as an app for production work or integrate its speech generation into your own software via an API layer.
    Downloads: 71 This Week
    Last Update:
    See Project
  • 2
    EPUB to Audiobook Converter

    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    ...The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. A recent addition is a Gradio-based WebUI, which wraps all configuration options in a graphical interface for users who prefer not to work with the command line. The tool offers advanced options such as controlling chapter ranges, handling paragraph detection via newline modes, removing endnote markers, and using regex-based search-and-replace files to tweak pronunciations. It can be run directly with Python or via Docker.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 3
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 4
    AI Runner

    AI Runner

    Offline inference engine for art, real-time voice conversations

    AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. At the core of its LLM stack is a mode-based architecture with specialized “modes” such as Author, Code, Research, QA and General, and a workflow manager that automatically routes user requests to the right agent based on the task. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5
    ChatTTS webUI & API

    ChatTTS webUI & API

    A simple native web interface that uses ChatTTS to synthesize text

    ...The project supports Chinese, English, and mixed text with digits and control symbols, making it suitable for bilingual content and numerically heavy text like announcements or prompts. From version 0.96 onward, ffmpeg installation is required for deployment, and previous CSV/PT voice tables are no longer valid, so users instead work with updated “voice value” parameters. For convenience, there is a prepackaged Windows build: you download a release archive, extract it, and double-click app.exe to start the web UI, which opens on localhost:9966.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 6
    Style-Bert-VITS2

    Style-Bert-VITS2

    Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

    ...The project targets both power users and beginners: Windows users without Git or Python can install and run it using bundled .bat scripts, while advanced users can work with virtual environments, uv, and Python tooling. It includes a full GUI editor to script dialogue, set different styles per line, edit dictionaries, and save/load projects, plus a separate web UI and Colab notebooks for training and experimentation. For those who only need synthesis, the project is published as a Python library (pip install style-bert-vits2) and can run on CPU without an NVIDIA GPU, though training still requires GPU hardware.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    ...It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Sopro TTS

    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    ...Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). The model is designed to work with a small set of dependencies and to be accessible for developers who want offline TTS with customizable voice style, including options for streaming or non-streaming generation modes. Users can install it with standard Python tools, run a demo server locally, and experiment with CLI or Python API usage for producing synthetic speech.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    Amphion

    Amphion

    Toolkit for audio, music, and speech generation

    ...A distinctive feature of Amphion is its emphasis on visualization: it offers interactive visualizations of model architectures and generation processes, making it easier to understand how complex generative audio models work. The toolkit is organized with example experiments (“egs”) and visualization demos that guide users through training, evaluation, and inspection of models. Built on the broader OpenMMLab ecosystem, Amphion follows modular design patterns and configuration systems similar to other OpenMMLab projects, easing adoption for users who are already familiar with that stack.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    A series of open source files and programs available to use for developing programs to work with the WowWee Robotics RSMedia Robot. These include a USB serial console, a cross-compiler, a firmware dump program, text-to-speech and source code.
    Leader badge
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12

    Russian Text-to-speech programs

    читание, чтение, говорение

    For Windows (on Linux trought Wine can work) 3 russian text-to-speech programs (Chitanie, Chtenie and Govorenie). If you want donate. paypal.me/alkbab Читание, Чтение, Говорение есть программы пробующие преобразовать русский текст в русскую речь . Для Windows. На Linux через Wine... Кто хочет может пожертвовать paypal.me/alkbab
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Parallel WaveGAN

    Parallel WaveGAN

    Unofficial Parallel WaveGAN

    Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing voice synthesis pipelines. It includes a large collection of “Kaldi-style” recipes for many datasets such as LJSpeech, LibriTTS, VCTK, JSUT, CMU Arctic, and multiple singing voice corpora in Japanese, Mandarin, Korean, and more. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Read Aloud

    Read Aloud

    An awesome browser extension that reads aloud webpage content

    Read Aloud is a browser extension for Chrome, Firefox, and other Chromium-based browsers that converts webpage text to audio using text-to-speech technology. It is designed to work on a wide variety of sites, including news, blogs, online textbooks, course materials, fanfiction, and more. The extension targets users who prefer listening over reading, as well as people with dyslexia, other learning disabilities, or eye strain, and children learning to read. Read Aloud lets users choose from multiple voices: built-in browser voices, plus premium cloud voices from providers such as Google Wavenet, Amazon Polly, IBM Watson, and Microsoft. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Speechalyzer

    Speechalyzer

    Process large speech data wrt transcription, labeling and annotation

    Speechalyzer: a tool for the daily work of a 'speech worker' It is optimized to process large speech data sets with respect to transcription, labeling and annotation. It is implemented as a client server based framework in Java and interfaces software for speech recognition, synthesis, speech classification and quality evaluation. The application is mainly the processing of training data for speech recognition and classification models and performing benchmarking tests on speech-to-text, text-to-speech and speech classification software systems.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Cotovía

    Cotovía

    Text-to-Speech System for Galician and Spanish

    ...Cotovia has been developed by the University de Vigo and the center 'Ramón Piñeiro' for Research in Humanities, both in Galicia, Spain. Its development has involved a research group of linguists and engineers. Cotovía has been developed as a research project, therefore most of the work has been focused on the most interesting aspects from a scientific point of view. Although the performance of the whole TTS system is quite good, there are some parts that could be clearly improved. Cotovia files and installing instructions are available at the Files and Git sections.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 17
    Transcription Aid

    Transcription Aid

    Transcription Aid helps you type text from recordings.

    This software is to help type in text from speech recordings. It has several functions proven to help this type of work. However it is fully manual (aside from auto-completion), so no speech recognition if you are looking for that, but it is a great tool to do the job.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    eSpeakIt is a Firefox extension that converts text to speech (using the espeak command), and plays the audio or saves it for use in portable media players. eSpeak must be installed for this to work. (see http://espeak.sourceforge.net/)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Kacst Arabic Text-to-speech System (KATTSS) is the first open source system that speaks Arabic. Since TTS is an area which needs years of work and development, KATTSS is open to researchers and programmers to improve its speech quality.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB