Showing 27 open source projects for "linux ai"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    AI Runner

    AI Runner

    Offline inference engine for art, real-time voice conversations

    AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. At the core of its LLM stack is a...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 2
    Luna AI

    Luna AI

    Virtual AI anchor that combines state-of-the-art technology

    Luna AI is a virtual AI streamer framework designed to power an interactive VTuber that can go live on major platforms and chat with viewers in real time. It is built around a core assistant persona called “Luna AI,” which can be driven by a wide range of large language models and platforms, including GPT-style APIs, Claude, LangChain-based backends, ChatGLM, Kimi, Ollama, and many others. The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE),...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    Speech-AI-Forge

    Speech-AI-Forge

    Speech-AI-Forge is a project developed around TTS generation model

    Speech-AI-Forge is a full-stack project built around modern text-to-speech generation models, providing both an API server and a Gradio-based web UI for interactive use. At its core, it acts as a hub that wires together multiple speech-related capabilities, including TTS, speech-to-text and LLM-based control flows, behind a consistent interface. The system is designed to be deployed in several ways: you can try it online via hosted demos, spin it up in a one-click Colab environment, run it...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...
    Downloads: 2 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    Polyglot

    Polyglot

    Cross-platform AI language practice app

    Polyglot is a cross platform AI language practice application that runs as a desktop app and also offers a web version. It is built around conversational large language models and Azure based text to speech services, turning them into an interactive environment for speaking practice in multiple languages. Users can define custom AI personas, choose languages, and configure their own OpenAI and Azure keys so they retain control over which backends they use. The app supports speech recognition...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity. The project includes pre-trained models and inference scripts that let users synthesize speech locally or...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 7
    NVIDIA NeMo Framework

    NVIDIA NeMo Framework

    Scalable generative AI framework built for researchers and developers

    NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure. NeMo 2.0 introduces a Python-based configuration system,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 9
    FireRedTTS-2

    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    FastRTC

    FastRTC

    The python library for real-time communication

    FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application). This makes it particularly well suited for building real-time voice (or video) interfaces for applications such as AI assistants, live chat,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Readest

    Readest

    Readest is a modern, feature-rich ebook reader

    Readest is a project meant to facilitate reading, studying, or consuming content by integrating reading tools with AI-powered assistance. Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and...
    Downloads: 40 This Week
    Last Update:
    See Project
  • 13
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    OpenAI.fm

    OpenAI.fm

    Code for openai.fm, a demo for the OpenAI Speech API

    OpenAI.fm is an official interactive demo application built to showcase the OpenAI Speech API and its advanced text-to-speech capabilities, providing developers and creators with a hands-on web interface to convert text into high-quality, customizable audio using state-of-the-art TTS models. Developed using Next.js and the OpenAI Speech API, this demo illustrates how the latest neural voice models can produce natural, expressive speech with adjustable styles and voices, highlighting features...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 15
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    comfyui-mixlab-nodes

    comfyui-mixlab-nodes

    Workflow and speech recognition app

    comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Conversations

    Conversations

    App in java for chatting to a generative A.I. (involving tts and stt)

    Java application for chatting to generative AI Llama3. * The user can speak into the microphone (speechToText), edit the recognized text and send it to the AI. * The AI ​​responds and the server returns that response in real time, and the sentences converted to audio (textToSpeech), and the application broadcasts them through the speaker. The application is prepared so that only one user occupies the server's resources, so if the server is busy, in theory it will not let you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    AudioBC

    AudioBC

    Offline desktop app to convert EPUB to MP3 using Kokoro-82M neural TTS

    AudioBC is a powerful desktop application designed to turn your digital library into a personal audiobook collection. Unlike most Text-to-Speech (TTS) tools that require expensive cloud API subscriptions or an active internet connection, AudioBC runs entirely on your local machine. Powered by the state-of-the-art Kokoro-82M neural engine, AudioBC produces natural, human-like speech that rivals premium cloud services. It is built with a focus on privacy and simplicity, offering a...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    Voice Accounting For Blind & Mute People

    Voice Accounting For Blind & Mute People

    Free & Easy AI Voice Accounting Software For Blind & Speechless People

    Just download the above zip file, extract it and then open the index.html file on internet browsers like Firefox ( preferable ) or Google Chrome. Also, please view and download my full collection of softwares for people with disabilities, here : https://sourceforge.net/projects/softwares-for-disabled-people/ This full collection also includes the Voice Accounting Software as well.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Softwares For Blind, Deaf, Handicap

    Softwares For Blind, Deaf, Handicap

    Easy AI Softwares for Blind, Deaf, Handicapped, Disabled People

    Just download the above zip file, extract it first and then open the index.html file on internet browsers like Firefox ( preferable ) or Google Chrome. Also, keep NumLock ON while using the Numeric Keypad of any Keyboard. Can also attach an external USB keyboard, with seperate Numeric Keypad, if required. I have added some general guidelines for students, using these softwares, on the Wiki Page of this website. Please refer them for more instructions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Amica

    Amica

    Amica is an open source interface for interactive communication

    Amica is an open source interface for interacting with fully animated 3D characters that combine voice chat, vision, and an emotion engine into a single experience. It lets you hold natural conversations with AI characters that can see, listen, and speak, while expressing emotional states through facial expressions and body language. Users can import VRM character models, adjust their appearance, tune the voice to match the character, and define behavior using different large language models...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 25
    Piper TTS

    Piper TTS

    A fast, local neural text to speech system

    Piper is a fast, local neural text-to-speech (TTS) system developed by the Rhasspy team. Optimized for devices like the Raspberry Pi 4, Piper enables high-quality speech synthesis without relying on cloud services, making it ideal for privacy-conscious applications. It utilizes ONNX models trained with VITS to deliver natural-sounding voices across various languages and accents. Piper is particularly suited for offline voice assistants and embedded systems.
    Downloads: 476 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB