Search Results for "ai voice generator" - Page 2

Showing 103 open source projects for "ai voice generator"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 2
    MCP Server Home Assistant

    MCP Server Home Assistant

    A Model Context Protocol Server for Home Assistant

    The Home Assistant MCP Server is an MCP server that integrates with Home Assistant, enabling AI assistants to interact with smart home devices and systems. It exposes Home Assistant voice intents through the Model Context Protocol for enhanced home control. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    ...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Ultravox

    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    ...Internally, it leverages pretrained language models and speech encoders, with a multimodal adapter that integrates both modalities for inference and training. Ultravox is optimized for low latency, achieving fast response times suitable for interactive voice agents and real-time applications. It supports use cases such as conversational AI agents, speech-to-speech translation, and analysis of spoken audio content. Ultravox also includes tooling and configuration systems for training, evaluation, and dataset integration.
    Downloads: 1 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 5
    LiveKit Agents

    LiveKit Agents

    Framework for building realtime multimodal voice AI agents apps

    LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    DeerFlow

    DeerFlow

    Deep Research framework, combining language models with tools

    ...It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”) that collaborate in a structured workflow, allowing tasks like literature reviews, data gathering, data analysis, code execution, and final report generation to be largely automated. It supports asynchronous task coordination, modular tool integration, and orchestrates the data flow between agents — making it suitable for large-scale or multi-stage research pipelines. ...
    Downloads: 139 This Week
    Last Update:
    See Project
  • 7
    FAY

    FAY

    Framework for building AI-powered interactive digital humans and agent

    Fay is an open source framework designed to build and deploy interactive digital humans powered by large language models. It acts as a middleware layer that connects digital character technologies with conversational AI systems and business applications. Fay supports various types of digital humans, including 2.5D and 3D avatars, and can be integrated with applications running on mobile devices, PCs, web platforms, and embedded systems. Its architecture allows developers to combine different AI components such as speech recognition, text-to-speech, and large language models to create conversational digital agents. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 10
    SEO Machine

    SEO Machine

    A specialized Claude Code workspace for creating long-form

    ...It incorporates real data sources like Google Analytics and Search Console to guide decision-making and improve content effectiveness. The architecture emphasizes context-awareness, using brand voice, style guides, and keyword strategies to maintain consistency across outputs. It also includes performance evaluation tools that score content and suggest improvements before publishing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    myGPTReader

    myGPTReader

    AI Slack bot for reading, summarizing, and chatting with content

    myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use cases. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Make-A-Video - Pytorch (wip)

    Make-A-Video - Pytorch (wip)

    Implementation of Make-A-Video, new SOTA text to video generator

    Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch. They combine pseudo-3d convolutions (axial convolutions) and temporal attention and show much better temporal fusion. The pseudo-3d convolutions isn't a new concept. It has been explored before in other contexts, say for protein contact prediction as "dimensional hybrid residual networks". The gist of the paper comes down to, take a SOTA text-to-image model (here they use DALL-E2, but the same learning points would easily apply to Imagen), make a few minor modifications for attention across time and other ways to skimp on the compute cost, do frame interpolation correctly, get a great video model out. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Step-Audio

    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    OpenHome Abilities

    OpenHome Abilities

    Open-source abilities for OpenHome agents

    OpenHome Abilities is an open-source repository of modular voice AI plugins created for OpenHome agents, giving developers a lightweight way to extend what an agent can do through spoken triggers. Each ability is intentionally simple in structure, centering on a single main.py file that contains the core Python logic, which lowers the barrier to building and sharing custom behaviors. The system is meant to support a wide range of voice-driven actions, from API calls and media playback to quiz flows, device control, and multi-turn conversations, so it functions as a practical extension framework rather than a narrow template library. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Aider

    Aider

    Aider is AI pair programming in your terminal

    Aider is an AI pair programming tool that runs directly in your terminal, helping developers build new projects or extend existing codebases faster and more confidently. It works alongside you like a coding partner, using powerful large language models to understand your code and implement precise changes. Aider creates a structured map of your entire repository, allowing it to handle large and complex projects effectively. It supports over 100 programming languages, making it flexible for...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 17
    AgenticSeek

    AgenticSeek

    Fully Local Manus AI. No APIs, No $200 monthly bills

    AgenticSeek is a fully local autonomous AI assistant designed as a privacy-focused alternative to cloud-based agent platforms. It runs entirely on the user’s hardware and can autonomously browse the web, write code, and plan multi-step tasks without sending data to external services. The system is optimized for local reasoning models and emphasizes zero cloud dependency to maintain full user control. AgenticSeek includes intelligent agent selection, allowing it to determine the best internal...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    PyGPT

    PyGPT

    Open source personal AI Assistant for Linux, Windows and Mac

    PyGPT is a desktop application that allows you to talk to OpenAI's LLM models such as GPT4 and GPT3 using your own computer and OpenAI API. It allows you to talk in chat mode and in completion mode, as well as generate images using DALL-E 2. PyGPT also adds access to the Internet for GPT via Google Custom Search API and Wikipedia API and includes voice synthesis using Microsoft Azure Text-to-Speech API. Moreover, the application has implemented context memory support, context storage,...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    Reflex Dev

    Reflex Dev

    Web apps in pure Python

    Reflex is a Python framework for building full-stack web apps entirely in Python—without writing JavaScript for the frontend. It provides fast live reloads, built-in state management, deployment tooling, and optional AI-powered scaffolding to accelerate development.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 20
    InvokeAI

    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    InvokeAI is an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator. It provides a streamlined process with various new features and options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on GPU cards with as little as 4 GB or RAM. InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 21
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    TADA

    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    WanGP

    WanGP

    AI video generator optimized for low VRAM and older GPUs use

    Wan2GP is an open source AI video generation toolkit designed to make modern generative models accessible on consumer-grade hardware with limited GPU memory. It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and...
    Downloads: 53 This Week
    Last Update:
    See Project
  • 24
    DeepWiki Open

    DeepWiki Open

    AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories

    DeepWiki Open is an open-source, AI-powered wiki generator that automatically creates fully navigable, richly structured wiki documentation for GitHub, GitLab, or Bitbucket repositories by combining code analysis, vector embeddings, retrieval-augmented generation (RAG), and visualization tools. Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and produce visual diagrams to help explain complex code structure. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...
    Downloads: 28 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB