
Search Results for "python voice synthesis" - Page 6


OS

Windows 222
Linux 220
Mac 201
ChromeOS 101
BSD 100
Mobile Operating Systems 9
Desktop Operating Systems 1

Category

Artificial Intelligence 243
Multimedia 7
Software Development 6
Communications 3
Business 2
Scientific/Engineering 2
System 2
Education 1
Games 1
Internet 1
Productivity 1

License

OSI-Approved Open Source 222
Public Domain 2
Creative Commons Attribution License 1
GNU Free Documentation License 1

Translations

English 6
Bengali 1
German 1

Programming Language

Python 243
C++ 10
JavaScript 8
Unix Shell 5
Java 4
TypeScript 4
C 2
C# 2
BASIC 1
Dart 1
Julia 1
Kotlin 1
Perl 1
PHP 1
PowerShell 1
Prolog 1
R 1
Rust 1
Swift 1

Status

Beta 8
Production/Stable 5
Alpha 4
Pre-Alpha 2
Planning 1

Showing 243 open source projects for "python voice synthesis"

Active filters: Artificial Intelligence, Python

1

Stable Diffusion Version 2

High-Resolution Image Synthesis with Latent Diffusion Models

Stable Diffusion (the stablediffusion repo by Stability-AI) is an open-source implementation and reference codebase for high-resolution latent diffusion image models that power many text-to-image systems. The repository provides code for training and running Stable Diffusion-style models, instructions for installing dependencies (with notes about performance libraries like xformers), and guidance on hardware/driver requirements for efficient GPU inference and training. It’s organized as a...

Downloads: 9 This Week

Last Update: 2025-10-02
2

MiroThinker

MiroThinker is an open source deep research agent

MiroThinker is an open-source deep research AI agent designed to perform complex reasoning, information gathering, and predictive analysis tasks. The system focuses on enabling long-horizon research workflows by allowing the agent to interact repeatedly with external tools, search systems, and data sources while refining its reasoning through iterative steps. Rather than simply generating responses from a single prompt, the agent performs structured multi-step reasoning processes that...

Downloads: 2 This Week

Last Update: 6 days ago
3

MaxKB

Open-source platform for building enterprise-grade agents

MaxKB (Max Knowledge Brain) is an open-source platform for building enterprise-grade AI agents with strong knowledge retrieval, RAG pipelines, and workflow orchestration. It focuses on practical deployments such as customer support, internal knowledge bases, research assistants, and education, bundling tools for data ingestion, chunking, embedding, retrieval, and answer synthesis. The system exposes flexible tool-use (including MCP), supports multi-model backends, and provides dashboards for...

Downloads: 4 This Week

Last Update: 2 days ago
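The ingestion-to-answer flow described for MaxKB (chunking, embedding, retrieval) can be illustrated with a minimal, self-contained sketch. This is not MaxKB's code or API: `chunk_text`, `embed`, `cosine` and `retrieve` are hypothetical names, chunking uses fixed character windows with overlap, and a bag-of-words cosine similarity stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


docs = [
    "Refunds are processed within five business days.",
    "Our office is open Monday to Friday.",
    "To reset your password, open Settings and choose Security.",
]
print(retrieve("how do I reset my password", docs, k=1))
```

In a real pipeline the retrieved chunks would then be passed to a language model for answer synthesis; that step is omitted here.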
4

Operit AI

Powerful Android AI agent with tools, automation, and Linux shell

Operit is a full-featured AI assistant and agent platform designed specifically for Android devices, aiming to go far beyond traditional chat-based interfaces. It integrates deep system-level capabilities with a wide range of tools, allowing the AI to perform real tasks such as file management, automation, and system control directly on the device. A standout aspect of the project is its built-in Ubuntu 24 environment, which enables users to run Linux commands, scripts, and development tools...

Downloads: 8 This Week

Last Update: 1 day ago
5

Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models

Qwen3-ASR is an automatic speech recognition system in the QwenLM family, developed to convert spoken language into text with strong accuracy and real-time performance. As a specialized ASR variant of the broader Qwen language model ecosystem, it focuses on capturing reliable transcriptions from audio sources such as recordings, live streams, or conversational inputs while supporting low latency use cases. The architecture combines advanced neural acoustic modeling with context-aware...

Downloads: 2 This Week

Last Update: 2026-02-09
6

RealtimeTTS

Converts text to speech in real time

RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2,...

Downloads: 4 This Week

Last Update: 2026-03-28
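The streaming idea described for RealtimeTTS, feeding text in as an LLM produces it and speaking each piece as soon as it is complete, can be sketched without the library. This is not RealtimeTTS's API: `sentence_stream`, `stream_to_tts` and the `speak` callback are hypothetical names, and splitting on sentence-ending punctuation is an assumed, simplified boundary rule (it will, for example, split after "e.g.").

```python
import re
from typing import Callable, Iterable, Iterator

# Assumed boundary rule: ., ! or ? followed by whitespace ends a sentence.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer incoming text fragments and yield each sentence once it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


def stream_to_tts(tokens: Iterable[str], speak: Callable[[str], None]) -> None:
    """Hand each completed sentence to a TTS engine callback immediately."""
    for sentence in sentence_stream(tokens):
        speak(sentence)


spoken: list[str] = []
stream_to_tts(["Hel", "lo there. ", "How are", " you? I'm", " fine"], spoken.append)
print(spoken)
```

Speaking per sentence rather than per token is a common compromise: the engine gets enough context for natural prosody while the first audio still starts long before the LLM finishes.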
7

Fish Speech

SOTA Open Source TTS

Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face. The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong...

Downloads: 12 This Week

Last Update: 2025-11-28
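Word and character error rates, the metrics cited for Fish Speech, are edit distances normalized by the length of the reference text; for TTS the synthesized audio is typically transcribed by an ASR model and the transcript is compared with the input text. A minimal sketch of the standard definitions follows. It is not Fish Speech's or Seed TTS's evaluation code, which may normalize text (case, punctuation) before scoring.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference must be non-empty."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; the reference must be non-empty."""
    ref = list(reference)
    return edit_distance(ref, list(hypothesis)) / len(ref)


print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deleted word out of six
```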
8

CUDA Agent

Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

CUDA Agent is a research-driven agentic reinforcement learning system designed to automatically generate and optimize high-performance CUDA kernels for GPU workloads. The project addresses the long-standing challenge that efficient CUDA programming typically requires deep hardware expertise by training an autonomous coding agent capable of iterative improvement through execution feedback. Its architecture combines large-scale data synthesis, a skill-augmented CUDA development environment,...

Downloads: 1 This Week

Last Update: 2026-03-03
9

EmotiVoice

Multi-Voice and Prompt-Controlled TTS Engine

...EmotiVoice provides multiple ways to interact with it, including a web interface, a Docker image, an HTTP API (including an OpenAI-compatible TTS API), and Python scripts for batch synthesis. It also supports voice cloning with your own data, backed by recipes for popular datasets like DataBaker and LJSpeech, so you can train or adapt voices to custom personas.

Downloads: 2 This Week

Last Update: 2025-11-30
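Because EmotiVoice exposes an OpenAI-compatible TTS API, a client can build a request shaped like OpenAI's `POST /v1/audio/speech` call (JSON fields `model`, `input`, `voice`, `response_format`). The sketch below only constructs the request with the standard library. The base URL, port, model name and voice ID are placeholders, not values taken from EmotiVoice's documentation; check the project's README for the real ones before sending.

```python
import json
import urllib.request

# Placeholder values (assumptions, not EmotiVoice defaults).
BASE_URL = "http://localhost:8000"
MODEL = "emoti-voice"
VOICE = "your-speaker-id"


def build_speech_request(text: str, voice: str = VOICE,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": MODEL,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server and save the returned audio:
# with urllib.request.urlopen(build_speech_request("Hello!")) as resp, \
#         open("out.mp3", "wb") as f:
#     f.write(resp.read())
```

Keeping the request shape identical to OpenAI's means existing OpenAI TTS clients can usually be pointed at the local server by changing only the base URL.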
10

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...

Downloads: 3 This Week

Last Update: 2026-03-23
11

TurboDiffusion

100–200× Acceleration for Video Diffusion Models

TurboDiffusion is an advanced open-source framework designed to dramatically accelerate video diffusion model generation, aiming for performance improvements on the order of 100–200× compared with traditional implementations while retaining high output quality. It achieves this by combining a suite of algorithmic and engineering optimizations, including attention acceleration techniques, efficient step distillation methods, and quantization strategies that reduce computational overhead. The...

Downloads: 0 This Week

Last Update: 4 days ago
12

HunyuanWorld-Voyager

RGBD video generation model conditioned on camera input

HunyuanWorld-Voyager is a next-generation video diffusion framework developed by Tencent-Hunyuan for generating world-consistent 3D scene videos from a single input image. By leveraging user-defined camera paths, it enables immersive scene exploration and supports controllable video synthesis with high realism. The system jointly produces aligned RGB and depth video sequences, making it directly applicable to 3D reconstruction tasks. At its core, Voyager integrates a world-consistent video...

Downloads: 12 This Week

Last Update: 4 days ago
13

YuE

Open source AI model for generating full songs from lyrics prompts

YuE is an open source project that provides a foundation model designed for full-song music generation using artificial intelligence. It focuses on transforming text inputs such as lyrics and genre prompts into complete musical compositions that include both vocal and instrumental tracks. Unlike many shorter audio generators, the model is capable of producing songs that last several minutes while maintaining coherent musical structure and alignment with the provided lyrics. YuE introduces a...

Downloads: 5 This Week

Last Update: 4 days ago
14

HY-Motion 1.0

HY-Motion model for 3D character animation generation

HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong...

Downloads: 2 This Week

Last Update: 2026-01-29
15

CogView4

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets,...

Downloads: 2 This Week

Last Update: 6 days ago
16

Video Diffusion - Pytorch

Implementation of Video Diffusion Models

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to video generation, in PyTorch. It uses a special space-time factored U-Net, extending generation from 2D images to 3D videos. At 14k on the difficult Moving MNIST dataset, it converges much faster and better than NUWA (work in progress). Any new developments for text-to-video synthesis will be centralized at...

Downloads: 4 This Week

Last Update: 2024-05-03
17

Imagen - Pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network

Implementation of Imagen, Google's text-to-image neural network that beats DALL-E 2, in PyTorch. It was presented as the new SOTA for text-to-image synthesis. Architecturally, it is actually much simpler than DALL-E 2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient U-Net design. It appears neither CLIP nor prior...

Downloads: 5 This Week

Last Update: 2024-10-07
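The dynamic clipping mentioned for Imagen (called dynamic thresholding in the Imagen paper) replaces hard clipping of the predicted clean sample to [-1, 1]: a per-sample percentile s of the absolute values is taken, and if s > 1 the sample is clipped to [-s, s] and divided by s. A minimal pure-Python sketch of that rule on a flat list of values follows. It is not the repository's code; the 0.95 default percentile is an assumption, and the nearest-rank percentile is a simplification of what a tensor library would compute.

```python
import math


def dynamic_threshold(x0: list[float], percentile: float = 0.95) -> list[float]:
    """Clip a predicted x0 to [-s, s] and divide by s, where s is a percentile of |x0|.

    If the percentile is <= 1, s is set to 1, which reduces to ordinary clipping to [-1, 1].
    """
    if not x0:
        return []
    magnitudes = sorted(abs(v) for v in x0)
    # Nearest-rank percentile (simplified; tensor libraries usually interpolate).
    idx = min(len(magnitudes) - 1, max(0, math.ceil(percentile * len(magnitudes)) - 1))
    s = max(magnitudes[idx], 1.0)
    return [max(-s, min(s, v)) / s for v in x0]


print(dynamic_threshold([0.5, -3.0, 2.0, 1.5], percentile=0.5))
```

The effect is that saturated pixels are pushed back inside [-1, 1] without the hard clipping that tends to wash out images at high classifier-free guidance weights.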
18

SkillForge

Ultimate meta-skill for generating best-in-class Claude Code skills

SkillForge is a systematic methodology and tooling framework for creating high-quality AI “skills” specifically optimized for Claude Code integrations, treating skill creation as an engineering discipline rather than an ad-hoc art form. It introduces a multi-phase architecture where every input or request is triaged intelligently, analyzed deeply through structured lenses, specified formally, synthesized with automated generation, and finally subjected to multi-agent review before...

Downloads: 0 This Week

Last Update: 2026-02-25
19

Wan Move

Motion-controllable Video Generation via Latent Trajectory Guidance

Wan Move is an open-source research codebase for motion-controllable video generation that focuses on enabling fine-grained control of motion within generative video models. It is designed to guide the temporal evolution of visual content by leveraging latent trajectory guidance, allowing users to manipulate how objects move over time without modifying the underlying generative architecture. By representing motion information as dense point trajectories and integrating them into the latent...

Downloads: 0 This Week

Last Update: 2026-01-30
20

HumanEval

Code for the paper "Evaluating Large Language Models Trained on Code"

human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a...

Downloads: 0 This Week

Last Update: 1 day ago
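HumanEval results are usually reported as pass@k. The paper behind the repository ("Evaluating Large Language Models Trained on Code") uses an unbiased estimator: generate n ≥ k samples per problem, count the c samples that pass the unit tests, and compute 1 − C(n−c, k) / C(n, k), then average over problems. A minimal pure-Python sketch of that formula, using an equivalent running product to avoid huge binomial coefficients; the function name `pass_at_k` is ours, not necessarily the `human-eval` package's.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # C(n-c, k) / C(n, k) == product over i = n-c+1 .. n of (1 - k / i)
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


# Benchmark score: mean over problems, e.g. with (n, c) counts per problem.
results = [(10, 3), (10, 0), (10, 10)]
print(sum(pass_at_k(n, c, 1) for n, c in results) / len(results))
```

Estimating from n > k samples gives a lower-variance number than literally drawing k samples once per problem.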
21

Hunyuan3D-1

A Unified Framework for Text-to-3D and Image-to-3D Generation

Hunyuan3D-1 is an earlier version in the same 3D generation line (the unified framework for text-to-3D and image-to-3D tasks) by Tencent Hunyuan. It provides a framework combining shape generation and texture synthesis, enabling users to create 3D assets from images or text conditions. While less advanced than version 2.1, it laid the foundations for the later PBR, higher resolution, and open-source enhancements. (Note: less detailed public documentation was found for Hunyuan3D-1 compared to...

Downloads: 0 This Week

Last Update: 2025-11-19
22

Auto Synced & Translated Dubs

Automatically translates the text of a video based on a subtitle file

Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...

Downloads: 2 This Week

Last Update: 2025-11-28
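The timing step described for Auto Synced & Translated Dubs, deriving each line's target duration from its subtitle timestamps, can be sketched as follows. These are hypothetical helpers, not the project's code: they parse SRT cue timings (`HH:MM:SS,mmm --> HH:MM:SS,mmm`) and compute how much a synthesized clip would have to be sped up or slowed down to fit its slot. How the real tool applies that adjustment is not covered by the description above.

```python
import re

_TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert SRT time components to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def target_durations_ms(srt_text: str) -> list[int]:
    """Return the on-screen duration (ms) of every cue in an SRT file, in order."""
    durations = []
    for match in _TIMESTAMP.finditer(srt_text):
        g = match.groups()
        durations.append(_to_ms(*g[4:]) - _to_ms(*g[:4]))
    return durations


def speed_factor(synth_ms: int, target_ms: int) -> float:
    """A value > 1 means the synthesized clip must be sped up to fit its subtitle slot."""
    return synth_ms / target_ms


srt = "1\n00:00:01,000 --> 00:00:03,500\nHello\n\n2\n00:01:00,250 --> 00:01:02,000\nBye\n"
print(target_durations_ms(srt))
```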
23

LlamaGen

Autoregressive Model Beats Diffusion

LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image...

Downloads: 0 This Week

Last Update: 2026-03-06
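The core idea behind LlamaGen, treating an image as a sequence of discrete tokens and generating them one at a time with next-token prediction, can be illustrated with a toy sampling loop. This is not LlamaGen's code: `toy_next_token_probs` is a hypothetical stand-in for a transformer, the vocabulary size of 4 is arbitrary, and the image tokenizer/decoder that maps tokens back to pixels is omitted entirely.

```python
import random
from typing import Callable, Sequence

NextTokenFn = Callable[[Sequence[int]], list[float]]


def toy_next_token_probs(prefix: Sequence[int], vocab_size: int = 4) -> list[float]:
    """Hypothetical model: favours repeating the previous token (stand-in for a transformer)."""
    if not prefix:
        return [1.0 / vocab_size] * vocab_size
    probs = [0.1 / (vocab_size - 1)] * vocab_size
    probs[prefix[-1]] = 0.9
    return probs


def generate_image_tokens(height: int, width: int, next_token: NextTokenFn,
                          seed: int = 0) -> list[list[int]]:
    """Sample height*width tokens autoregressively in raster order, then reshape to a grid."""
    rng = random.Random(seed)
    sequence: list[int] = []
    for _ in range(height * width):
        probs = next_token(sequence)  # distribution conditioned on all previous tokens
        sequence.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    # Raster order: row 0 left to right, then row 1, and so on.
    return [sequence[r * width:(r + 1) * width] for r in range(height)]


for row in generate_image_tokens(3, 4, toy_next_token_probs, seed=1):
    print(row)
```

Swapping the toy function for a trained transformer and decoding the grid with a VQ-style image tokenizer is what turns this loop into an actual image generator.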
24

Magicoder

Empowering Code Generation with OSS-Instruct

Magicoder is an open-source family of large language models designed specifically for code generation and software development tasks. The project focuses on improving the quality and diversity of code generation by training models with a novel dataset construction approach known as OSS-Instruct. This technique uses open-source code repositories as a foundation for generating more realistic and diverse instruction datasets for training language models. By grounding training data in real...

Downloads: 0 This Week

Last Update: 2026-03-06
25

StreamSpeech

StreamSpeech is a seamless model for offline and simultaneous speech recognition, translation, and synthesis

StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received. The model supports eight tasks: offline ASR, speech-to-text translation, speech-to-speech...

Downloads: 0 This Week

Last Update: 2025-11-28


© 2026 Slashdot Media. All Rights Reserved.
