Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.
Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
Start Free Trial
Build Agents and Models on One Platform
Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.
Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
SPTK is a suite of speech signal processing tools for UNIX
environments, e.g., LPC analysis, PARCOR analysis, LSP analysis,
PARCOR synthesis filter, LSP synthesis filter, vector
quantization techniques, and other extended versions of them.
Text-to-Speech conversor for Basque and Spanish. It includes linguistic processing and built voices for the languages aforementioned. Its acoustic engine is based on hts_engine and it uses a high quality vocoder called AhoCoder.
Developed by Aholab Signal Processing Laboratory: https://aholab.ehu.es/aholab/
http://aholab.ehu.es/ahocoder/
Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure
Native application identity and user-based security for your Azure cloud
Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
Text-to-Speech TTS for Basque, Spanish, Catalan, Galician and English
Text-to-Speech conversor for Basque, Spanish, Catalan, Galician and English.
It includes linguistic processing and built voices for all the languages aforementioned. Its acoustic engine is based on hts_engine and it uses a high quality vocoder called AhoCoder.
Developed by Aholab Signal Processing Laboratory: https://aholab.ehu.es/aholab/
http://aholab.ehu.es/ahocoder/
hts_engine is software to synthesize speech waveform from HMMs trained by the HMM-based speech synthesis system (HTS). This software is released under the Modified BSD license.
AhoTTS Iparrahotsa is the TTS developed at the Aholab Signal Processing Laboratory of the University of the Basque Country (UPV/EHU) for the Lapurdian dialect of Basque. This dialect is spoken at the Northern area of the Basque speaking area (French region). This project was funded by the Euroregion Aquitaine-Euskadi under grant EUSKADI-2012-004.
The project provides a ready-to-use interface for the julius CSR engine for a handicapped child which is not able to use the keyboard well. It integrates into X11 and Windows.
Find out how you can help: http://simon-listens.org/index.php?support
A multimodal, multiliteracy aware programmed approach to creating virtual realities inspired by Plato's Symposium and all surrounding, mostly later, texts. Designed for the author's doctoral dissertation.
Speect is a multilingual TTS system. It offers a full text-to-speech system with various API's, as well as an environment for research and development of TTS systems and voices.
It is written in ANSI C and uses a plug-in mechanism for extensions. Speect also includes an extensive set of Python bindings for quick implementation of new ideas, these bindings are derived from SWIG interface files and can easily be extended for other languages supported by SWIG.
Speect is free and open...
Advanced Speech Signal Analysis library provides a structure to handle various file formats and a variety of analysis functions commonly used in speech processing.
1.) Investigation with cosine transform, and anti transform algorithm, with some voice recognition code. 2.) Translator: Croatian, English. 3.) 2D to 3D picture algorithm (principle) and new 2Dto3D video conversion code with AviSynth video scripting
Simple testing tool to generate RTP data packets and send it via netwok interface or save into pcap file. Primarily intended for use with SIPp application to test speech quality with different codecs.
This is a fast C implementation of Arturo Camacho's SWIPE' pitch extraction algorithm. See the project homepage for more about the advantages of the SWIPE' algorithm. swipe-1.0.tar.gz contains the current source, which should compile quite neatly.
Scalable Language API (SLAPI) The most comprehensive architecture for conversational natural-language applications including speech recognition/synthesis, semantics, & machine translation. Used on Android & other mobile app platforms.
Asterisk Dialplan application, which allows you to use Lia_Phon and Mbrola as a French speech synthesizer. Application du plan de numérotation d'Asterisk, qui permet d'utiliser Lia_Phon et Mbrola comme synthétiseur vocal français sous Asterisk.
MyDSReader is an homebrew for the Nintendo DS that helps visually impaired users:
1. Read documents in digital format (text, word, pdf, DAISY)
2. Take voice annotations
3. Read e-mails and reply/write using recorded voice clips
Linux on Sound is a project to create a transparent console interface over eSpeak (http://espeak.sourceforge.net/), to increase linux console's accessibility.
Webvoice is a text to speech cgi program. You can embed a link in a html page to send things you want to say, via sound. No software is required on the client side. Festival and sox are needed on the server. Webvoice has its own interface (if needed).
BladeWareVXML is a portable VoiceXML 2.1 interpreter that is an enhanced version (performance, usability and integration) of OpenVXI. A commercial version, with documentation, sample code, and support options, is available from the Commetrex Website.
eSpeak text-to-speech module for Asterisk. This provides the "espeak" dialplan application, which allows you to use the eSpeak TTS Engine as a speech synthesizer in Asterisk.
eXtace is a 3D audio visualization tool (or eye candy depending on how you look at it). eXtace requires ESD (Esound) for its sound input source. It performs a FFT (fast fourier transform) on audio and displays it via various graphical modes.