DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the...
Yet Another Audio Feature Extractor is a toolbox for audio analysis. Easy to use and efficient at extracting a large number of audio features simultaneously. WAV and MP3 files supported, or embedding in C++, Python or Matlab applications.
RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition.
full installation and usage instructions given at
http://sourceforge.net/p/rnnl/wiki/Home/
VoiceCode is an Open Source initiative started by the National Research Council of Canada, to develop a programming by voice toolbox. The aim of the project is to make programming through voice input as easy and productive as with mouse and keyboard.
For install, Use subversion, as described in this page: http://sourceforge.net/apps/mediawiki/voicecode/index.php?title=VCode_1_Doc/InstallationManual.
...It offers a full text-to-speech system with various API's, as well as an environment for research and development of TTS systems and voices.
It is written in ANSI C and uses a plug-in mechanism for extensions. Speect also includes an extensive set of Python bindings for quick implementation of new ideas, these bindings are derived from SWIG interface files and can easily be extended for other languages supported by SWIG.
Speect is free and open source software. As a collection it is distributed under a MIT license.
This is a fast C implementation of Arturo Camacho's SWIPE' pitch extraction algorithm. See the project homepage for more about the advantages of the SWIPE' algorithm. swipe-1.0.tar.gz contains the current source, which should compile quite neatly.
A collection of scripts and programs to automatically annotate video/audio for subtitles. Basically relies on a MARSYAS (Music Analysis, Retrieval and Synthesis for Audio Signals) plug-in for detecting human voice in polyphonic recordings.
AGTK is a suite of software components for building tools for annotating linguistic signals,
time-series data which documents any kind of linguistic behavior (e.g. audio, video).
The internal data structures are based on annotation graphs.
Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.
Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
This project is intended for users who want to get more out of the voice modem they may have. Why another project for modem? Looking for the good quality software for the voice communication trough the modem, I could find only Win32 based. Linux now :)