StreamSpeech is an “all-in-one” speech model that performs offline and simultaneous speech recognition, speech translation, and speech synthesis within a unified architecture. Introduced in an ACL 2024 paper, it targets streaming, low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously while audio is still being received. The model supports eight tasks: offline ASR, speech-to-text translation (S2TT), speech-to-speech translation (S2ST), and TTS, plus their streaming or simultaneous counterparts, all handled by the same underlying system. During simultaneous translation, StreamSpeech can optionally emit intermediate ASR transcripts and text translations, giving users and downstream applications real-time visibility into what the system is hearing and how it is translating.
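This streaming behavior can be pictured as an event loop: the model interleaves partial text with synthesized audio, and a client renders each piece as it arrives. The sketch below is purely illustrative; `StreamingEvent`, `fake_event_stream`, and `consume` are hypothetical names standing in for the project's actual fairseq/SimulEval-based interface.

```python
"""Illustrative consumer for StreamSpeech-style simultaneous output.

All names here are hypothetical stand-ins; the real model is driven
through fairseq and SimulEval. The event flow mirrors the behavior
described above: intermediate ASR transcripts and translations arrive
alongside synthesized speech chunks.
"""
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class StreamingEvent:
    kind: str       # "asr" | "translation" | "speech"
    payload: object # partial text, or a chunk of waveform samples


def fake_event_stream() -> Iterable[StreamingEvent]:
    # Stand-in for the model: interleaves partial text with audio chunks.
    yield StreamingEvent("asr", "bonjour")
    yield StreamingEvent("translation", "hello")
    yield StreamingEvent("speech", [0.0] * 320)  # 20 ms at 16 kHz
    yield StreamingEvent("asr", "bonjour tout le monde")
    yield StreamingEvent("translation", "hello everyone")
    yield StreamingEvent("speech", [0.0] * 320)


def consume(events: Iterable[StreamingEvent]) -> List[float]:
    """Print intermediate text as it arrives; buffer audio for playback."""
    audio: List[float] = []
    for ev in events:
        if ev.kind == "asr":
            print(f"[partial transcript]  {ev.payload}")
        elif ev.kind == "translation":
            print(f"[partial translation] {ev.payload}")
        else:  # "speech": accumulate waveform samples
            audio.extend(ev.payload)
    return audio


if __name__ == "__main__":
    consume(fake_event_stream())
```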
## Features
- Unified model for ASR, speech translation, and TTS in both offline and streaming modes
- Supports eight distinct tasks including simultaneous S2ST, S2TT, and real-time TTS
- Outputs intermediate transcripts and translations for richer low-latency interaction
- SimulEval integration and agent scripts for systematic streaming evaluation (see the agent sketch after this list)
- Web GUI demo and project page with audio samples and visualizations
- Achieves state-of-the-art performance on offline and simultaneous speech-to-speech translation
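Streaming evaluation runs through SimulEval agents that decide, at each step, whether to read more audio or to write output. The toy agent below shows the general shape of such an agent using SimulEval's documented `ReadAction`/`WriteAction` API, closely following the library's own speech-to-text example; the one-second wait policy and placeholder output are illustrative, not StreamSpeech's actual agent logic.

```python
from simuleval.utils import entrypoint
from simuleval.agents import SpeechToTextAgent
from simuleval.agents.actions import ReadAction, WriteAction


@entrypoint
class ChunkedPlaceholderAgent(SpeechToTextAgent):
    """Toy policy: wait until ~1 s of audio has arrived, then emit one
    placeholder token per subsequent chunk. A real StreamSpeech agent
    would decode with the trained model instead."""

    def policy(self):
        # self.states.source holds all audio samples received so far.
        # Fall back to 16 kHz in case no chunk has set the rate yet.
        rate = self.states.source_sample_rate or 16000
        length_in_seconds = len(self.states.source) / rate

        if not self.states.source_finished and length_in_seconds < 1.0:
            # Too little audio so far: ask SimulEval for the next segment.
            return ReadAction()

        # Emit a placeholder hypothesis token for this step.
        return WriteAction(
            content=f"token-{length_in_seconds:.1f}s",
            finished=self.states.source_finished,
        )
```

With SimulEval installed, an agent file like this is typically launched via `simuleval --agent agent.py --source <source list> --target <references>`, after which SimulEval reports quality and latency metrics such as BLEU and Average Lagging.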