Open Speech Corpora is a curated catalog of speech datasets intended to support research and development in automatic speech recognition (ASR), text-to-speech (TTS), and other speech technologies. The repository is organized as a set of tables that list each corpus along with its language, total hours, number of speakers, download link, and license, giving practitioners a quick way to find data that matches their needs. It emphasizes free and truly “open” datasets, favoring those released under Creative Commons or other community-friendly data licenses, though it also lists corpora whose terms permit research and many commercial uses. The catalog covers well-known resources such as Mozilla Common Voice, Yesno, and LJ Speech, as well as numerous Nordic and parliamentary speech corpora, and groups them by license variant such as CC-0 and CC-BY. It is actively maintained as a community resource: users are encouraged to propose new corpora via issues, and a backlog of datasets is waiting to be integrated.

Features

  • Centralized catalog of speech corpora for ASR, TTS and related tasks
  • Detailed metadata including language, duration, speakers, download links and licenses
  • Emphasis on free and open datasets suitable for research and many commercial uses
  • Coverage of popular corpora like Common Voice, LJ Speech and multiple Nordic resources
  • Community-driven updates via issues and pull requests to keep the list evolving
  • License-based grouping (CC-0, CC-BY and more) to simplify compliance and dataset selection

Categories

Text to Speech

License

MIT License

Additional Project Details

Registered

2025-11-28