Open Speech Corpora is a curated catalog of speech datasets for research and development in automatic speech recognition (ASR), text-to-speech (TTS), and other speech technologies. The repository is organized as a set of tables that list each corpus with its language, total hours, number of speakers, download link, and license, so practitioners can quickly find data that matches their needs. It emphasizes free and truly “open” datasets, favoring those released under Creative Commons or other community-friendly data licenses, but it also lists corpora that are accessible for research and, in many cases, commercial use. The catalog covers well-known resources such as Mozilla Common Voice, Yesno, and LJ Speech, as well as numerous Nordic and parliamentary speech corpora, and records their license variants such as CC-0 and CC-BY. It is actively maintained as a community resource: users are encouraged to propose new corpora via issues, and a backlog of datasets is waiting to be integrated.

Features

  • Centralized catalog of speech corpora for ASR, TTS and related tasks
  • Detailed metadata including language, duration, speakers, download links and licenses
  • Emphasis on free and open datasets suitable for research and many commercial uses
  • Coverage of popular corpora like Common Voice, LJ Speech and multiple Nordic resources
  • Community-driven updates via issues and pull requests to keep the list evolving
  • License-based grouping (CC-0, CC-BY and more) to simplify compliance and dataset selection
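The per-corpus metadata and license grouping described above lend themselves to simple scripted filtering. The following is a minimal sketch, not part of the project itself, assuming each catalog table is a Markdown pipe table with the hypothetical column order Name, Language, Hours, Speakers, URL, License; the real tables may use different columns. The names `Corpus`, `parse_table`, `group_by_license` and `select` are illustrative, and the corpus names and URLs are placeholders rather than catalog entries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical column order; the actual catalog tables may differ.
COLUMNS = ("name", "language", "hours", "speakers", "url", "license")


@dataclass
class Corpus:
    name: str
    language: str
    hours: Optional[float]  # None if the cell is empty or non-numeric
    speakers: Optional[int]  # None if the cell is empty or non-numeric
    url: str
    license: str


def _to_number(cell, kind):
    """Convert a table cell to int or float, returning None if it does not parse."""
    try:
        return kind(cell.replace(",", ""))
    except ValueError:
        return None


def parse_table(markdown):
    """Turn the body rows of a Markdown pipe table into Corpus records."""
    corpora = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row, the |---|---| separator row, and blank lines.
        if cells[0].lower() == "name" or set("".join(cells)) <= set("-: "):
            continue
        if len(cells) != len(COLUMNS):
            continue  # row does not match the assumed layout
        row = dict(zip(COLUMNS, cells))
        corpora.append(Corpus(
            name=row["name"],
            language=row["language"],
            hours=_to_number(row["hours"], float),
            speakers=_to_number(row["speakers"], int),
            url=row["url"],
            license=row["license"],
        ))
    return corpora


def group_by_license(corpora):
    """Map each license string to the names of corpora released under it."""
    groups = defaultdict(list)
    for c in corpora:
        groups[c.license].append(c.name)
    return dict(groups)


def select(corpora, language, allowed_licenses):
    """Keep corpora in the given language whose license is in the allowed set."""
    return [c for c in corpora
            if c.language == language and c.license in allowed_licenses]


# Placeholder rows for illustration only; not entries from the real catalog.
TABLE = """
| Name | Language | Hours | Speakers | URL | License |
|------|----------|-------|----------|-----|---------|
| corpus-a | English | 10 | 1 | https://example.org/a | CC-0 |
| corpus-b | Finnish | 2.5 | 40 | https://example.org/b | CC-BY |
| corpus-c | English |  | 12 | https://example.org/c | CC-BY |
"""

if __name__ == "__main__":
    corpora = parse_table(TABLE)
    print(group_by_license(corpora))
    # {'CC-0': ['corpus-a'], 'CC-BY': ['corpus-b', 'corpus-c']}
    print([c.name for c in select(corpora, "English", {"CC-0", "CC-BY"})])
    # ['corpus-a', 'corpus-c']
```

Keeping the license as a plain string keeps the sketch short; a real tool would likely normalize spellings such as "CC0" versus "CC-0" before grouping.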

Categories

Text to Speech

License

MIT License

Additional Project Details

Registered

3 days ago