Pocketsphinx with TEDLIUM database

Forum: Help

Created: 2017-09-14

Updated: 2017-09-14

Vamsi - 2017-09-14

Hi, I am new to Pocketsphinx and I want to build a model using TEDLIUM. I have downloaded the TEDLIUM corpus and found that the transcripts are stored in separate folders as individual files in STM format. However, the tutorials say there should be a single transcription file that combines the transcripts of all the audio files in the project. Can someone please guide me on how to proceed? Am I supposed to read all the STM files in the TEDLIUM database and build a single transcription file myself, or is there some way to read these STM files directly? Is there a similar project in a public Git repository that I can refer to?

Thanks,
Vamsi

- Nickolay V. Shmyrev - 2017-09-14
  
  Write code to perform the format conversion in Python or any other scripting language you know (JavaScript, Perl, etc.).
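
The conversion suggested above can be sketched as a short Python script. This is a minimal sketch, not an official CMUSphinx or TEDLIUM tool. It assumes each STM line has the layout `talk channel speaker start end <label> words...` (the `<label>` field is optional), that `;;` lines are comments, and that segments marked `ignore_time_segment_in_scoring` should be dropped. The utterance-id scheme (talk name plus start and end times in centiseconds) is hypothetical. The script writes one `<s> ... </s> (uttid)` line per segment to a `.transcription` file, plus a matching `.fileids` list. It only handles the text side: each id must still correspond to an audio file cut from the talk's SPH recording at those times (for example with sox), and the words must match your pronunciation dictionary.

```python
import sys
from pathlib import Path

# Assumed TEDLIUM marker for segments that carry no usable transcript.
IGNORE = "ignore_time_segment_in_scoring"


def parse_stm_line(line):
    """Split one STM line into (talk, start, end, words), or return None to skip it."""
    line = line.strip()
    if not line or line.startswith(";;"):  # blank line or STM comment
        return None
    fields = line.split()
    if len(fields) < 5:
        return None
    talk, _channel, _speaker, start, end = fields[:5]
    words = fields[5:]
    # Drop the optional label field, e.g. <o,f0,male>.
    if words and words[0].startswith("<") and words[0].endswith(">"):
        words = words[1:]
    if not words or IGNORE in words:
        return None
    return talk, float(start), float(end), words


def utterance_id(talk, start, end):
    # Hypothetical naming scheme: talk name plus start/end in centiseconds.
    return f"{talk}_{round(start * 100):07d}_{round(end * 100):07d}"


def convert(stm_dir, out_prefix):
    """Merge every *.stm file in stm_dir into OUT_PREFIX.transcription and OUT_PREFIX.fileids."""
    trans_lines, fileids = [], []
    for stm in sorted(Path(stm_dir).glob("*.stm")):
        with open(stm, encoding="utf-8") as f:
            for line in f:
                parsed = parse_stm_line(line)
                if parsed is None:
                    continue
                talk, start, end, words = parsed
                uid = utterance_id(talk, start, end)
                trans_lines.append(f"<s> {' '.join(words)} </s> ({uid})")
                fileids.append(uid)
    Path(out_prefix + ".transcription").write_text("\n".join(trans_lines) + "\n", encoding="utf-8")
    Path(out_prefix + ".fileids").write_text("\n".join(fileids) + "\n", encoding="utf-8")
    return len(fileids)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (paths are illustrative): python stm2sphinx.py TEDLIUM/train/stm tedlium_train
    count = convert(sys.argv[1], sys.argv[2])
    print(f"wrote {count} utterances")
```

Keeping parsing, id generation and file writing in separate functions makes it easy to change the id scheme, or to add extra word filtering, without touching the rest of the script.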
  