Hi, I am new to Pocketsphinx and I want to build a model using TEDLIUM. I have downloaded the TEDLIUM data and found that the transcripts are stored in individual folders, in STM format. However, the tutorials say there should be a single long transcript file that combines the transcripts of all the audio files in my project. Can someone please guide me on how to proceed? Am I supposed to read all the STM files in the TEDLIUM database and manually create a single transcript file, or is there some way to read these STM files directly? Is there a similar project on Git that I can look at for reference?
Thanks,
Vamsi
Write a script to perform the format conversion in Python or any other scripting language you know (JavaScript, Perl, etc.).
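As a rough starting point, such a conversion could look something like the sketch below. It assumes each STM line has the usual seven fields (file name, channel, speaker, start time, end time, a label such as `<o,f0,male>`, then the text), that lines starting with `;;` are comments, and that segments whose text is `ignore_time_segment_in_scoring` should be skipped. The utterance ID scheme (`name-start-end`, times in hundredths of a second) and the function names are made up for illustration; you would still need to cut the audio into segments whose file names match these IDs, so please check all of this against your copy of the data.

```python
from pathlib import Path


def parse_stm_line(line):
    """Return (utt_id, text) for one STM segment line, or None to skip it."""
    line = line.strip()
    # Assumption: ';;' marks a comment line in STM files.
    if not line or line.startswith(";;"):
        return None
    # Assumed fields: name, channel, speaker, start, end, label, text.
    parts = line.split(None, 6)
    if len(parts) < 7:
        return None
    name, _channel, _speaker, start, end, _label, text = parts
    # Assumption: this marker flags gaps that carry no usable transcript.
    if text == "ignore_time_segment_in_scoring":
        return None
    # Hypothetical utterance ID: file name plus times in hundredths of a second.
    start_cs = int(round(float(start) * 100))
    end_cs = int(round(float(end) * 100))
    return f"{name}-{start_cs:07d}-{end_cs:07d}", text


def to_transcription_line(utt_id, text):
    """Format one line the way the Sphinx training tutorial shows it."""
    return f"<s> {text} </s> ({utt_id})"


def convert(stm_dir, transcription_path, fileids_path):
    """Merge every .stm file under stm_dir into one transcript and one ID list."""
    count = 0
    with open(transcription_path, "w", encoding="utf-8") as trans, \
         open(fileids_path, "w", encoding="utf-8") as ids:
        for stm_file in sorted(Path(stm_dir).rglob("*.stm")):
            with open(stm_file, encoding="utf-8") as handle:
                for line in handle:
                    parsed = parse_stm_line(line)
                    if parsed is None:
                        continue
                    utt_id, text = parsed
                    trans.write(to_transcription_line(utt_id, text) + "\n")
                    ids.write(utt_id + "\n")
                    count += 1
    return count
```

You could then call, for example, `convert("TEDLIUM/train/stm", "ted_train.transcription", "ted_train.fileids")`, where the paths are only placeholders for wherever your data and output files live.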