Training using example scripts with own data

Forum: Help

Creator: Orest

Created: 2015-07-15

Updated: 2015-07-15

Orest - 2015-07-15

Hi, I am evaluating Kaldi. I'd like to train large-vocabulary, speaker-independent models to transcribe 8 kHz telephone audio (one speaker per recording, conversational speech) recorded with strong British English accents. I have 2000+ hours of audio with transcriptions available. I have used CMU Sphinx so far and want to try Kaldi as well to compare accuracy; my priority is accuracy rather than speed.

I followed the data preparation tutorial (http://kaldi.sourceforge.net/data_prep.html), and it would be great to start from an example recipe and modify it where appropriate.

My question is: is "fisher_english" the most appropriate example recipe to start from for my task?

Last edit: Orest 2015-07-15

Angel Castro - 2015-07-15

I would start with wsj, since it is a large-vocabulary task. It is very well commented and divided into easy-to-follow stages. That is how I started.
