Training using example scripts with own data

Help
Orest
2015-07-15
  • Orest

    Orest - 2015-07-15

    Hi, I am looking into Kaldi. I'd like to train large-vocabulary, speaker-independent models to transcribe 8 kHz telephone audio (one speaker per recording, conversational speech) spoken with strong British English accents. I have 2000+ hours of audio with transcriptions available. I have used CMU Sphinx so far and also want to try Kaldi to see how the accuracy compares; my priority is accuracy rather than speed.

    I followed the data preparation tutorial (http://kaldi.sourceforge.net/data_prep.html), and it would be great to start from an example recipe and modify it where appropriate.

    My question is: is "fisher_english" the most appropriate example recipe to start from for my task?

     

    Last edit: Orest 2015-07-15
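For readers following the same data preparation tutorial, the core of a Kaldi data directory is a handful of sorted plain-text files: `wav.scp`, `text` and `utt2spk`. The following is only a minimal sketch of writing them. It assumes one utterance per audio file (so no `segments` file is needed), and the record keys `utt_id`, `spk_id`, `wav` and `text`, the function name and the example paths are hypothetical, not part of Kaldi.

```python
from pathlib import Path


def write_kaldi_data_dir(utterances, out_dir):
    """Write wav.scp, text and utt2spk for a list of utterance records.

    Each record is a dict with the (hypothetical) keys
    utt_id, spk_id, wav and text.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Kaldi expects these files sorted by their first field in C-locale
    # order; Python's default string sort (by code point) matches that.
    records = sorted(utterances, key=lambda r: r["utt_id"])

    with open(out / "wav.scp", "w", encoding="utf-8") as wav_scp, \
         open(out / "text", "w", encoding="utf-8") as text, \
         open(out / "utt2spk", "w", encoding="utf-8") as utt2spk:
        for r in records:
            # <utterance-id> <path to the audio file>
            wav_scp.write(f"{r['utt_id']} {r['wav']}\n")
            # <utterance-id> <transcript>
            text.write(f"{r['utt_id']} {r['text']}\n")
            # <utterance-id> <speaker-id>
            utt2spk.write(f"{r['utt_id']} {r['spk_id']}\n")


if __name__ == "__main__":
    # Illustrative records only; ids, paths and transcripts are made up.
    # Prefixing each utterance id with its speaker id keeps the
    # per-utterance and per-speaker sort orders consistent.
    write_kaldi_data_dir(
        [
            {"utt_id": "spkB-utt1", "spk_id": "spkB",
             "wav": "/data/audio/b1.wav", "text": "HELLO THERE"},
            {"utt_id": "spkA-utt1", "spk_id": "spkA",
             "wav": "/data/audio/a1.wav", "text": "GOOD MORNING"},
        ],
        "data/train",
    )
```

From there, the scripts that ship in a recipe's `utils/` directory, such as `utt2spk_to_spk2utt.pl` and `validate_data_dir.sh`, can derive `spk2utt` and check the directory for consistency.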
  • Angel Castro

    Angel Castro - 2015-07-15

    I would start with wsj, since it is a large-vocabulary task. It is very well commented and divided into clearly separated stages. That is how I started.