Librispeech sample scripts

Help
Hitesh
2016-11-13
2016-11-15
  • Hitesh

    Hitesh - 2016-11-13

    Hi everyone,

    I am planning to train an acoustic model on the Librispeech data and was wondering whether anyone already has sample scripts for this.
    I also wanted to know whether it's possible to use Kaldi-generated models in Sphinx and, if so, how. Can they be used on a mobile device with Pocketsphinx, or are they too large for that purpose?

    Thanks,
    Hitesh

     
    • Nickolay V. Shmyrev

      I am planning to train an acoustic model on the Librispeech data and was wondering whether anyone already has sample scripts for this.

      sphinxtrain uses the same scripts for everything; you just need to prepare the data in the proper format.
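      As an illustration of the data preparation step, here is a minimal sketch that turns one line of a Librispeech `*.trans.txt` file into the matching `.fileids` and `.transcription` entries. The function name `librispeech_to_sphinx` and the directory layout in the example are assumptions made for illustration, not part of sphinxtrain, and the Librispeech FLAC audio would still need converting to a format sphinxtrain accepts.

```python
from pathlib import PurePosixPath


def librispeech_to_sphinx(trans_line, chapter_dir):
    """Convert one line of a Librispeech *.trans.txt file.

    trans_line  -- e.g. "1089-134686-0000 HE HOPED THERE WOULD BE STEW"
    chapter_dir -- audio directory relative to the wav root
                   (hypothetical layout, adjust to your own setup)

    Returns (fileid_line, transcription_line).
    """
    # The first token is the utterance id, the rest is the text.
    utt_id, _, text = trans_line.strip().partition(" ")
    # .fileids entries are paths without the file extension.
    fileid = str(PurePosixPath(chapter_dir) / utt_id)
    # .transcription entries wrap the text in <s> ... </s> (utt_id).
    transcription = f"<s> {text.strip()} </s> ({utt_id})"
    return fileid, transcription


if __name__ == "__main__":
    fid, trn = librispeech_to_sphinx(
        "1089-134686-0000 HE HOPED THERE WOULD BE STEW",
        "train-clean-100/1089/134686",
    )
    print(fid)
    print(trn)
```

      Writing each returned pair to the same line number of the two files keeps the file list and the transcripts aligned.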

      I also wanted to know whether it's possible to use Kaldi-generated models in Sphinx and, if so, how.

      No.

      Can they be used on a mobile device with Pocketsphinx, or are they too large for that purpose?

      Not now.

       
  • Hitesh

    Hitesh - 2016-11-14

    Hi Nickolay,

    Thanks for the info.
    I was able to set up most of the files required to run training, but I ran into problems with the dictionary. During the verification step after feature extraction, I get warnings such as the following:

    WARNING: This word: BOZZLE was in the transcript file, but is not in the dictionary ( MISSUS BOZZLE WHO WELL UNDERSTOOD THAT BUSINESS WAS BUSINESS AND THAT WIVES WERE NOT BUSINESS FELT NO ANGER AT THIS AND HANDED HER HUSBAND HIS BEST COAT ). Do cases match?

    I tried using the librispeech-lexicon from http://www.openslr.org/11/, but noticed that it is missing many of the words that appear in the transcriptions. This seems to prevent the training from proceeding.
    The lexicon also contains stress-marked phones such as AH0, AH1, etc., which are not in the default phone set. Would it be better to add these to the phone set, or could I convert all of them to their root phones (AH in this case) in the dictionary itself?
    I also tried using the dictionary and phone set from http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/, but that errors out as well.

    Thanks.

     
    • Nickolay V. Shmyrev

      I tried using the librispeech-lexicon from http://www.openslr.org/11/, but noticed that it is missing many of the words that appear in the transcriptions. This seems to prevent the training from proceeding.

      You can create the missing pronunciations with g2p.
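      Before running g2p you need the list of words to feed it. The sketch below collects every transcript word that has no dictionary entry. It assumes a dictionary with one `WORD PH1 PH2 ...` entry per line and transcripts in the `<s> ... </s> (utt_id)` form; the helper names are hypothetical, and the comparison is case-sensitive, which is what the "Do cases match?" warning hints at.

```python
import re


def load_dictionary_words(dict_lines):
    """Return the set of words that have at least one pronunciation."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        # Strip an alternate-pronunciation suffix such as "(2)".
        words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def missing_words(transcription_lines, dictionary_words):
    """Return the sorted transcript words that are not in the dictionary."""
    missing = set()
    for line in transcription_lines:
        # Drop the trailing "(utterance-id)" before splitting into words.
        text = re.sub(r"\([^()]*\)\s*$", "", line)
        for word in text.split():
            if word in ("<s>", "</s>", "<sil>"):
                continue  # markers, not dictionary words
            if word not in dictionary_words:
                missing.add(word)
    return sorted(missing)


if __name__ == "__main__":
    dictionary = load_dictionary_words(["MISSUS M IH S IH Z", "WHO HH UW"])
    print(missing_words(["<s> MISSUS BOZZLE WHO </s> (utt-1)"], dictionary))
```

      The resulting word list can be passed to whichever g2p tool you use, and the generated entries appended to the dictionary.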

      The lexicon also contains stress-marked phones such as AH0, AH1, etc., which are not in the default phone set. Would it be better to add these to the phone set, or could I convert all of them to their root phones (AH in this case) in the dictionary itself?

      There is no such thing as a "default phone set". You create the phone set based on the dictionary you decide to use.
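      Since the phone set follows from the dictionary, either option from the question can work as long as the two stay consistent. The sketch below derives the phone list from the dictionary lines, optionally stripping the stress digits first. The function names are hypothetical, and adding `SIL` reflects my understanding that the sphinxtrain phone file needs a silence phone; check this against your own setup.

```python
import re


def strip_stress(phone):
    """Map a stress-marked phone such as "AH0" to its root "AH"."""
    return re.sub(r"\d+$", "", phone)


def normalise_entry(line, drop_stress=True):
    """Split a 'WORD PH1 PH2 ...' line into (word, [phones])."""
    fields = line.split()
    word, phones = fields[0], fields[1:]
    if drop_stress:
        phones = [strip_stress(p) for p in phones]
    return word, phones


def phone_set(dict_lines, drop_stress=True):
    """Collect every phone used in the dictionary, plus SIL."""
    phones = {"SIL"}  # assumed requirement: silence phone in the list
    for line in dict_lines:
        if len(line.split()) < 2:  # skip blank or malformed lines
            continue
        phones.update(normalise_entry(line, drop_stress)[1])
    return sorted(phones)


if __name__ == "__main__":
    lexicon = ["ABOUT AH0 B AW1 T", "BUT B AH1 T"]
    print(phone_set(lexicon))                     # stress removed
    print(phone_set(lexicon, drop_stress=False))  # stress kept
```

      Whichever variant you pick, write the dictionary with the same `drop_stress` setting so that every phone in it also appears in the phone list.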

       
  • Hitesh

    Hitesh - 2016-11-15

    Is there a way I can ignore words not in the dictionary or treat them as OOV?

     
    • Nickolay V. Shmyrev

      Is there a way I can ignore words not in the dictionary or treat them as OOV?

      No.

       
