Final.mdl seems to be empty after training

Help
2015-04-28
2015-05-01
  • Gustavo Mendonça

    Hi everyone,

    First of all, thank you for this amazing toolkit!

    I am trying to adapt the Librispeech s5 recipe to my own data, but I am experiencing some problems. I was able to run all the test scripts without any problem; however, when I ran the recipe with my own data, I noticed that mkgraph was taking too long for exp/mono. For a 13 MB pruned ARPA model, it consumed 60 GB of RAM and still had not finished after running for 33 hours (I had to kill the process due to memory overflow). Then, while checking the data, I realized that my exp/mono/final.mdl was empty, although exp/mono/tree seems to have some content: 8 kB. Could you please help me with this? Below are some logs and some info about my data.

    Thanks in advance!

    Gustavo


    [Cmds and logs]
    When I call train_mono:

    steps/train_mono.sh --boost-silence 1.25 --nj 10 --cmd "$train_cmd" \
    data/train_2kshort data/lang exp/mono

    After the 39 passes, I get some warnings, but no errors:

    steps/train_mono.sh: Aligning data
    steps/train_mono.sh: Pass 39
    71 warnings in exp/mono/log/acc...log
    480 warnings in exp/mono/log/align...log
    Done

    When I grep all acc.*log warnings and remove the duplicates, I end up with three warnings:

    WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-f48br08b11k1-s034
    WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-m01br16b22k1-s162
    WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-m01br16b22k1-s177

    I think I shouldn't have to worry about these, right? They are probably bad audio files or bad transcriptions, and Kaldi is able to filter them out.
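As a sanity check on that assumption, the transcripts and audio paths of the three failing utterances can be pulled out of the data dir by hand (a sketch; the utterance IDs are the ones from the warnings above, and the file layout is the one shown further down):

```shell
# Look up each failing utterance in text and wav.scp so it can be
# inspected (listen to the audio, eyeball the transcription).
for utt in wp-f48br08b11k1-s034 wp-m01br16b22k1-s162 wp-m01br16b22k1-s177; do
  grep "^$utt " data/train_2kshort/text
  grep "^$utt " data/train_2kshort/wav.scp
done
```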

    When I grep all warnings from align.*log, the same 3 files show up:

    WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-f48br08b11k1-s034, len = 219
    WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-m01br16b22k1-s162, len = 72
    WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-m01br16b22k1-s177, len = 197

    There are also messages for about 40 files saying that alignment was retried with a larger beam, as in:

    WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 24
    WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 40

    And a final message about the pdfs for the silence phones being shared.

    WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)

    [My data]
    This is the head and tail for each file in my data/train_2kshort directory:

    ==> data/train_2kshort/cmvn.scp <==
    TIMIT-DR1-FCJF0 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:271
    TIMIT-DR1-FDML0 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:1036
    TIMIT-DR1-FELC0 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:1546
    ...
    wp-m66br08b11k1 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:166786
    wp-m67br08b11k1 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:167041
    wp-m68br08b11k1 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:167296

    ==> data/train_2kshort/feats.scp <==
    TIMIT-DR1-FCJF0-SA2 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.1.ark:51435
    TIMIT-DR1-FCJF0-SX127 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.1.ark:67144
    TIMIT-DR1-FCJF0-SX217 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.1.ark:69267
    ...
    wp-m68br08b11k1-s035 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.10.ark:5652668
    wp-m68br08b11k1-s089 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.10.ark:5709693
    wp-m68br08b11k1-s091 /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.10.ark:5712673

    ==> data/train_2kshort/spk2utt <==
    TIMIT-DR1-FCJF0 TIMIT-DR1-FCJF0-SA2 TIMIT-DR1-FCJF0-SX127 TIMIT-DR1-FCJF0-SX217 TIMIT-DR1-FCJF0-SX307 TIMIT-DR1-FCJF0-SX37
    TIMIT-DR1-FDML0 TIMIT-DR1-FDML0-SI2075 TIMIT-DR1-FDML0-SX249
    TIMIT-DR1-FELC0 TIMIT-DR1-FELC0-SX216
    ...
    wp-m67br08b11k1 wp-m67br08b11k1-s034 wp-m67br08b11k1-s035 wp-m67br08b11k1-s089
    wp-m68br08b11k1 wp-m68br08b11k1-s034 wp-m68br08b11k1-s035 wp-m68br08b11k1-s089 wp-m68br08b11k1-s091

    ==> data/train_2kshort/text <==
    TIMIT-DR1-FCJF0-SA2 DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
    TIMIT-DR1-FCJF0-SX127 THE EMPEROR HAD A MEAN TEMPER
    TIMIT-DR1-FCJF0-SX217 HOW PERMANENT ARE THEIR RECORDS
    ...
    wp-m68br08b11k1-s035 OLHA SÓ
    wp-m68br08b11k1-s089 ATÉ LOGO
    wp-m68br08b11k1-s091 TUDO BEM

    ==> data/train_2kshort/utt2spk <==
    TIMIT-DR1-FCJF0-SA2 TIMIT-DR1-FCJF0
    TIMIT-DR1-FCJF0-SX127 TIMIT-DR1-FCJF0
    TIMIT-DR1-FCJF0-SX217 TIMIT-DR1-FCJF0
    ...
    wp-m68br08b11k1-s035 wp-m68br08b11k1
    wp-m68br08b11k1-s089 wp-m68br08b11k1
    wp-m68br08b11k1-s091 wp-m68br08b11k1

    ==> data/train_2kshort/wav.scp <==
    TIMIT-DR1-FCJF0-SA2 corpora/TIMIT/WAV/DR1-FCJF0-SA2.WAV
    TIMIT-DR1-FCJF0-SX127 corpora/TIMIT/WAV/DR1-FCJF0-SX127.WAV
    TIMIT-DR1-FCJF0-SX217 corpora/TIMIT/WAV/DR1-FCJF0-SX217.WAV
    ...
    wp-m68br08b11k1-s035 corpora/westpoint/treino/m68br08b11k1/s035.wav
    wp-m68br08b11k1-s089 corpora/westpoint/treino/m68br08b11k1/s089.wav
    wp-m68br08b11k1-s091 corpora/westpoint/treino/m68br08b11k1/s091.wav

    This is the output of running validate_lang.pl on data/lang:

    Checking data/lang/phones.txt ...
    --> data/lang/phones.txt is OK

    Checking words.txt: #0 ...
    --> data/lang/words.txt has "#0"
    --> data/lang/words.txt is OK

    Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
    --> silence.txt and nonsilence.txt are disjoint
    --> silence.txt and disambig.txt are disjoint
    --> disambig.txt and nonsilence.txt are disjoint
    --> disjoint property is OK

    Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
    --> summation property is OK

    Checking data/lang/phones/context_indep.{txt, int, csl} ...
    --> 10 entry/entries in data/lang/phones/context_indep.txt
    --> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
    --> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
    --> data/lang/phones/context_indep.{txt, int, csl} are OK

    Checking data/lang/phones/disambig.{txt, int, csl} ...
    --> 18 entry/entries in data/lang/phones/disambig.txt
    --> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
    --> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
    --> data/lang/phones/disambig.{txt, int, csl} are OK

    Checking data/lang/phones/nonsilence.{txt, int, csl} ...
    --> 200 entry/entries in data/lang/phones/nonsilence.txt
    --> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
    --> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
    --> data/lang/phones/nonsilence.{txt, int, csl} are OK

    Checking data/lang/phones/silence.{txt, int, csl} ...
    --> 10 entry/entries in data/lang/phones/silence.txt
    --> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
    --> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
    --> data/lang/phones/silence.{txt, int, csl} are OK

    Checking data/lang/phones/optional_silence.{txt, int, csl} ...
    --> 1 entry/entries in data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.{txt, int, csl} are OK

    Checking data/lang/phones/roots.{txt, int} ...
    --> 52 entry/entries in data/lang/phones/roots.txt
    --> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
    --> data/lang/phones/roots.{txt, int} are OK

    Checking data/lang/phones/sets.{txt, int} ...
    --> 52 entry/entries in data/lang/phones/sets.txt
    --> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
    --> data/lang/phones/sets.{txt, int} are OK

    Checking data/lang/phones/extra_questions.{txt, int} ...
    --> 9 entry/entries in data/lang/phones/extra_questions.txt
    --> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
    --> data/lang/phones/extra_questions.{txt, int} are OK

    Checking data/lang/phones/word_boundary.{txt, int} ...
    --> 210 entry/entries in data/lang/phones/word_boundary.txt
    --> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt
    --> data/lang/phones/word_boundary.{txt, int} are OK

    Checking optional_silence.txt ...
    --> reading data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.txt is OK

    Checking disambiguation symbols: #0 and #1
    --> data/lang/phones/disambig.txt has "#0" and "#1"
    --> data/lang/phones/disambig.txt is OK

    Checking topo ...
    --> data/lang/topo's nonsilence section is OK
    --> data/lang/topo's silence section is OK
    --> data/lang/topo is OK

    Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
    --> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols
    --> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
    --> data/lang/phones/word_boundary.txt is OK

    Checking word_boundary.int and disambig.int
    --> generating a 40 word sequence
    --> resulting phone sequence from L.fst corresponds to the word sequence
    --> L.fst is OK
    --> generating a 29 word sequence
    --> resulting phone sequence from L_disambig.fst corresponds to the word sequence
    --> L_disambig.fst is OK

    Checking data/lang/oov.{txt, int} ...
    --> 1 entry/entries in data/lang/oov.txt
    --> data/lang/oov.int corresponds to data/lang/oov.txt
    --> data/lang/oov.{txt, int} are OK

    --> data/lang/L.fst is olabel sorted
    --> data/lang/L_disambig.fst is olabel sorted
    --> SUCCESS [validating lang directory data/lang]

    [Some context]
    I am trying to build an ASR system for Brazilian-accented English. I have a few corpora of native Brazilian Portuguese and native American English, which I intend to use together for training. However, for this test I am using only a BR corpus, with about 8 h of data in 6282 files; each file lasts about 7-10 seconds. All files are 16 kHz, 16-bit. Please disregard speaker IDs like 'f35br16b22k1' in the log: they indicate that the audio was 22 kHz, but I downsampled the data using SoX and just kept the original IDs.

    Sorry for the very long message!

     
    • Jan "yenda" Trmal

      Gustavo, these seem like either two separate issues, or the problems with
      the ARPA conversion are a result of a faulty AM.
      While the compilation and determinization of the decoding graph can take a
      significant amount of time, I think for a 13 MB LM it should be relatively
      quick (minutes); of course, that depends on many things (the order of the
      LM, for example).
      So first, try to figure out why there were problems during training of the
      monophone system. The content of the data dirs looks reasonable.
      For example, you should check how many utterances were omitted. It's OK to
      have some, but it's suspicious if you see a large number (although for a
      monophone system, even quite a large number can happen).
      y.
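One way to do that count (a rough sketch, assuming the standard train_mono.sh log layout and that pass 39, the last one shown above, is the final accumulation pass) is:

```shell
# Unique utterances that got no alignment in the last accumulation pass,
# versus the total number of utterances in the training set.
dropped=$(grep -h "No alignment" exp/mono/log/acc.39.*.log | \
          awk '{print $NF}' | sort -u | wc -l)
total=$(wc -l < data/train_2kshort/feats.scp)
echo "omitted $dropped of $total utterances"
```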

    • Daniel Povey - 2015-04-29

      Determinization errors are usually caused by issues with the lexicon, such
      as words with the same pronunciation, but it's surprising that it would
      happen if validate_lang.pl succeeded. Or possibly it is something broken
      in your ARPA file. Again, validate_lang.pl on the directory your G.fst is
      in should pick up errors; I'm not sure whether you ran that script on that
      directory as well (e.g. if it is named differently from the directory you
      ran the script on).

      If your final.mdl is empty, it's surprising that the train_mono.sh script
      succeeded: errors would usually cause it to die. But possibly it's a soft
      link and you just got confused when measuring the size?
      You may have to look at the timestamps to figure out what went wrong.
      Dan
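To check the soft-link theory, something along these lines (paths taken from the post above) would show the real size, the link target if there is one, and the order in which files were written:

```shell
ls -lL exp/mono/final.mdl  # -L: report the link target's size, not the link's
file exp/mono/final.mdl    # prints "symbolic link to ..." if it is a symlink
ls -lt exp/mono | head     # newest files first, to see what was written last
```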

      On Tue, Apr 28, 2015 at 6:48 PM, "Gustavo Mendonça" <gustavomen@users.sf.net

      wrote:

      Hi everyone,

      First of all, thank you for this amazing toolkit!

      I am trying to adapt the Librispeech s5 recipe to my own data, but I am
      experiencing some problems. I was able to run all test scripts without any
      problem, however when I ran it with my own data, I noticed that the mkgraph
      was taking too long for exp/mono. For a 13 Mb pruned ARPA model, it
      consumed 60Gb of RAM model and it had not finished even after running for
      33 hours (I had to kill the process due to memory overflow). Then I was
      checking the data and I realized that my exp/mono/final.mdl was empty,
      although exp/mono/tree seems to have some content: 8kb. Could you please
      help me with this? Following, there are some logs and also some info about
      my data.

      Thanks in advance!

      Gustavo

      [Cmds and logs]
      When I call train_mono:

      steps/train_mono.sh --boost-silence 1.25 --nj 10 --cmd "$train_cmd" \
      data/train_2kshort data/lang exp/mono

      After the 39 passes, I get some warnings, but no errors:

      steps/train_mono.sh: Aligning data
      steps/train_mono.sh: Pass 39
      71 warnings in exp/mono/log/acc...log
      480 warnings in exp/mono/log/align...log
      Done

      When I grep all acc.log* warnings and remove the duplicates, I end up
      with three warnings:

      WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment
      for utterance wp-f48br08b11k1-s034
      WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment
      for utterance wp-m01br16b22k1-s162
      WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment
      for utterance wp-m01br16b22k1-s177

      I think I shouldn't worry about these, right? They are bad audios or bad
      transcriptions, but Kaldi is able to filter these.

      When I do grep all warnings from align.log*, these 3 files again show
      up:

      WARNING
      (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did
      not successfully decode file wp-f48br08b11k1-s034, len = 219
      WARNING
      (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did
      not successfully decode file wp-m01br16b22k1-s162, len = 72
      WARNING
      (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did
      not successfully decode file wp-m01br16b22k1-s177, len = 197

      And messages for about 40 files that alignment was tried was a different
      beam number, as in:

      WARNING
      (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466)
      Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 24
      WARNING
      (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466)
      Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 40

      And a final message about the pdfs for the silence phones being shared.

      WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82) The pdfs for
      the silence phones may be shared by other phones (note: this probably does
      not matter.)

      [My data]
      This is the head and tail for each file in my data/train_2kshort
      directory:

      ==> data/train_2kshort/cmvn.scp <==
      TIMIT-DR1-FCJF0
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:271
      TIMIT-DR1-FDML0
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:1036
      TIMIT-DR1-FELC0
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:1546
      ...
      wp-m66br08b11k1
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:166786
      wp-m67br08b11k1
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:167041
      wp-m68br08b11k1
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/cmvn_train_all.ark:167296

      ==> data/train_2kshort/feats.scp <==
      TIMIT-DR1-FCJF0-SA2
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.1.ark:51435
      TIMIT-DR1-FCJF0-SX127
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.1.ark:67144
      TIMIT-DR1-FCJF0-SX217
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.1.ark:69267
      ...
      wp-m68br08b11k1-s035
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.10.ark:5652668
      wp-m68br08b11k1-s089
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.10.ark:5709693
      wp-m68br08b11k1-s091
      /home/gustavo/projects/kaldi/models/timit_westpoint/mfcc/raw_mfcc_train_all.10.ark:5712673

      ==> data/train_2kshort/spk2utt <==
      TIMIT-DR1-FCJF0 TIMIT-DR1-FCJF0-SA2 TIMIT-DR1-FCJF0-SX127
      TIMIT-DR1-FCJF0-SX217 TIMIT-DR1-FCJF0-SX307 TIMIT-DR1-FCJF0-SX37
      TIMIT-DR1-FDML0 TIMIT-DR1-FDML0-SI2075 TIMIT-DR1-FDML0-SX249
      TIMIT-DR1-FELC0 TIMIT-DR1-FELC0-SX216
      ...
      wp-m67br08b11k1 wp-m67br08b11k1-s034 wp-m67br08b11k1-s035
      wp-m67br08b11k1-s089
      wp-m68br08b11k1 wp-m68br08b11k1-s034 wp-m68br08b11k1-s035
      wp-m68br08b11k1-s089 wp-m68br08b11k1-s091

      ==> data/train_2kshort/text <==
      TIMIT-DR1-FCJF0-SA2 DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
      TIMIT-DR1-FCJF0-SX127 THE EMPEROR HAD A MEAN TEMPER
      TIMIT-DR1-FCJF0-SX217 HOW PERMANENT ARE THEIR RECORDS
      ...
      wp-m68br08b11k1-s035 OLHA SÓ
      wp-m68br08b11k1-s089 ATÉ LOGO
      wp-m68br08b11k1-s091 TUDO BEM

      ==> data/train_2kshort/utt2spk <==
      TIMIT-DR1-FCJF0-SA2 TIMIT-DR1-FCJF0
      TIMIT-DR1-FCJF0-SX127 TIMIT-DR1-FCJF0
      TIMIT-DR1-FCJF0-SX217 TIMIT-DR1-FCJF0
      ...
      wp-m68br08b11k1-s035 wp-m68br08b11k1
      wp-m68br08b11k1-s089 wp-m68br08b11k1
      wp-m68br08b11k1-s091 wp-m68br08b11k1

      ==> data/train_2kshort/wav.scp <==
      TIMIT-DR1-FCJF0-SA2 corpora/TIMIT/WAV/DR1-FCJF0-SA2.WAV
      TIMIT-DR1-FCJF0-SX127 corpora/TIMIT/WAV/DR1-FCJF0-SX127.WAV
      TIMIT-DR1-FCJF0-SX217 corpora/TIMIT/WAV/DR1-FCJF0-SX217.WAV
      ...
      wp-m68br08b11k1-s035 corpora/westpoint/treino/m68br08b11k1/s035.wav
      wp-m68br08b11k1-s089 corpora/westpoint/treino/m68br08b11k1/s089.wav
      wp-m68br08b11k1-s091 corpora/westpoint/treino/m68br08b11k1/s091.wav

      This is the output of running validate_lang.pl on data/lang:

      Checking data/lang/phones.txt ...
      --> data/lang/phones.txt is OK

      Checking words.txt: #0 ...
      --> data/lang/words.txt has "#0"
      --> data/lang/words.txt is OK

      Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
      --> silence.txt and nonsilence.txt are disjoint
      --> silence.txt and disambig.txt are disjoint
      --> disambig.txt and nonsilence.txt are disjoint
      --> disjoint property is OK

      Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
      --> summation property is OK

      Checking data/lang/phones/context_indep.{txt, int, csl} ...
      --> 10 entry/entries in data/lang/phones/context_indep.txt
      --> data/lang/phones/context_indep.int corresponds to
      data/lang/phones/context_indep.txt
      --> data/lang/phones/context_indep.csl corresponds to
      data/lang/phones/context_indep.txt
      --> data/lang/phones/context_indep.{txt, int, csl} are OK

      Checking data/lang/phones/disambig.{txt, int, csl} ...
      --> 18 entry/entries in data/lang/phones/disambig.txt
      --> data/lang/phones/disambig.int corresponds to
      data/lang/phones/disambig.txt
      --> data/lang/phones/disambig.csl corresponds to
      data/lang/phones/disambig.txt
      --> data/lang/phones/disambig.{txt, int, csl} are OK

      Checking data/lang/phones/nonsilence.{txt, int, csl} ...
      --> 200 entry/entries in data/lang/phones/nonsilence.txt
      --> data/lang/phones/nonsilence.int corresponds to
      data/lang/phones/nonsilence.txt
      --> data/lang/phones/nonsilence.csl corresponds to
      data/lang/phones/nonsilence.txt
      --> data/lang/phones/nonsilence.{txt, int, csl} are OK

      Checking data/lang/phones/silence.{txt, int, csl} ...
      --> 10 entry/entries in data/lang/phones/silence.txt
      --> data/lang/phones/silence.int corresponds to
      data/lang/phones/silence.txt
      --> data/lang/phones/silence.csl corresponds to
      data/lang/phones/silence.txt
      --> data/lang/phones/silence.{txt, int, csl} are OK

      Checking data/lang/phones/optional_silence.{txt, int, csl} ...
      --> 1 entry/entries in data/lang/phones/optional_silence.txt
      --> data/lang/phones/optional_silence.int corresponds to
      data/lang/phones/optional_silence.txt
      --> data/lang/phones/optional_silence.csl corresponds to
      data/lang/phones/optional_silence.txt
      --> data/lang/phones/optional_silence.{txt, int, csl} are OK

      Checking data/lang/phones/roots.{txt, int} ...
      --> 52 entry/entries in data/lang/phones/roots.txt
      --> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
      --> data/lang/phones/roots.{txt, int} are OK

      Checking data/lang/phones/sets.{txt, int} ...
      --> 52 entry/entries in data/lang/phones/sets.txt
      --> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
      --> data/lang/phones/sets.{txt, int} are OK

      Checking data/lang/phones/extra_questions.{txt, int} ...
      --> 9 entry/entries in data/lang/phones/extra_questions.txt
      --> data/lang/phones/extra_questions.int corresponds to
      data/lang/phones/extra_questions.txt
      --> data/lang/phones/extra_questions.{txt, int} are OK

      Checking data/lang/phones/word_boundary.{txt, int} ...
      --> 210 entry/entries in data/lang/phones/word_boundary.txt
      --> data/lang/phones/word_boundary.int corresponds to
      data/lang/phones/word_boundary.txt
      --> data/lang/phones/word_boundary.{txt, int} are OK

      Checking optional_silence.txt ...
      --> reading data/lang/phones/optional_silence.txt
      --> data/lang/phones/optional_silence.txt is OK

      Checking disambiguation symbols: #0 and #1
      --> data/lang/phones/disambig.txt has "#0" and "#1"
      --> data/lang/phones/disambig.txt is OK

      Checking topo ...
      --> data/lang/topo's nonsilence section is OK
      --> data/lang/topo's silence section is OK
      --> data/lang/topo is OK

      Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
      --> data/lang/phones/word_boundary.txt doesn't include disambiguation
      symbols
      --> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and
      silence.txt
      --> data/lang/phones/word_boundary.txt is OK

      Checking word_boundary.int and disambig.int
      --> generating a 40 word sequence
      --> resulting phone sequence from L.fst corresponds to the word sequence
      --> L.fst is OK
      --> generating a 29 word sequence
      --> resulting phone sequence from L_disambig.fst corresponds to the word
      sequence
      --> L_disambig.fst is OK

      Checking data/lang/oov.{txt, int} ...
      --> 1 entry/entries in data/lang/oov.txt
      --> data/lang/oov.int corresponds to data/lang/oov.txt
      --> data/lang/oov.{txt, int} are OK

      --> data/lang/L.fst is olabel sorted
      --> data/lang/L_disambig.fst is olabel sorted
      --> SUCCESS [validating lang directory data/lang]

      [Some context]
      I am trying to build an ASR system for Brazilian-accented English. I have
      a few corpora of native Brazilian Portuguese and native American English,
      which I intend to use together for training. For this test, however, I am
      using just a BR Portuguese corpus with about 8h of data in 6282 files;
      each file lasts about 7-10 seconds. All files are 16kHz 16-bit. Please
      disregard speaker ids like 'f35br16b22k1' in the log: they indicate that
      the audios are 22k, but I downsampled the data using SoX and just kept
      the original ids.
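      (A quick way to double-check the downsampling without sox: read the
      sample-rate field of a plain RIFF/WAVE header, which sits as a 4-byte
      little-endian integer at byte offset 24. The demo below synthesizes a
      minimal fake 16 kHz header, so it is a sketch only; a little-endian host
      is assumed.)

```shell
# Bytes 0-15: RIFF/WAVE/fmt chunk ids (sizes faked for the demo).
printf 'RIFF....WAVEfmt ' > demo.wav
# fmt chunk size (16), PCM format (1), mono (1 channel).
printf '\020\000\000\000\001\000\001\000' >> demo.wav
# Sample rate 16000 Hz = 0x3E80, stored little-endian.
printf '\200\076\000\000' >> demo.wav

# Read 4 bytes at offset 24 as an unsigned integer.
rate=$(od -An -j24 -N4 -tu4 demo.wav | tr -d ' ')
echo "sample rate: $rate Hz"
```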

      Sorry for the very long message!

  • Gustavo Mendonça

    Hi Jan and Dan,

    Thank you very much for your help! I will check each of the issues you raised.

    It is good to know that a 13Mb LM should take just a few minutes to process. I was basically waiting a day each time I ran the script! The order of the LM is not high, I am using standard trigrams, so I assume there should be no problem.

    About the final.mdl, I am so sorry about it. I feel dumb, it was indeed a soft link and I did not notice. :-)

    I will double check my lexicon and LM to try to find the problem.
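    (One check I plan to run, sketched here with invented file contents: list words that the LM knows but the lexicon does not, which is a frequent cause of trouble when compiling G.fst and running mkgraph.)

```shell
# Fabricated demo files: a two-word lexicon and an LM vocabulary list
# (e.g. as extracted from the \1-grams section of the ARPA file).
printf 'HELLO hh ah l ow\nWORLD w er l d\n' > lexicon.txt
printf 'HELLO\nOLA\nWORLD\n' > lm_vocab.txt

# Sort both word lists, then keep words that appear only in the LM.
cut -d' ' -f1 lexicon.txt | sort > lex_words.txt
sort lm_vocab.txt > lm_words.txt
oov=$(comm -23 lm_words.txt lex_words.txt)
echo "LM words missing from lexicon: $oov"
```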

    Thanks again!

     

    Last edit: Gustavo Mendonça 2015-04-30
    • Jan "yenda" Trmal

      Just a note: when I say "a few minutes", it's not something that can be
      guaranteed. It depends on the complexity of the LM, especially on how
      many alternative paths need to be tracked during determinization
      (I think). These things tend to happen especially when you manipulate or
      create the LM artificially.
      y.
