I am trying to adapt the Librispeech s5 recipe to my own data, but I am experiencing some problems. I was able to run all the test scripts without any problem; however, when I ran the recipe with my own data, I noticed that mkgraph was taking too long for exp/mono. For a 13 MB pruned ARPA model, it consumed 60 GB of RAM and had not finished even after running for 33 hours (I had to kill the process due to memory overflow). Then, while checking the data, I realized that my exp/mono/final.mdl was empty, although exp/mono/tree seems to have some content (8 KB). Could you please help me with this? Below are some logs and also some info about my data.
After the 39 passes, I get some warnings, but no errors:
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 39
71 warnings in exp/mono/log/acc...log
480 warnings in exp/mono/log/align...log
Done
When I grep all the warnings in acc.*.log and remove the duplicates, I end up with three:
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-f48br08b11k1-s034
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-m01br16b22k1-s162
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-m01br16b22k1-s177
I think I shouldn't worry about these, right? They are bad audio files or bad transcriptions, and Kaldi is able to filter them out.
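For reference, the kind of command I mean is roughly this (the demo/ logs below are fabricated stand-ins for the real exp/mono/log/acc.*.log files):

```shell
# Pool the WARNING lines from all accumulation logs and deduplicate them.
mkdir -p demo/log
printf '%s\n' \
  'WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance utt-a' \
  > demo/log/acc.1.log
printf '%s\n' \
  'WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance utt-a' \
  'WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance utt-b' \
  > demo/log/acc.2.log
# -h suppresses filename prefixes so identical lines from different jobs dedupe.
grep -h 'WARNING' demo/log/acc.*.log | sort -u
```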
When I grep all the warnings from align.*.log, these 3 files show up again:
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-f48br08b11k1-s034, len = 219
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-m01br16b22k1-s162, len = 72
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-m01br16b22k1-s177, len = 197
There are also messages for about 40 files saying that alignment was retried with a larger beam, as in:
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 24
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 40
And a final message about the pdfs for the silence phones being shared.
WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
[My data]
This is the head and tail for each file in my data/train_2kshort directory:
==> data/train_2kshort/text <==
TIMIT-DR1-FCJF0-SA2 DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
TIMIT-DR1-FCJF0-SX127 THE EMPEROR HAD A MEAN TEMPER
TIMIT-DR1-FCJF0-SX217 HOW PERMANENT ARE THEIR RECORDS
...
wp-m68br08b11k1-s035 OLHA SÓ
wp-m68br08b11k1-s089 ATÉ LOGO
wp-m68br08b11k1-s091 TUDO BEM
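In case it helps, these are the kinds of sanity checks I can run on that text file (sample_text below is a made-up two-line stand-in for data/train_2kshort/text):

```shell
# A Kaldi 'text' file should have "<utt-id> <transcript>" per line and
# be sorted on the utterance id in the C locale.
export LC_ALL=C
printf '%s\n' \
  'TIMIT-DR1-FCJF0-SA2 DON'\''T ASK ME TO CARRY AN OILY RAG LIKE THAT' \
  'wp-m68br08b11k1-s091 TUDO BEM' \
  > sample_text
# -c just checks the order on the first key; nonzero exit means unsorted.
sort -c -k1,1 sample_text && echo "sorted OK"
# Count lines that have an id but no transcript.
awk 'NF < 2 { bad++ } END { print bad+0, "lines with empty transcript" }' sample_text
```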
This is the output of running validate_lang.pl on data/lang:
Checking data/lang/phones.txt ...
--> data/lang/phones.txt is OK
Checking words.txt: #0 ...
--> data/lang/words.txt has "#0"
--> data/lang/words.txt is OK
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK
Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> 10 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK
Checking data/lang/phones/disambig.{txt, int, csl} ...
--> 18 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK
Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> 200 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK
Checking data/lang/phones/silence.{txt, int, csl} ...
--> 10 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK
Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK
Checking data/lang/phones/roots.{txt, int} ...
--> 52 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK
Checking data/lang/phones/sets.{txt, int} ...
--> 52 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK
Checking data/lang/phones/extra_questions.{txt, int} ...
--> 9 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK
Checking data/lang/phones/word_boundary.{txt, int} ...
--> 210 entry/entries in data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.{txt, int} are OK
Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK
Checking topo ...
--> data/lang/topo's nonsilence section is OK
--> data/lang/topo's silence section is OK
--> data/lang/topo is OK
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang/phones/word_boundary.txt is OK
Checking word_boundary.int and disambig.int
--> generating a 40 word sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 29 word sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
Checking data/lang/oov.{txt, int} ...
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK
--> data/lang/L.fst is olabel sorted
--> data/lang/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang]
[Some context]
I am trying to build an ASR system for Brazilian-accented English. I have a few corpora of native Brazilian Portuguese and native American English, which I intend to use together for training. However, for this test I am using just one BR corpus, with about 8 h of data in 6282 files; each file lasts about 7-10 seconds. All files are 16 kHz, 16-bit. Please disregard speaker ids like 'f35br16b22k1' in the log: they indicate that the audio is 22 kHz, but I downsampled the data with SoX and just kept the original id.
Sorry for the very long message!
Gustavo, these seem like either two separate issues, or the problems with
the ARPA conversion are a result of a faulty AM.
While compiling and determinizing the decoding graph can take a
significant amount of time, I think for a 13 MB LM it should be relatively
quick (minutes) -- though of course it depends on many things (the order of
the LM, for example).
So first, try to figure out why there were problems during training of the
monophone system. The content of the data dirs looks reasonable.
For example, you should check how many utterances were omitted -- it's OK
to have some, but it's suspicious if you see a large number (though for a
monophone system, even quite a large number can happen).
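A rough way to count the omitted utterances might be something like this (the jd/ files below are fabricated stand-ins for exp/mono/log/acc.*.log and data/train_2kshort/text):

```shell
# Compare the number of utterances that failed to align against the total.
mkdir -p jd
printf '%s\n' 'utt-x foo' 'utt-y bar' 'utt-z baz' > jd/text
printf '%s\n' 'WARNING (gmm-acc-stats-ali) No alignment for utterance utt-y' > jd/acc.1.log
# -o extracts just the matching fragment; sort -u dedupes repeats across passes.
omitted=$(grep -ho 'No alignment for utterance [^ ]*' jd/acc.*.log | sort -u | wc -l)
total=$(wc -l < jd/text)
echo "omitted $omitted of $total utterances"
```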
y.
On Tue, Apr 28, 2015 at 6:48 PM, "Gustavo Mendonça" <gustavomen@users.sf.net> wrote:
Determinization errors are usually caused by issues with the lexicon, such
as words with the same pronunciation, but it's surprising that this would
happen if validate_lang.pl succeeded. Or it could possibly be something
broken in your ARPA file. Again, validate_lang.pl on the directory your
G.fst is in should pick up errors; I'm not sure whether you ran that script
on that directory as well (e.g. if it's named differently from the
directory you ran the script on).
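A quick, hypothetical check for duplicate pronunciations might look like this (sample_lexicon.txt below is made up; in practice you would point it at your own lexicon file):

```shell
# List pronunciations shared by more than one word in a lexicon of the
# form "<word> <phone> <phone> ...".
printf '%s\n' \
  'red r eh d' \
  'read r eh d' \
  'blue b l uw' \
  > sample_lexicon.txt
# Key each entry by its pronunciation (everything after the word) and
# print the pronunciations that accumulate more than one word.
awk '{ w = $1; sub(/^[^ ]+ /, ""); seen[$0] = seen[$0] " " w }
     END { for (p in seen) if (split(seen[p], a, " ") > 1) print p ":" seen[p] }' \
    sample_lexicon.txt
```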
If your final.mdl is empty, it's surprising that the train_mono.sh script
succeeded -- errors would usually cause it to die. But possibly it's a soft
link and you just got confused when measuring the size?
You may have to look at the timestamps to figure out what went wrong.
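To check the soft-link possibility, something along these lines should work (linkdemo/ below is a fabricated stand-in for exp/mono):

```shell
# train_mono.sh typically leaves final.mdl as a symlink to the last
# iteration's model; ls without -L reports the size of the link itself.
mkdir -p linkdemo
printf 'model payload' > linkdemo/40.mdl
ln -sf 40.mdl linkdemo/final.mdl
ls -l  linkdemo/final.mdl   # size of the link, not the model
ls -lL linkdemo/final.mdl   # -L follows the link: size of the target
[ -L linkdemo/final.mdl ] && echo "final.mdl is a symlink to $(readlink linkdemo/final.mdl)"
```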
Dan
Thank you very much for your help! I will check each of the issues you raised.
It is good to know that a 13 MB LM should take just a few minutes to process. I was basically waiting a day each time I ran the script! The order of the LM is not high -- I am using standard trigrams -- so I assumed there would be no problem.
About the final.mdl, I am so sorry. I feel dumb -- it was indeed a soft link and I did not notice. :-)
I will double check my lexicon and LM to try to find the problem.
Thanks again!
Last edit: Gustavo Mendonça 2015-04-30
Just a note: when I say "a few minutes", it's not something that can be
guaranteed. It depends on the complexity of the LM, especially on how many
alternative paths need to be tracked during determinization (I think).
Especially when you manipulate or create the LM artificially, these things
might occur.
y.
On Thu, Apr 30, 2015 at 7:56 PM, "Gustavo Mendonça" <gustavomen@users.sf.net> wrote:
[Cmds and logs]
When I call train_mono:
steps/train_mono.sh --boost-silence 1.25 --nj 10 --cmd "$train_cmd" \
data/train_2kshort data/lang exp/mono