I am trying to adapt the Librispeech s5 recipe to my own data, but I am experiencing some problems. I was able to run all the test scripts without any problem; however, when I ran the recipe with my own data, I noticed that mkgraph was taking too long for exp/mono. For a 13 MB pruned ARPA model, it consumed 60 GB of RAM and had not finished even after running for 33 hours (I had to kill the process due to memory overflow). Then, while checking the data, I realized that my exp/mono/final.mdl was empty, although exp/mono/tree seems to have some content (8 KB). Could you please help me with this? Below are some logs and also some info about my data.
After the 39 passes, I get some warnings, but no errors:
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 39
71 warnings in exp/mono/log/acc...log
480 warnings in exp/mono/log/align...log
Done
When I grep all the warnings in acc.*.log and remove the duplicates, I end up with three:
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-f48br08b11k1-s034
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-m01br16b22k1-s162
WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance wp-m01br16b22k1-s177
I think I shouldn't worry about these, right? They are bad audio files or bad transcriptions, and Kaldi is able to filter them out.
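For reference, the kind of command I mean is roughly this (the demo/ logs below are fabricated stand-ins for the real exp/mono/log/acc.*.log files):

```shell
# Pool the WARNING lines from all accumulation logs and deduplicate them.
mkdir -p demo/log
printf '%s\n' \
  'WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance utt-a' \
  > demo/log/acc.1.log
printf '%s\n' \
  'WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance utt-a' \
  'WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79) No alignment for utterance utt-b' \
  > demo/log/acc.2.log
# -h suppresses filename prefixes so identical lines from different jobs dedupe.
grep -h 'WARNING' demo/log/acc.*.log | sort -u
```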
When I grep all the warnings from align.*.log, these 3 files show up again:
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-f48br08b11k1-s034, len = 219
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-m01br16b22k1-s162, len = 72
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file wp-m01br16b22k1-s177, len = 197
There are also messages for about 40 files saying that alignment was retried with a larger beam, as in:
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 24
WARNING (gmm-align-compiled:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance TIMIT-DR1-FCJF0-SA2 with beam 40
And a final message about the pdfs for the silence phones being shared.
WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
[My data]
This is the head and tail for each file in my data/train_2kshort directory:
==> data/train_2kshort/text <==
TIMIT-DR1-FCJF0-SA2 DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
TIMIT-DR1-FCJF0-SX127 THE EMPEROR HAD A MEAN TEMPER
TIMIT-DR1-FCJF0-SX217 HOW PERMANENT ARE THEIR RECORDS
...
wp-m68br08b11k1-s035 OLHA SÓ
wp-m68br08b11k1-s089 ATÉ LOGO
wp-m68br08b11k1-s091 TUDO BEM
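In case it helps, these are the kinds of sanity checks I can run on that text file (sample_text below is a made-up two-line stand-in for data/train_2kshort/text):

```shell
# A Kaldi 'text' file should have "<utt-id> <transcript>" per line and
# be sorted on the utterance id in the C locale.
export LC_ALL=C
printf '%s\n' \
  'TIMIT-DR1-FCJF0-SA2 DON'\''T ASK ME TO CARRY AN OILY RAG LIKE THAT' \
  'wp-m68br08b11k1-s091 TUDO BEM' \
  > sample_text
# -c just checks the order on the first key; nonzero exit means unsorted.
sort -c -k1,1 sample_text && echo "sorted OK"
# Count lines that have an id but no transcript.
awk 'NF < 2 { bad++ } END { print bad+0, "lines with empty transcript" }' sample_text
```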
This is the output of running validate_lang.pl on data/lang:
Checking data/lang/phones.txt ...
--> data/lang/phones.txt is OK
Checking words.txt: #0 ...
--> data/lang/words.txt has "#0"
--> data/lang/words.txt is OK
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK
Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> 10 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK
Checking data/lang/phones/disambig.{txt, int, csl} ...
--> 18 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK
Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> 200 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK
Checking data/lang/phones/silence.{txt, int, csl} ...
--> 10 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK
Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK
Checking data/lang/phones/roots.{txt, int} ...
--> 52 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK
Checking data/lang/phones/sets.{txt, int} ...
--> 52 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK
Checking data/lang/phones/extra_questions.{txt, int} ...
--> 9 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK
Checking data/lang/phones/word_boundary.{txt, int} ...
--> 210 entry/entries in data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt
--> data/lang/phones/word_boundary.{txt, int} are OK
Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK
Checking topo ...
--> data/lang/topo's nonsilence section is OK
--> data/lang/topo's silence section is OK
--> data/lang/topo is OK
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang/phones/word_boundary.txt is OK
Checking word_boundary.int and disambig.int
--> generating a 40 word sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 29 word sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
Checking data/lang/oov.{txt, int} ...
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK
--> data/lang/L.fst is olabel sorted
--> data/lang/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang]
[Some context]
I am trying to build an ASR system for Brazilian-accented English. I have a few corpora of native Brazilian Portuguese and native American English, which I intend to use together for training. However, for this test I am using just one BR corpus, with about 8 h of data in 6282 files; each file lasts about 7-10 seconds. All files are 16 kHz, 16-bit. Please disregard speaker ids like 'f35br16b22k1' in the log: they indicate that the audio is 22 kHz, but I downsampled the data with SoX and just kept the original id.
Sorry for the very long message!
Gustavo, these seem like either two separate issues, or the problems with
the ARPA conversion are a result of a faulty AM.
While compiling and determinizing the decoding graph can take a
significant amount of time, I think for a 13 MB LM it should be relatively
quick (minutes) -- though of course it depends on many things (the order of
the LM, for example).
So first, try to figure out why there were problems during training of the
monophone system. The content of the data dirs looks reasonable.
For example, you should check how many utterances were omitted -- it's OK
to have some, but it's suspicious if you see a large number (though for a
monophone system, even quite a large number can happen).
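A rough way to count the omitted utterances might be something like this (the jd/ files below are fabricated stand-ins for exp/mono/log/acc.*.log and data/train_2kshort/text):

```shell
# Compare the number of utterances that failed to align against the total.
mkdir -p jd
printf '%s\n' 'utt-x foo' 'utt-y bar' 'utt-z baz' > jd/text
printf '%s\n' 'WARNING (gmm-acc-stats-ali) No alignment for utterance utt-y' > jd/acc.1.log
# -o extracts just the matching fragment; sort -u dedupes repeats across passes.
omitted=$(grep -ho 'No alignment for utterance [^ ]*' jd/acc.*.log | sort -u | wc -l)
total=$(wc -l < jd/text)
echo "omitted $omitted of $total utterances"
```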
y.
On Tue, Apr 28, 2015 at 6:48 PM, "Gustavo Mendonça" <gustavomen@users.sf.net> wrote:
Determinization errors are usually caused by issues with the lexicon, such
as words with the same pronunciation, but it's surprising that this would
happen if validate_lang.pl succeeded. Or it could possibly be something
broken in your ARPA file. Again, validate_lang.pl on the directory your
G.fst is in should pick up errors; I'm not sure whether you ran that script
on that directory as well (e.g. if it's named differently from the
directory you ran the script on).
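A quick, hypothetical check for duplicate pronunciations might look like this (sample_lexicon.txt below is made up; in practice you would point it at your own lexicon file):

```shell
# List pronunciations shared by more than one word in a lexicon of the
# form "<word> <phone> <phone> ...".
printf '%s\n' \
  'red r eh d' \
  'read r eh d' \
  'blue b l uw' \
  > sample_lexicon.txt
# Key each entry by its pronunciation (everything after the word) and
# print the pronunciations that accumulate more than one word.
awk '{ w = $1; sub(/^[^ ]+ /, ""); seen[$0] = seen[$0] " " w }
     END { for (p in seen) if (split(seen[p], a, " ") > 1) print p ":" seen[p] }' \
    sample_lexicon.txt
```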
If your final.mdl is empty, it's surprising that the train_mono.sh script
succeeded -- errors would usually cause it to die. But possibly it's a soft
link and you just got confused when measuring the size?
You may have to look at the timestamps to figure out what went wrong.
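To check the soft-link possibility, something along these lines should work (linkdemo/ below is a fabricated stand-in for exp/mono):

```shell
# train_mono.sh typically leaves final.mdl as a symlink to the last
# iteration's model; ls without -L reports the size of the link itself.
mkdir -p linkdemo
printf 'model payload' > linkdemo/40.mdl
ln -sf 40.mdl linkdemo/final.mdl
ls -l  linkdemo/final.mdl   # size of the link, not the model
ls -lL linkdemo/final.mdl   # -L follows the link: size of the target
[ -L linkdemo/final.mdl ] && echo "final.mdl is a symlink to $(readlink linkdemo/final.mdl)"
```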
Dan
Thank you very much for your help! I will check each of the issues you raised.
It is good to know that a 13 MB LM should take just a few minutes to process. I was basically waiting a day each time I ran the script! The order of the LM is not high -- I am using standard trigrams -- so I assumed there would be no problem.
About the final.mdl, I am so sorry. I feel dumb -- it was indeed a soft link and I did not notice. :-)
I will double check my lexicon and LM to try to find the problem.
Thanks again!
Last edit: Gustavo Mendonça 2015-04-30
Just a note: when I say "a few minutes", it's not something that can be
guaranteed. It depends on the complexity of the LM, especially on how many
alternative paths need to be tracked during determinization (I think).
Especially when you manipulate or create the LM artificially, these things
might occur.
y.
On Thu, Apr 30, 2015 at 7:56 PM, "Gustavo Mendonça" <gustavomen@users.sf.net> wrote:
[Cmds and logs]
When I call train_mono:
steps/train_mono.sh --boost-silence 1.25 --nj 10 --cmd "$train_cmd" \
data/train_2kshort data/lang exp/mono