Re: [Kaldi-users] word timing information

It's possible that your word_boundary.txt is OK.
You could try to get the one best from the lattice using lattice-1best
(I think), get the phone sequence from the 1-best lattice using
lat-to-phones (I think), writing the output in text form using ark,t:-,
and then map the phone IDs in that phone-level lattice to symbols using
utils/int2sym.pl -f 3 g300_lang/phones.txt (or something similar), and
see if the sequence of phonemes looks reasonable for the word sequence
you have.
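
A minimal sketch of that check as a shell function. It assumes the tool
half-remembered above as lat-to-phones is Kaldi's lattice-to-phone-lattice
(arguments: model, input lattice, output lattice), and it reuses the paths
from this thread. The function name show_1best_phones is made up for
illustration.

```shell
# Sketch only: print the 1-best phone sequence of every utterance in text form.
# Assumes the Kaldi binaries are on PATH and that this is run from the
# directory containing utils/ (as in the commands quoted in this thread).
show_1best_phones() {
  lat_gz=$1      # e.g. exp/tri2a/decode_test_childspeech/lat.gz
  model=$2       # e.g. exp/tri2a/final.mdl
  phones_txt=$3  # e.g. g300_lang/phones.txt

  # 1-best path -> phone labels instead of word labels -> text archive,
  # then map the integer phone IDs in field 3 back to phone symbols.
  lattice-1best "ark:gunzip -c $lat_gz|" ark:- |
    lattice-to-phone-lattice "$model" ark:- ark,t:- |
    utils/int2sym.pl -f 3 "$phones_txt"
}
```

It would be called as, for example,
show_1best_phones exp/tri2a/decode_test_childspeech/lat.gz exp/tri2a/final.mdl g300_lang/phones.txt | less
and the phone symbols in the third column can then be read against the
decoded word sequence for the same utterance.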

Dan

On Thu, Jul 11, 2013 at 1:00 AM, Nathan Dunn <nd...@me...> wrote:
>
> I think that was part of it. I fixed one problem with the oov.txt / oov.int files.
>
> I'll update and recompile to pick up that bug fix and see if that works. It's possible that I'm creating word_boundary.txt incorrectly. How many entries would you expect to get (I am getting 315)? I wonder if I am using a word_boundary.txt for the wrong set of phones...
>
> Checking g300_lang/phones.txt ...
> --> g300_lang/phones.txt is OK
>
> Checking words.txt: #0 ...
> --> g300_lang/words.txt has "#0"
> --> g300_lang/words.txt is OK
>
> Checking g300_lang/phones/context_indep.{txt, int, csl} ...
> --> 75 entry/entries in g300_lang/phones/context_indep.txt
> --> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
> --> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
> --> g300_lang/phones/context_indep.{txt, int, csl} are OK
>
> Checking g300_lang/phones/disambig.{txt, int, csl} ...
> --> 28 entry/entries in g300_lang/phones/disambig.txt
> --> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
> --> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
> --> g300_lang/phones/disambig.{txt, int, csl} are OK
>
> Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
> --> 240 entry/entries in g300_lang/phones/nonsilence.txt
> --> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
> --> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
> --> g300_lang/phones/nonsilence.{txt, int, csl} are OK
>
> Checking g300_lang/phones/silence.{txt, int, csl} ...
> --> 75 entry/entries in g300_lang/phones/silence.txt
> --> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
> --> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
> --> g300_lang/phones/silence.{txt, int, csl} are OK
>
> Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
> --> 1 entry/entries in g300_lang/phones/optional_silence.txt
> --> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
> --> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
> --> g300_lang/phones/optional_silence.{txt, int, csl} are OK
>
> Checking g300_lang/phones/extra_questions.{txt, int} ...
> --> ERROR: fail to open g300_lang/phones/extra_questions.txt
>
> Checking g300_lang/phones/roots.{txt, int} ...
> --> 75 entry/entries in g300_lang/phones/roots.txt
> --> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
> --> g300_lang/phones/roots.{txt, int} are OK
>
> Checking g300_lang/phones/sets.{txt, int} ...
> --> ERROR: fail to open g300_lang/phones/sets.int
>
> Checking g300_lang/phones/word_boundary.{txt, int} ...
> --> 315 entry/entries in g300_lang/phones/word_boundary.txt
> --> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
> --> g300_lang/phones/word_boundary.{txt, int} are OK
>
> Checking disjoint: silence.txt, nosilenct.txt, disambig.txt ...
> --> silence.txt and nonsilence.txt are disjoint
> --> silence.txt and disambig.txt are disjoint
> --> disambig.txt and nonsilence.txt are disjoint
> --> disjoint property is OK
>
> Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
> --> summation property is OK
>
> Checking optional_silence.txt ...
> --> reading g300_lang/phones/optional_silence.txt
> --> g300_lang/phones/optional_silence.txt is OK
>
> Checking disambiguation symbols: #0 and #1
> --> g300_lang/phones/disambig.txt has "#0" and "#1"
> --> g300_lang/phones/disambig.txt is OK
>
> Checking topo ...
> --> g300_lang/topo's nonsilence section is OK
> --> g300_lang/topo's silence section is OK
> --> g300_lang/topo is OK
>
> Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
> --> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
> --> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
> --> g300_lang/phones/word_boundary.txt is OK
> --> checking L.fst and L_disambig.fst...
> --> generating a 46 words sequence
> --> resulting phone sequence from L.fst corresponds to the word sequence
> --> L.fst is OK
> --> resulting phone sequence from L_disambig.fst corresponds to the word sequence
> --> L_disambig.fst is OK
>
> Checking g300_lang/oov.{txt, int} ...
> --> 1 entry/entries in g300_lang/oov.txt
> --> g300_lang/oov.int corresponds to g300_lang/oov.txt
> --> g300_lang/oov.{txt, int} are OK
>
>
>
> Nathan
>
> On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
>
>> OK-- so the word-alignment seems to have failed.  Generally that is
>> because of invalid word-boundary information.  That file is indexed by
>> phones, not words.  Issues can include a mismatch in phone set; words
>> that don't have any phones in them; or phones that have only one state
>> in their topology (this is a bug that was recently fixed, those should
>> work now if you update and recompile).
>> That program should not generally output any warnings, if all is OK.
>> Try to use the program utils/validate_lang.pl to make sure your
>> g300_lang/ directory is OK.
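
Since word_boundary.int is indexed by phone ID, one quick sanity check is
whether it covers the phone set at all. The following is a sketch, not part
of Kaldi: check_word_boundary is a hypothetical helper that assumes
phones.txt has "symbol id" lines, that word_boundary.int has "id type"
lines, and that <eps> and the #N disambiguation symbols are the only
phones.txt entries that should be absent from word_boundary.int.

```shell
# Sketch only: report phones that lack a word-boundary entry, and entries
# whose type is not one of the five values seen in word_boundary.int.
check_word_boundary() {
  phones_txt=$1   # e.g. g300_lang/phones.txt
  wb_int=$2       # e.g. g300_lang/phones/word_boundary.int
  awk '
    FILENAME == ARGV[1] {            # first file: word_boundary.int
      if ($2 !~ /^(nonword|begin|end|internal|singleton)$/) {
        print "bad type for phone id " $1 ": " $2
        bad = 1
      }
      seen[$1] = 1
      next
    }
    # second file: phones.txt; skip epsilon and disambiguation symbols
    $1 == "<eps>" || $1 ~ /^#/ { next }
    !($2 in seen) {
      print "phone " $1 " (id " $2 ") missing from word_boundary.int"
      bad = 1
    }
    END { exit bad + 0 }             # non-zero exit status if anything was reported
  ' "$wb_int" "$phones_txt"
}
```

Usage would be
check_word_boundary g300_lang/phones.txt g300_lang/phones/word_boundary.int
which prints nothing and returns 0 when every real phone has an entry. This
overlaps with what utils/validate_lang.pl already checks, so it is mainly
useful for pointing at the specific phone IDs that are off.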
>>
>> Dan
>>
>>
>> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>>
>>> Sorry, and it ends with this:
>>>
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 98.cut1
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 98.cut2
>>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>>> path, 0 had errors.
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 98.cut3
>>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>>> aligned 0 lattices; 132 had errors.
>>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>>> to ctm format; 0 had errors.
>>> ndunn:childspeech%
>>>
>>>
>>> Nathan
>>>
>>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>>
>>>
>>> The std err output is this:
>>>
>>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>>> exp/tri2a/ctm2/output.txt
>>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>>> ark:-
>>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>>> ark:- ark:-
>>> nbest-to-ctm ark:- -
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 02.cut1
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 02.cut2
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 02.cut3
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 03.cut1
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 03.cut2
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>> partial lattice for 03.cut3
>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>
>>>
>>> Nathan Dunn, Ph.D.
>>> Scientific Programmer
>>> College of Arts and Science IT
>>> 541-221-2418
>>> nd...@ca...
>>>
>>>
>>>
>>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>>
>>> Can you provide the logging output, at least some representative lines
>>> from it.  Are there any warnings?
>>> Dan
>>>
>>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>>> Communication and Updates <kal...@li...> wrote:
>>>
>>>
>>> I'm trying to get word timing information out of a successfully trained
>>> language model that I've already been able to decode with, by following
>>> these instructions:
>>>
>>>
>>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>>
>>>
>>> This is the command I've run:
>>>
>>>
>>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>>
>>>
>>>
>>> The problem is that I get only one entry per transcript (these transcripts
>>> are 1 minute long), and the entries don't seem to bear any relation to the
>>> words in the input:
>>>
>>>
>>> 02.cut1 1 0.00 67.11 I
>>>
>>> 02.cut2 1 0.00 62.44 HIS
>>>
>>> 02.cut3 1 0.00 65.76 MOUNT
>>>
>>> 03.cut1 1 0.00 62.62 I
>>>
>>> 03.cut2 1 0.00 62.41 WHO
>>>
>>> 03.cut3 1 0.00 63.72 I
>>>
>>> 06.cut1 1 0.00 62.13 STANDING
>>>
>>> 06.cut2 1 0.00 57.95 A
>>>
>>> 06.cut3 1 0.00 66.78 I
>>>
>>> . . .
>>>
>>> What I want is an entry for each word:
>>>
>>> 02.cut1 1 0.00 43.7 YOU
>>>
>>> 02.cut1 1 81.2 121.3 ARE
>>>
>>> 02.cut1 1 145.4 163.8 STANDING
>>>
>>> . . .
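
A rough way to spot this symptom in a CTM file is to count the entries per
utterance. The sketch below assumes the five-column layout that nbest-to-ctm
writes (utterance, channel, start time, duration, word), so the fourth
column is a duration rather than an end time. summarize_ctm is a
hypothetical helper name.

```shell
# Sketch only: for each utterance print "utterance num_words end_time",
# where end_time is the largest start + duration seen for that utterance.
summarize_ctm() {
  awk '
    NF >= 5 {                        # ignore blank or truncated lines
      n[$1]++
      t = $3 + $4                    # start + duration
      if (t > last[$1]) last[$1] = t
    }
    END { for (u in n) printf "%s %d %.2f\n", u, n[u], last[u] }
  ' "$1" | sort
}
```

For example, summarize_ctm exp/tri2a/ctm2/output.txt would list one line per
utterance. An utterance with a single word whose end time is about a minute,
as in the output above, suggests the word alignment did not succeed for it.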
>>>
>>>
>>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>>
>>> 1 nonword
>>>
>>> 2 begin
>>>
>>> 3 end
>>>
>>> 4 internal
>>>
>>> 5 singleton
>>>
>>> 6 nonword
>>>
>>> 7 begin
>>>
>>> 8 end
>>>
>>> . . .
>>>
>>>
>>>
>>> Any help is much appreciated.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Nathan
>>>
>>>
>>>
>>>