Re: [Kaldi-users] word timing information

Something is definitely wrong there.  You shouldn't see a phone with
an _E suffix right at the start like that.  If it's the only phone in a
word it should have the singleton _S suffix, and if it doesn't have a
word symbol it should have no suffix at all.  I suspect you may have
built the system with a different phone set, or the word-boundary info
is very wrong.
Dan
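
As a rough way to check this on a whole phone sequence, here is a minimal
Python sketch (not part of Kaldi) that flags suffixes which cannot occur
where they do. It assumes the word-position convention of _B (begin),
_I (internal), _E (end) and _S (singleton), with unsuffixed phones treated
as non-word phones. The function name check_word_positions and the phone
names in the usage example are made up for illustration.

```python
import re

# Assumed convention: _B begin, _I internal, _E end, _S singleton.
# A phone with none of these suffixes is treated as a non-word phone.
SUFFIX_RE = re.compile(r"_([BIES])$")


def position_of(phone):
    """Return 'B', 'I', 'E', 'S', or None if the phone has no suffix."""
    match = SUFFIX_RE.search(phone)
    return match.group(1) if match else None


def check_word_positions(phones):
    """Return a list of (index, phone, message) for inconsistent suffixes."""
    problems = []
    inside_word = False
    for i, phone in enumerate(phones):
        pos = position_of(phone)
        if inside_word:
            if pos == "E":
                inside_word = False
            elif pos != "I":
                problems.append((i, phone, "expected _I or _E inside a word"))
                # Resynchronise: a new _B opens a word, anything else closes it.
                inside_word = pos == "B"
        else:
            if pos in ("I", "E"):
                problems.append((i, phone, "_%s with no preceding _B" % pos))
            elif pos == "B":
                inside_word = True
    if inside_word:
        problems.append((len(phones), None, "sequence ends inside a word"))
    return problems


if __name__ == "__main__":
    # Hypothetical phone names, used only to exercise the check.
    print(check_word_positions(["SIL", "HH_B", "AH_I", "L_E", "AY_S", "SIL"]))
    print(check_word_positions(["SEE_TRANSCRIPT_E", "YAWN_B"]))
```

A sequence that starts with an _E phone is reported at index 0, which is
the situation described in this message.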

On Thu, Jul 11, 2013 at 7:03 PM, Nathan Dunn <nd...@ca...> wrote:
>
> Alright, I updated the output, which looks closer to what I want, but I'm a little unclear how to pull stuff out of this:
>
> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|" ark:-  | lattice-to-phone-lattice exp/tri2a/final.mdl ark:- ark,t:- | utils/int2sym.pl -f 3 g300_lang/phones.txt
>
>
>
>
> The first few lines look like this where "02.cut1-1" is the name of the transcript:
>
> 02.cut1-1
> 0 1 SEE_TRANSCRIPT_E 14.9888,31091.3,2960_2962_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961_2961
> 1 2 END_CROSSTALK_NOISE_E 0,0,656_655_655_655_655_655_655_655_655_655_655_655_655_655_706_705_705_705
> 2 3 SEE_TRANSCRIPT_E 0,0,2960_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2962
> 3 4 END_MICROPHONE_NOISE_I 3.56562,5210.82,854_853_872
> 4 5 YAWN_B 0,0,114_113_113_113_178_177_177_177_177_177_177_177_177_177
> 5 6 END_YAWN_B 0,0,2008_2007_2007_2074_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073_2073
> 6 7 SEE_TRANSCRIPT_E 11.9189,5022.05,2960_2959_2959_2962_2961_2961
> 7 8 END_NOISE_B 0,0,952_951_951_951_951_996_995_995_995_995_995_995_995
> 8 9 END_YAWN_B 0,0,1958_1957_1957_2036_2035_2035_2035
> 9 10 END_HUMAN_NOISE 0,0,1540_1539_1539_1539_1539_1539_1539_1539_1594_1593_1593
> 10 11 SEE_TRANSCRIPT_E 0,0,2960_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2959_2962
> 11 12 END_MICROPHONE_NOISE_I 7.45918,2101.25,854_872
>
>
> Nathan
>
> On Jul 10, 2013, at 10:10 PM, Daniel Povey wrote:
>
>> It's possible that your word_boundary.txt is OK.
>> You could try to get the one-best path from the lattice using lattice-1best
>> (I think), get the phone sequence from the 1-best lattice using
>> lattice-to-phone-lattice, write the output in text form using ark,t:-,
>> then convert the phone ids to symbols using
>> utils/int2sym.pl -f 3 g300_lang/phones.txt (or something similar), and
>> see if the sequence of phonemes looks reasonable for the word sequence
>> you have.
>>
>> Dan
>>
>>
>> On Thu, Jul 11, 2013 at 1:00 AM, Nathan Dunn <nd...@me...> wrote:
>>>
>>> I think that was part of it.   I fixed one problem with the oov.txt / oov.int
>>>
>>> I'll try to recompile with that bug fix and see if that works.   It's possible that I'm creating word_boundaries incorrectly.  How many entries would you expect to get (I am getting 315)?   I wonder if I am using word_boundaries for the wrong set of phones . .
>>>
>>> Checking g300_lang/phones.txt ...
>>> --> g300_lang/phones.txt is OK
>>>
>>> Checking words.txt: #0 ...
>>> --> g300_lang/words.txt has "#0"
>>> --> g300_lang/words.txt is OK
>>>
>>> Checking g300_lang/phones/context_indep.{txt, int, csl} ...
>>> --> 75 entry/entries in g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
>>> --> g300_lang/phones/context_indep.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/disambig.{txt, int, csl} ...
>>> --> 28 entry/entries in g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
>>> --> g300_lang/phones/disambig.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
>>> --> 240 entry/entries in g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
>>> --> g300_lang/phones/nonsilence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/silence.{txt, int, csl} ...
>>> --> 75 entry/entries in g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
>>> --> g300_lang/phones/silence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
>>> --> 1 entry/entries in g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.{txt, int, csl} are OK
>>>
>>> Checking g300_lang/phones/extra_questions.{txt, int} ...
>>> --> ERROR: fail to open g300_lang/phones/extra_questions.txt
>>>
>>> Checking g300_lang/phones/roots.{txt, int} ...
>>> --> 75 entry/entries in g300_lang/phones/roots.txt
>>> --> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
>>> --> g300_lang/phones/roots.{txt, int} are OK
>>>
>>> Checking g300_lang/phones/sets.{txt, int} ...
>>> --> ERROR: fail to open g300_lang/phones/sets.int
>>>
>>> Checking g300_lang/phones/word_boundary.{txt, int} ...
>>> --> 315 entry/entries in g300_lang/phones/word_boundary.txt
>>> --> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
>>> --> g300_lang/phones/word_boundary.{txt, int} are OK
>>>
>>> Checking disjoint: silence.txt, nosilenct.txt, disambig.txt ...
>>> --> silence.txt and nonsilence.txt are disjoint
>>> --> silence.txt and disambig.txt are disjoint
>>> --> disambig.txt and nonsilence.txt are disjoint
>>> --> disjoint property is OK
>>>
>>> Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> summation property is OK
>>>
>>> Checking optional_silence.txt ...
>>> --> reading g300_lang/phones/optional_silence.txt
>>> --> g300_lang/phones/optional_silence.txt is OK
>>>
>>> Checking disambiguation symbols: #0 and #1
>>> --> g300_lang/phones/disambig.txt has "#0" and "#1"
>>> --> g300_lang/phones/disambig.txt is OK
>>>
>>> Checking topo ...
>>> --> g300_lang/topo's nonsilence section is OK
>>> --> g300_lang/topo's silence section is OK
>>> --> g300_lang/topo is OK
>>>
>>> Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
>>> --> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
>>> --> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
>>> --> g300_lang/phones/word_boundary.txt is OK
>>> --> checking L.fst and L_disambig.fst...
>>> --> generating a 46 words sequence
>>> --> resulting phone sequence from L.fst corresponds to the word sequence
>>> --> L.fst is OK
>>> --> resulting phone sequence from L_disambig.fst corresponds to the word sequence
>>> --> L_disambig.fst is OK
>>>
>>> Checking g300_lang/oov.{txt, int} ...
>>> --> 1 entry/entries in g300_lang/oov.txt
>>> --> g300_lang/oov.int corresponds to g300_lang/oov.txt
>>> --> g300_lang/oov.{txt, int} are OK
>>>
>>>
>>>
>>> Nathan
>>>
>>> On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
>>>
>>>> OK-- so the word-alignment seems to have failed.  Generally that is
>>>> because of invalid word-boundary information.  That file is indexed by
>>>> phones, not words.  Issues can include a mismatch in phone set; words
>>>> that don't have any phones in them; or phones that have only one state
>>>> in their topology (this is a bug that was recently fixed, those should
>>>> work now if you update and recompile).
>>>> That program should not generally output any warnings, if all is OK.
>>>> Try to use the program utils/validate_lang.pl to make sure your
>>>> g300_lang/ directory is OK.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>>>>
>>>>> Sorry, and it ends with this:
>>>>>
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut2
>>>>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>>>>> path, 0 had errors.
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 98.cut3
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>>>>> aligned 0 lattices; 132 had errors.
>>>>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>>>>> to ctm format; 0 had errors.
>>>>> ndunn:childspeech%
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>>>>
>>>>>
>>>>> The std err output is this:
>>>>>
>>>>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>>>>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>>>>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>>>>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>>>>> exp/tri2a/ctm2/output.txt
>>>>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>>>>> ark:-
>>>>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>>>>> ark:- ark:-
>>>>> nbest-to-ctm ark:- -
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut2
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 02.cut3
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut1
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut2
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>>>>> partial lattice for 03.cut3
>>>>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>>>>> Invalid word at end of lattice [partial lattice, forced out?]
>>>>>
>>>>>
>>>>> Nathan Dunn, Ph.D.
>>>>> Scientific Programmer
>>>>> College of Arts and Science IT
>>>>> 541-221-2418
>>>>> nd...@ca...
>>>>>
>>>>>
>>>>>
>>>>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>>>>
>>>>> Can you provide the logging output, at least some representative lines
>>>>> from it.  Are there any warnings?
>>>>> Dan
>>>>>
>>>>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>>>>> Communication and Updates <kal...@li...> wrote:
>>>>>
>>>>>
>>>>> I'm trying to get word timing information out of a successfully trained
>>>>> language model that I've already been able to decode with, by
>>>>> following these instructions:
>>>>>
>>>>>
>>>>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>>>>
>>>>>
>>>>> This is the command I've run:
>>>>>
>>>>>
>>>>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>>>>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>>>>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>>>>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>>>>
>>>>>
>>>>>
>>>>> The problem is that I only get one entry per transcript (these transcripts
>>>>> are 1 minute long), and it bears no apparent relation to the word
>>>>> input:
>>>>>
>>>>>
>>>>> 02.cut1 1 0.00 67.11 I
>>>>> 02.cut2 1 0.00 62.44 HIS
>>>>> 02.cut3 1 0.00 65.76 MOUNT
>>>>> 03.cut1 1 0.00 62.62 I
>>>>> 03.cut2 1 0.00 62.41 WHO
>>>>> 03.cut3 1 0.00 63.72 I
>>>>> 06.cut1 1 0.00 62.13 STANDING
>>>>> 06.cut2 1 0.00 57.95 A
>>>>> 06.cut3 1 0.00 66.78 I
>>>>> . . .
>>>>>
>>>>> What I want is an entry for each word:
>>>>>
>>>>> 02.cut1 1 0.00 43.7 YOU
>>>>> 02.cut1 1 81.2 121.3 ARE
>>>>> 02.cut1 1 145.4 163.8 STANDING
>>>>> . . .
>>>>>
>>>>>
>>>>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>>>>
>>>>> 1 nonword
>>>>> 2 begin
>>>>> 3 end
>>>>> 4 internal
>>>>> 5 singleton
>>>>> 6 nonword
>>>>> 7 begin
>>>>> 8 end
>>>>> . . .
>>>>>
>>>>>
>>>>>
>>>>> Any help is much appreciated.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>> Kaldi-users mailing list
>>>>>
>>>>> Kal...@li...
>>>>>
>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>>>>
>>>>>
>>>>>
>>>
>
>
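
On the earlier question of how to pull information out of the text-form
phone lattice: a minimal Python sketch, assuming the input is a linear
(1-best) lattice printed with ark,t:- in which each arc line reads
"src dst phone graph_cost,acoustic_cost,tid_tid_...", with one
transition-id per frame. The function name parse_phone_arcs is
hypothetical, and the 0.01 s frame shift used to print seconds is an
assumption that should be checked against the feature configuration.
The numbers are only meaningful once the word-boundary problem discussed
in this thread is resolved.

```python
def parse_phone_arcs(lines):
    """Return (utt_id, phone, start_frame, num_frames) for each arc.

    Assumes a linear lattice in text form: an utterance-id line, then one
    arc per line in path order, then a final-state line and a blank line.
    """
    rows = []
    utt_id = None
    frame = 0
    for raw in lines:
        fields = raw.split()
        if not fields:                 # a blank line ends the utterance
            utt_id, frame = None, 0
            continue
        if utt_id is None:             # first non-blank line is the id
            utt_id = fields[0]
            continue
        if len(fields) != 4:           # final-state line carries no phone
            continue
        phone = fields[2]
        parts = fields[3].split(",")
        tids = parts[2] if len(parts) == 3 else ""
        # One transition-id per frame, so the count is the arc's length.
        num_frames = len(tids.split("_")) if tids else 0
        rows.append((utt_id, phone, frame, num_frames))
        frame += num_frames
    return rows


if __name__ == "__main__":
    # Abbreviated lines in the style of the output quoted in this thread.
    sample = [
        "02.cut1-1",
        "0 1 YAWN_B 0,0,114_113_113",
        "1 2 END_MICROPHONE_NOISE_I 3.56562,5210.82,854_853_872",
        "2",
        "",
    ]
    frame_shift = 0.01  # seconds per frame; an assumption, not from the thread
    for utt, phone, start, n in parse_phone_arcs(sample):
        print(utt, phone, start * frame_shift, n * frame_shift)
```

Feeding it the piped output of the lattice-to-phone-lattice command from
this thread (after utils/int2sym.pl) gives one row per arc, which makes
it easier to see where each phone label starts and how long it lasts.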