From: Mailing list used for User Communication and Updates <kal...@li...> - 2013-07-11 05:00:55
I think that was part of it. I fixed one problem with oov.txt / oov.int.
I'll update and recompile to pick up that bug fix and see if that works. It's possible that I'm creating word_boundary.txt incorrectly. How many entries would you expect to get? (I am getting 315.) I wonder if I am using word_boundary.txt for the wrong set of phones.
Here is the validate_lang.pl output:
Checking g300_lang/phones.txt ...
--> g300_lang/phones.txt is OK
Checking words.txt: #0 ...
--> g300_lang/words.txt has "#0"
--> g300_lang/words.txt is OK
Checking g300_lang/phones/context_indep.{txt, int, csl} ...
--> 75 entry/entries in g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.int corresponds to g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.csl corresponds to g300_lang/phones/context_indep.txt
--> g300_lang/phones/context_indep.{txt, int, csl} are OK
Checking g300_lang/phones/disambig.{txt, int, csl} ...
--> 28 entry/entries in g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.int corresponds to g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.csl corresponds to g300_lang/phones/disambig.txt
--> g300_lang/phones/disambig.{txt, int, csl} are OK
Checking g300_lang/phones/nonsilence.{txt, int, csl} ...
--> 240 entry/entries in g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.int corresponds to g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.csl corresponds to g300_lang/phones/nonsilence.txt
--> g300_lang/phones/nonsilence.{txt, int, csl} are OK
Checking g300_lang/phones/silence.{txt, int, csl} ...
--> 75 entry/entries in g300_lang/phones/silence.txt
--> g300_lang/phones/silence.int corresponds to g300_lang/phones/silence.txt
--> g300_lang/phones/silence.csl corresponds to g300_lang/phones/silence.txt
--> g300_lang/phones/silence.{txt, int, csl} are OK
Checking g300_lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.int corresponds to g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.csl corresponds to g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.{txt, int, csl} are OK
Checking g300_lang/phones/extra_questions.{txt, int} ...
--> ERROR: fail to open g300_lang/phones/extra_questions.txt
Checking g300_lang/phones/roots.{txt, int} ...
--> 75 entry/entries in g300_lang/phones/roots.txt
--> g300_lang/phones/roots.int corresponds to g300_lang/phones/roots.txt
--> g300_lang/phones/roots.{txt, int} are OK
Checking g300_lang/phones/sets.{txt, int} ...
--> ERROR: fail to open g300_lang/phones/sets.int
Checking g300_lang/phones/word_boundary.{txt, int} ...
--> 315 entry/entries in g300_lang/phones/word_boundary.txt
--> g300_lang/phones/word_boundary.int corresponds to g300_lang/phones/word_boundary.txt
--> g300_lang/phones/word_boundary.{txt, int} are OK
Checking disjoint: silence.txt, nosilenct.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK
Checking optional_silence.txt ...
--> reading g300_lang/phones/optional_silence.txt
--> g300_lang/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> g300_lang/phones/disambig.txt has "#0" and "#1"
--> g300_lang/phones/disambig.txt is OK
Checking topo ...
--> g300_lang/topo's nonsilence section is OK
--> g300_lang/topo's silence section is OK
--> g300_lang/topo is OK
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> g300_lang/phones/word_boundary.txt doesn't include disambiguation symbols
--> g300_lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> g300_lang/phones/word_boundary.txt is OK
--> checking L.fst and L_disambig.fst...
--> generating a 46 words sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
Checking g300_lang/oov.{txt, int} ...
--> 1 entry/entries in g300_lang/oov.txt
--> g300_lang/oov.int corresponds to g300_lang/oov.txt
--> g300_lang/oov.{txt, int} are OK
Nathan
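
On the entry count: the validate_lang.pl output above reports 75 silence and 240 nonsilence phones and says word_boundary.txt is their union, so 75 + 240 = 315 entries is what that check expects. The following is a minimal Python sketch of the same comparison, not the validator's actual code. The helper names and the toy phone names are invented for illustration, and it assumes each list file has the phone symbol in its first column.

```python
from pathlib import Path


def read_first_column(path):
    """Return the first whitespace-separated field of every non-empty line."""
    symbols = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            symbols.append(fields[0])
    return symbols


def compare_word_boundary(silence, nonsilence, word_boundary):
    """Compare word_boundary phones against the union of silence and nonsilence."""
    expected = set(silence) | set(nonsilence)
    listed = set(word_boundary)
    return {
        "expected_entries": len(expected),
        "listed_entries": len(listed),
        "missing": sorted(expected - listed),
        "unexpected": sorted(listed - expected),
    }


# Toy phone names, invented for illustration only.
report = compare_word_boundary(
    silence=["SIL", "SIL_B"],
    nonsilence=["AA_B", "AA_E"],
    word_boundary=["SIL", "SIL_B", "AA_B"],
)
print(report)
```

To run it against a real directory, the three lists could be loaded with read_first_column from phones/silence.txt, phones/nonsilence.txt and phones/word_boundary.txt.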
On Jul 10, 2013, at 9:12 PM, Daniel Povey wrote:
> OK-- so the word-alignment seems to have failed. Generally that is
> because of invalid word-boundary information. That file is indexed by
> phones, not words. Issues can include a mismatch in phone set; words
> that don't have any phones in them; or phones that have only one state
> in their topology (this is a bug that was recently fixed, those should
> work now if you update and recompile).
> That program should not generally output any warnings, if all is OK.
> Try to use the program utils/validate_lang.pl to make sure your
> g300_lang/ directory is OK.
>
> Dan
>
>
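
Since word_boundary.int is indexed by phone id, one way to look for a phone-set mismatch is to cross-check it against phones.txt. This is only a sketch under assumptions: the function names are hypothetical, the label set is taken from the entries quoted later in this thread, and it assumes `<eps>` and symbols starting with `#` (disambiguation symbols) are the only phones.txt entries that should not appear in word_boundary.int.

```python
VALID_LABELS = {"nonword", "begin", "end", "internal", "singleton"}


def parse_pairs(lines):
    """Parse 'key value' lines into a list of (key, value) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def check_word_boundary_int(phones_txt_lines, word_boundary_int_lines):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # phones.txt maps symbol -> integer id. Skip <eps> and '#'-prefixed
    # disambiguation symbols (an assumption about the naming convention).
    real_ids = {
        int(phone_id)
        for symbol, phone_id in parse_pairs(phones_txt_lines)
        if symbol != "<eps>" and not symbol.startswith("#")
    }
    seen = set()
    for phone_id, label in parse_pairs(word_boundary_int_lines):
        if label not in VALID_LABELS:
            problems.append(f"phone {phone_id}: unknown label {label!r}")
        if int(phone_id) not in real_ids:
            problems.append(f"phone {phone_id}: not a real phone in phones.txt")
        seen.add(int(phone_id))
    # Every real phone should have exactly one word-boundary entry.
    for phone_id in sorted(real_ids - seen):
        problems.append(f"phone {phone_id}: missing from word_boundary.int")
    return problems
```

An empty result does not rule out the other causes mentioned above, such as words with no phones or single-state topologies.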
> On Thu, Jul 11, 2013 at 12:06 AM, Nathan Dunn <nd...@me...> wrote:
>>
>> Sorry, and it ends with this:
>>
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut2
>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 132 to best
>> path, 0 had errors.
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 98.cut3
>> LOG (lattice-align-words:main():lattice-align-words.cc:104) Successfully
>> aligned 0 lattices; 132 had errors.
>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 132 linear lattices
>> to ctm format; 0 had errors.
>> ndunn:childspeech%
>>
>>
>> Nathan
>>
>> On Jul 10, 2013, at 9:06 PM, Nathan Dunn wrote:
>>
>>
>> The std err output is this:
>>
>> ndunn:childspeech% lattice-1best "ark:gunzip -c
>> exp/tri2a/decode_test_childspeech/lat.gz|" ark:- | lattice-align-words
>> g300_lang/phones/word_boundary.int exp/tri2a/final.mdl ark:- ark:- |
>> nbest-to-ctm ark:- - | utils/int2sym.pl -f 5 g300_lang/words.txt >
>> exp/tri2a/ctm2/output.txt
>> lattice-1best 'ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|'
>> ark:-
>> lattice-align-words g300_lang/phones/word_boundary.int exp/tri2a/final.mdl
>> ark:- ark:-
>> nbest-to-ctm ark:- -
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut2
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 02.cut3
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut1
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut2
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>> LOG (lattice-align-words:main():lattice-align-words.cc:89) Outputting
>> partial lattice for 03.cut3
>> WARNING (lattice-align-words:OutputArcForce():word-align-lattice.cc:541)
>> Invalid word at end of lattice [partial lattice, forced out?]
>>
>>
>> Nathan Dunn, Ph.D.
>> Scientific Programmer
>> College of Arts and Science IT
>> 541-221-2418
>> nd...@ca...
>>
>>
>>
>> On Jul 10, 2013, at 8:45 PM, Daniel Povey wrote:
>>
>> Can you provide the logging output, at least some representative lines
>> from it. Are there any warnings?
>> Dan
>>
>> On Wed, Jul 10, 2013 at 11:38 PM, Mailing list used for User
>> Communication and Updates <kal...@li...> wrote:
>>
>>
>> I'm trying to get word timing information out of a successfully trained
>> language model, which I've already been able to decode with, by
>> following these instructions:
>>
>>
>> https://sourceforge.net/mailarchive/message.php?msg_id=30729903
>>
>>
>> This is the command I've run:
>>
>>
>> lattice-1best "ark:gunzip -c exp/tri2a/decode_test_childspeech/lat.gz|"
>> ark:- | lattice-align-words g300_lang/phones/word_boundary.int
>> exp/tri2a/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | utils/int2sym.pl -f
>> 5 g300_lang/words.txt > exp/tri2a/ctm2/output.txt
>>
>>
>>
>> The problem is that I only get one entry per transcript (these transcripts
>> are 1 minute long), and I don't see how that entry relates to the word
>> input:
>>
>>
>> 02.cut1 1 0.00 67.11 I
>>
>> 02.cut2 1 0.00 62.44 HIS
>>
>> 02.cut3 1 0.00 65.76 MOUNT
>>
>> 03.cut1 1 0.00 62.62 I
>>
>> 03.cut2 1 0.00 62.41 WHO
>>
>> 03.cut3 1 0.00 63.72 I
>>
>> 06.cut1 1 0.00 62.13 STANDING
>>
>> 06.cut2 1 0.00 57.95 A
>>
>> 06.cut3 1 0.00 66.78 I
>>
>> . . .
>>
>> What I want is an entry for each word:
>>
>> 02.cut1 1 0.00 43.7 YOU
>>
>> 02.cut1 1 81.2 121.3 ARE
>>
>> 02.cut1 1 145.4 163.8 STANDING
>>
>> . . .
>>
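
The symptom described here (one CTM entry covering the whole recording) can be detected automatically. Below is a rough Python sketch that flags such utterances. It assumes the CTM columns are utterance id, channel, start time, duration and word, which is how I understand nbest-to-ctm output. The function names and the 10-second threshold are arbitrary choices for illustration.

```python
from collections import defaultdict


def parse_ctm(lines):
    """Group CTM lines (utt channel start duration word) by utterance id."""
    entries = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        utt, _channel, start, duration, word = fields[:5]
        entries[utt].append((float(start), float(duration), word))
    return dict(entries)


def single_entry_utterances(entries, min_duration=10.0):
    """Return utterances whose only entry lasts at least min_duration seconds."""
    return sorted(
        utt
        for utt, words in entries.items()
        if len(words) == 1 and words[0][1] >= min_duration
    )


# The first two lines are copied from the output quoted in this thread;
# the "ok" utterance is made up to show a case that is not flagged.
sample = [
    "02.cut1 1 0.00 67.11 I",
    "02.cut2 1 0.00 62.44 HIS",
    "ok 1 0.00 0.40 YOU",
    "ok 1 0.40 0.30 ARE",
]
flagged = single_entry_utterances(parse_ctm(sample))
print(flagged)
```

If every utterance in the file is flagged, as it would be for the output in this message, the problem is more likely in the word-alignment step than in any individual recording.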
>>
>> The words.txt is 116K, but word_boundary.int has only 316 entries like this:
>>
>> 1 nonword
>>
>> 2 begin
>>
>> 3 end
>>
>> 4 internal
>>
>> 5 singleton
>>
>> 6 nonword
>>
>> 7 begin
>>
>> 8 end
>>
>> . . .
>>
>>
>>
>> Any help is much appreciated.
>>
>>
>> Thanks,
>>
>>
>> Nathan
>>
>>
>>
>>