Kaldi-users mailing list archive (messages from July 2011 through July 2015). You can subscribe to this list on the project page.
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-15 03:22:20

There are not currently, but if anyone wants to help by suggesting some system for registering such models and making them public, it would be appreciated. One issue is that it would be necessary to document which version of Kaldi they had been generated with, and to associate certain information with them, such as the lang/ directory and some notation of which feature-processing pipeline they had been generated with.

Dan

On Mon, Oct 14, 2013 at 8:18 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hello,
>
> Are there any sample acoustic models and language models (graphs) available
> with Kaldi (similar to those at http://www.keithv.com/software/htk/us/)?
>
> Thanks,
> e
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
> the latest Intel processors and coprocessors. See abstracts and register:
> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
> _______________________________________________
> Kaldi-users mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-users
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-15 03:18:56

Hello,

Are there any sample acoustic models and language models (graphs) available with Kaldi (similar to those at http://www.keithv.com/software/htk/us/)?

Thanks,
e
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-14 19:48:54

Yes, you can.

Dan

On Mon, Oct 14, 2013 at 11:45 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi,
>
> I'm trying to get time information (in terms of the number of frames)
> from a lattice. I followed the instructions at the bottom of the lattice
> page:
>
> http://kaldi.sourceforge.net/lattices.html
>
> Here is a snippet of the first few lines of the output of lattice-align-words:
>
> 0 1 !SIL 12.9525,7014.69,2_8_18_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17
> 1 2 THE 0,0,28760_28796_28795_28795_28842_28841_31456_31455_31676_31675_31806_31805
> 2 33 SALE 4.00787,2441.77,298_297_374_373_373_373_373_373_373_438_437_437_13528_13527_13527_13552_13596_13595_13595_13595_29390_29389_29389_29389_29510_29509_29564_29563_29563
> 2 32 SELL 11.552,2097.14,298_297_374_373_373_373_373_373_373_436_435_435_435_46812_46811_46811_46920_46919_46919_46984_46983_29434_29433_29433_29524_29523_29523_29564_29563
> 2 3 CELL 13.0187,2399.32,298_297_374_373_373_373_373_373_373_436_435_435_435_46812_46811_46811_46920_46919_46919_46984_46983_29434_29433_29433_29524_29523_29523_29564_29563
>
> Can I interpret each transition-id (concatenated by underscores) as
> occupying a single frame? In other words, can I assume the number of
> transition-ids associated with an edge corresponds to the duration of
> the word on that edge?
>
> Thanks,
> Hao
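The answer above confirms the interpretation: each transition-id on an arc occupies exactly one frame, so counting them gives the word's duration. A minimal sketch (using the `THE` arc quoted in the question; the fourth field holds two costs plus the underscore-joined transition-id string) that does the count with awk:

```shell
# Each lattice-align-words arc looks like:
#   start-state end-state WORD cost1,cost2,tid_tid_tid_...
# Since each transition-id occupies one frame, the number of transition-ids
# on an arc is that word's duration in frames.
cat <<'EOF' > arcs.txt
1 2 THE 0,0,28760_28796_28795_28795_28842_28841_31456_31455_31676_31675_31806_31805
EOF
# split() returns the number of fields it produced, so the second split
# directly yields the frame count.
awk '{split($4, f, ","); print $3, split(f[3], t, "_"), "frames"}' arcs.txt
```

On the sample arc this prints `THE 12 frames`, i.e. the word "THE" spans 12 frames.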
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-14 18:46:07

Hi,

I'm trying to get time information (in terms of the number of frames) from a lattice. I followed the instructions at the bottom of the lattice page:

http://kaldi.sourceforge.net/lattices.html

Here is a snippet of the first few lines of the output of lattice-align-words:

0 1 !SIL 12.9525,7014.69,2_8_18_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17
1 2 THE 0,0,28760_28796_28795_28795_28842_28841_31456_31455_31676_31675_31806_31805
2 33 SALE 4.00787,2441.77,298_297_374_373_373_373_373_373_373_438_437_437_13528_13527_13527_13552_13596_13595_13595_13595_29390_29389_29389_29389_29510_29509_29564_29563_29563
2 32 SELL 11.552,2097.14,298_297_374_373_373_373_373_373_373_436_435_435_435_46812_46811_46811_46920_46919_46919_46984_46983_29434_29433_29433_29524_29523_29523_29564_29563
2 3 CELL 13.0187,2399.32,298_297_374_373_373_373_373_373_373_436_435_435_435_46812_46811_46811_46920_46919_46919_46984_46983_29434_29433_29433_29524_29523_29523_29564_29563

Can I interpret each transition-id (concatenated by underscores) as occupying a single frame? In other words, can I assume the number of transition-ids associated with an edge corresponds to the duration of the word on that edge?

Thanks,
Hao
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-14 16:07:28

Thanks, it worked!

-----Original Message-----
From: Mailing list used for User Communication and Updates <kal...@li...>
To: kaldi-users <kal...@li...>
Sent: Mon, Oct 14, 2013 6:12 pm
Subject: Re: [Kaldi-users] libfst.so: undefined reference to `dlopen'

Hi,

Did you try to compile OpenFst without "--disable-shared", or to run Kaldi's src/configure without "--shared" and recompile? The latter will build statically linked Kaldi binaries (which will occupy more disk space).

Vassil

On Mon, Oct 14, 2013 at 12:40 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi,
>
> I am getting the following problem when creating the online recognizer binaries.
>
> OpenFst was compiled using the --enable-static --disable-shared options.
>
> cd src/onlinebin
> make
>
> g++ -lfst -lm -ldl -msse -msse2 -Wall -I.. -fPIC -DKALDI_DOUBLEPRECISION=0
> -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1
> -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS
> -I/home/user/speech/kaldi/kaldi-trunk/tools/ATLAS/include
> -I/home/user/speech/kaldi/kaldi-trunk/tools/openfst/include
> -Wno-sign-compare -I/home/user/speech/portaudio/portaudio/include -g
> -rdynamic -Wl,-rpath=/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib
> -ldl online-server-gmm-decode-faster.cc ../online/kaldi-online.a
> ../lat/kaldi-lat.a ../decoder/kaldi-decoder.a ../feat/kaldi-feat.a
> ../transform/kaldi-transform.a ../gmm/kaldi-gmm.a ../hmm/kaldi-hmm.a
> ../tree/kaldi-tree.a ../matrix/kaldi-matrix.a ../util/kaldi-util.a
> ../base/kaldi-base.a
> /home/user/speech/portaudio/portaudio/lib/.libs/libportaudio.a -lasound
> -ljack -L/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib -lfst -ldl
> /usr/lib/atlas-base/libatlas.so.3gf /usr/lib/atlas-base/libf77blas.so.3gf
> /usr/lib/atlas-base/libcblas.so.3gf
> /usr/lib/atlas-base/liblapack_atlas.so.3gf -lm -lpthread -ldl -o
> online-server-gmm-decode-faster
>
> /home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib/libfst.so: undefined
> reference to `dlopen'
> /home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib/libfst.so: undefined
> reference to `dlerror'
>
> collect2: ld returned 1 exit status
>
> How can I fix it?
>
> Thanks a lot.
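Both of the suggested workarounds in this thread come down to redoing the build configuration. A hedged sketch of the two options (the checkout paths follow the kaldi-trunk layout shown in the error message; adjust to your own tree):

```shell
# Workaround 1: rebuild OpenFst with shared libraries enabled (i.e. drop
# --disable-shared), so the rebuilt libfst.so resolves cleanly at link time.
cd ~/speech/kaldi/kaldi-trunk/tools/openfst   # assumed checkout location
./configure --enable-static --enable-shared --prefix="$PWD"
make clean && make && make install

# Workaround 2: configure Kaldi without --shared, which builds statically
# linked Kaldi binaries (larger on disk, but no shared libfst.so involved).
cd ~/speech/kaldi/kaldi-trunk/src
./configure            # note: without the --shared flag
make clean && make depend && make
```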
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-14 10:12:27

Hi,

Did you try to compile OpenFst without "--disable-shared", or to run Kaldi's src/configure without "--shared" and recompile? The latter will build statically linked Kaldi binaries (which will occupy more disk space).

Vassil

On Mon, Oct 14, 2013 at 12:40 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi,
>
> I am getting the following problem when creating the online recognizer binaries.
>
> OpenFst was compiled using the --enable-static --disable-shared options.
>
> cd src/onlinebin
> make
>
> g++ -lfst -lm -ldl -msse -msse2 -Wall -I.. -fPIC -DKALDI_DOUBLEPRECISION=0
> -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1
> -rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS
> -I/home/user/speech/kaldi/kaldi-trunk/tools/ATLAS/include
> -I/home/user/speech/kaldi/kaldi-trunk/tools/openfst/include
> -Wno-sign-compare -I/home/user/speech/portaudio/portaudio/include -g
> -rdynamic -Wl,-rpath=/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib
> -ldl online-server-gmm-decode-faster.cc ../online/kaldi-online.a
> ../lat/kaldi-lat.a ../decoder/kaldi-decoder.a ../feat/kaldi-feat.a
> ../transform/kaldi-transform.a ../gmm/kaldi-gmm.a ../hmm/kaldi-hmm.a
> ../tree/kaldi-tree.a ../matrix/kaldi-matrix.a ../util/kaldi-util.a
> ../base/kaldi-base.a
> /home/user/speech/portaudio/portaudio/lib/.libs/libportaudio.a -lasound
> -ljack -L/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib -lfst -ldl
> /usr/lib/atlas-base/libatlas.so.3gf /usr/lib/atlas-base/libf77blas.so.3gf
> /usr/lib/atlas-base/libcblas.so.3gf
> /usr/lib/atlas-base/liblapack_atlas.so.3gf -lm -lpthread -ldl -o
> online-server-gmm-decode-faster
>
> /home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib/libfst.so: undefined
> reference to `dlopen'
> /home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib/libfst.so: undefined
> reference to `dlerror'
>
> collect2: ld returned 1 exit status
>
> How can I fix it?
>
> Thanks a lot.
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-14 09:40:34

Hi,

I am getting the following problem when creating the online recognizer binaries.

OpenFst was compiled using the --enable-static --disable-shared options.

cd src/onlinebin
make

g++ -lfst -lm -ldl -msse -msse2 -Wall -I.. -fPIC -DKALDI_DOUBLEPRECISION=0
-DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1
-rdynamic -DHAVE_CXXABI_H -DHAVE_ATLAS
-I/home/user/speech/kaldi/kaldi-trunk/tools/ATLAS/include
-I/home/user/speech/kaldi/kaldi-trunk/tools/openfst/include
-Wno-sign-compare -I/home/user/speech/portaudio/portaudio/include -g
-rdynamic -Wl,-rpath=/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib
-ldl online-server-gmm-decode-faster.cc ../online/kaldi-online.a
../lat/kaldi-lat.a ../decoder/kaldi-decoder.a ../feat/kaldi-feat.a
../transform/kaldi-transform.a ../gmm/kaldi-gmm.a ../hmm/kaldi-hmm.a
../tree/kaldi-tree.a ../matrix/kaldi-matrix.a ../util/kaldi-util.a
../base/kaldi-base.a
/home/user/speech/portaudio/portaudio/lib/.libs/libportaudio.a -lasound
-ljack -L/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib -lfst -ldl
/usr/lib/atlas-base/libatlas.so.3gf /usr/lib/atlas-base/libf77blas.so.3gf
/usr/lib/atlas-base/libcblas.so.3gf
/usr/lib/atlas-base/liblapack_atlas.so.3gf -lm -lpthread -ldl -o
online-server-gmm-decode-faster

/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib/libfst.so: undefined reference to `dlopen'
/home/user/speech/kaldi/kaldi-trunk/tools/openfst/lib/libfst.so: undefined reference to `dlerror'

collect2: ld returned 1 exit status

How can I fix it?

Thanks a lot.
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-09 17:46:51

[Tried kaldi-discuss but it didn't work; trying kaldi-users.]

They don't represent confidence scores. To get confidence scores there are, I believe, options to lattice-mbr-decode that you can use to get this information, in a "sausage-string-like" format. Also, lattice-to-ctm-conf will use the same internal code to produce ctm-format output that has confidence scores. Cc'ing kaldi-discuss so the response is available to others.

Dan

On Wed, Oct 9, 2013 at 8:39 AM, Anil John M <ani...@gm...> wrote:
> Hi,
>
> After decoding with Kaldi, I found there are many log files at
> exp/tri2b/decode/scoring/log/best_path.*.log, each containing a decoded text
> string with its cost reported as "best cost 435.662 + 39309.4 = 39745.1".
> Those numbers are obviously different for individual utterances.
> What I am interested to know is: do those quantities signify
> confidence scores, where the acoustic score is 435.662, the LM score is
> 39309.4, and the total score is 39745.1 for that utterance?
>
> Is there any way of getting these scores on a scale of 0 to 1, or say 0 to 10?
>
> Thank you,
> - Anil
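As the reply notes, the numbers in best_path.*.log are log-domain costs rather than probabilities, so they do not live on a 0-to-1 scale; per-word posteriors on a genuine 0-1 scale come from lattice-to-ctm-conf. A small awk sketch (using the exact log line quoted in the question; which component is acoustic and which is graph/LM is left unlabeled here, since the log does not say) that extracts the two components and sanity-checks that they sum to the reported total, within rounding:

```shell
# The best_path log reports two cost components and their (rounded) total,
# e.g. "best cost 435.662 + 39309.4 = 39745.1".
cat <<'EOF' > best_path.log
best cost 435.662 + 39309.4 = 39745.1
EOF
awk '{d = ($3 + $5) - $7; if (d < 0) d = -d;
      printf "component1=%s component2=%s total=%s sum_ok=%d\n", $3, $5, $7, (d < 0.1)}' best_path.log
```

On the sample line this prints the two components and `sum_ok=1`, confirming the total is just their (rounded) sum, i.e. an unnormalized cost rather than a confidence.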
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 20:59:25

Thank you for your help. I suspect that all these peculiarities arise from a NIST competition or something like that. Would Kaldi results be comparable with published results if I use the standard s4 scripts for training and the 192-sentence TIMIT core as the test set?

Valentin

On Thu, Oct 3, 2013 at 12:23 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> On Wed, Oct 2, 2013 at 9:00 PM, Mailing list used for User
> Communication and Updates <kal...@li...> wrote:
>> Where can I find a description of what the classical train/test splits
>> should look like?
>> If I find that out I will be glad to mend the recipe.
>> Maybe one of s3 or s4 is better?
>
> Look at s4. TIMIT has various peculiarities. It's not just how the
> train/dev/test sets are defined; it's also common practice to
> train on one set of phones and score on a subset of those. And silence
> is scored, which is really a free giveaway. -Arnab
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 20:31:16

On Wed, Oct 2, 2013 at 9:00 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Where can I find a description of what the classical train/test splits
> should look like?
> If I find that out I will be glad to mend the recipe.
> Maybe one of s3 or s4 is better?

Look at s4. TIMIT has various peculiarities. It's not just how the train/dev/test sets are defined; it's also common practice to train on one set of phones and score on a subset of those. And silence is scored, which is really a free giveaway.

-Arnab
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 20:00:33

Thank you, Daniel and Arnab, for your answers. I know Tara Sainath's paper. Probably my usage of the word "baseline" is misleading; I meant my own baseline: a setup which is accessible to me and whose results can be compared with published results. I thought that Kaldi is ideal for such a purpose. By the way, I think Kaldi is ideal in many ways; thank you both for that!

I've never published anything on English speech recognition yet, so I didn't know about that peculiarity. Where can I find a description of what the classical train/test splits should look like? If I find that out I will be glad to mend the recipe. Maybe one of s3 or s4 is better?

Valentin

On Wed, Oct 2, 2013 at 11:37 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> OK, so the training/test splits in the recipe need to be fixed. If
> you can help us with this, that would be appreciated.
> Dan
>
> On Wed, Oct 2, 2013 at 3:23 PM, Mailing list used for User
> Communication and Updates <kal...@li...> wrote:
>> People generally use triphone baselines for TIMIT; that's not the
>> issue here. If you are using the s5 recipe, it will give much better
>> results than what is published, because it doesn't use the standard
>> training/test partition but uses more training data.
>> -Arnab
>>
>> On Wed, Oct 2, 2013 at 8:13 PM, Mailing list used for User
>> Communication and Updates <kal...@li...> wrote:
>>> Hi,
>>>
>>> I'm a bit stuck with the Kaldi TIMIT recipe.
>>> Many articles report phoneme error rates which are much
>>> worse than what one can obtain by simply running s5/run.sh on the TIMIT
>>> core set. Apparently the recipe employs triphones with tied states, while
>>> papers present error numbers obtained on monophones (even Hinton's RBM
>>> paper is about a neural net with 183 outputs).
>>>
>>> How can I make a proper baseline on the TIMIT core set for a paper with Kaldi?
>>> Why don't people use triphones on TIMIT, since they are much better?
>>>
>>> Valentin
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 19:37:26

OK, so the training/test splits in the recipe need to be fixed. If you can help us with this, that would be appreciated.

Dan

On Wed, Oct 2, 2013 at 3:23 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> People generally use triphone baselines for TIMIT; that's not the
> issue here. If you are using the s5 recipe, it will give much better
> results than what is published, because it doesn't use the standard
> training/test partition but uses more training data.
> -Arnab
>
> On Wed, Oct 2, 2013 at 8:13 PM, Mailing list used for User
> Communication and Updates <kal...@li...> wrote:
>> Hi,
>>
>> I'm a bit stuck with the Kaldi TIMIT recipe.
>> Many articles report phoneme error rates which are much
>> worse than what one can obtain by simply running s5/run.sh on the TIMIT
>> core set. Apparently the recipe employs triphones with tied states, while
>> papers present error numbers obtained on monophones (even Hinton's RBM
>> paper is about a neural net with 183 outputs).
>>
>> How can I make a proper baseline on the TIMIT core set for a paper with Kaldi?
>> Why don't people use triphones on TIMIT, since they are much better?
>>
>> Valentin
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 19:24:26

People generally use triphone baselines for TIMIT; that's not the issue here. If you are using the s5 recipe, it will give much better results than what is published, because it doesn't use the standard training/test partition but uses more training data.

-Arnab

On Wed, Oct 2, 2013 at 8:13 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi,
>
> I'm a bit stuck with the Kaldi TIMIT recipe.
> Many articles report phoneme error rates which are much
> worse than what one can obtain by simply running s5/run.sh on the TIMIT
> core set. Apparently the recipe employs triphones with tied states, while
> papers present error numbers obtained on monophones (even Hinton's RBM
> paper is about a neural net with 183 outputs).
>
> How can I make a proper baseline on the TIMIT core set for a paper with Kaldi?
> Why don't people use triphones on TIMIT, since they are much better?
>
> Valentin
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 19:17:38

The people who run TIMIT experiments are quite often people who don't have much knowledge of speech recognition techniques but want to run some experiments anyway. I advise avoiding TIMIT. BTW, there is a paper from IBM (with Tara Sainath?) that applies state-of-the-art methods to TIMIT; that may be a more relevant baseline.

Dan

On Wed, Oct 2, 2013 at 3:13 PM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi,
>
> I'm a bit stuck with the Kaldi TIMIT recipe.
> Many articles report phoneme error rates which are much
> worse than what one can obtain by simply running s5/run.sh on the TIMIT
> core set. Apparently the recipe employs triphones with tied states, while
> papers present error numbers obtained on monophones (even Hinton's RBM
> paper is about a neural net with 183 outputs).
>
> How can I make a proper baseline on the TIMIT core set for a paper with Kaldi?
> Why don't people use triphones on TIMIT, since they are much better?
>
> Valentin
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 19:13:15

Hi,

I'm a bit stuck with the Kaldi TIMIT recipe. Many articles report phoneme error rates which are much worse than what one can obtain by simply running s5/run.sh on the TIMIT core set. Apparently the recipe employs triphones with tied states, while papers present error numbers obtained on monophones (even Hinton's RBM paper is about a neural net with 183 outputs).

How can I make a proper baseline on the TIMIT core set for a paper with Kaldi? Why don't people use triphones on TIMIT, since they are much better?

Valentin
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 18:36:25

There are some outputs you can look at: build_tree.log reports the objective-function improvement from splitting (if things are working, it should be around 5 or 6), and lda_est.log (or est_lda.log?) reports the sum of eigenvalues, which should be between 10 and 30 (more is better). These values should increase during training. Note: these values do not exist at the monophone stage.

Dan

On Wed, Oct 2, 2013 at 2:33 PM, He Gu <gh...@gm...> wrote:
> Sure, we have a recipe based on the swbd example, but we modified it to fit
> our own data. Everything goes well except decoding. We trained from monophone
> up to tri4a and decoded with the corresponding models. We expected the results
> to get better as the models become more accurate, but it turns out that
> decoding with the monophone model gives the lowest WER, and the results get
> worse as training goes on.
>
> I've no idea what is going on, since I didn't see any problem in the scripts.
> Any help would be appreciated. Thank you.
>
> He Gu
>
> On 10/02/2013 09:55 AM, Daniel Povey wrote:
>> Can you be more specific?
>> Dan
>>
>> On Wed, Oct 2, 2013 at 12:53 PM, He Gu <gh...@gm...> wrote:
>>> Hi, Daniel,
>>>
>>> I'm getting worse results over the different decoding phases (monophone,
>>> tri1, tri2 and so on). Are there any possible reasons you can think of?
>>> Thank you.
>>>
>>> He Gu
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 18:33:51

Sure, we have a recipe based on the swbd example, but we modified it to fit our own data. Everything goes well except decoding. We trained from monophone up to tri4a and decoded with the corresponding models. We expected the results to get better as the models become more accurate, but it turns out that decoding with the monophone model gives the lowest WER, and the results get worse as training goes on.

I've no idea what is going on, since I didn't see any problem in the scripts. Any help would be appreciated. Thank you.

He Gu

On 10/02/2013 09:55 AM, Daniel Povey wrote:
> Can you be more specific?
> Dan
>
> On Wed, Oct 2, 2013 at 12:53 PM, He Gu <gh...@gm...> wrote:
>> Hi, Daniel,
>>
>> I'm getting worse results over the different decoding phases (monophone,
>> tri1, tri2 and so on). Are there any possible reasons you can think of?
>> Thank you.
>>
>> He Gu
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-02 16:55:30

Can you be more specific?

Dan

On Wed, Oct 2, 2013 at 12:53 PM, He Gu <gh...@gm...> wrote:
> Hi, Daniel,
>
> I'm getting worse results over the different decoding phases (monophone,
> tri1, tri2 and so on). Are there any possible reasons you can think of?
> Thank you.
>
> He Gu
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-25 21:37:42

> I'm trying to train a neural net with nnet-cpu and I wonder how to
> measure the progress of training.
> Should I assume that the natural logarithm of the frame accuracy on cv and
> train is equal to the corresponding numbers in the compute_prob log files?

No, you have to compare the cross-entropy (xent) figure.

> If so, then I am probably getting strange results, because after 20 epochs
> the accuracy on train is only about 45%, while the WER is not bad (7%
> relative worse than the SGMM result).

The per-frame accuracies can't really be compared with WERs, because no context is taken into account in computing the frame accuracies; the frames are taken independently.

> What hyper-parameters would you recommend for a 10-hour training set
> with 1 speaker in it for nnet-cpu?

Start with the setup in RM and make the #parameters and layers a bit larger.

> And should I expect that the nnet will be better than the SGMM for any LVCSR
> test, as it is for Switchboard and WSJ?

Not always; sometimes it's a bit worse. Sometimes you need to increase the decoding beam for the nnet-cpu setup.

Dan
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-25 21:05:12

Hi,

I'm trying to train a neural net with nnet-cpu and I wonder how to measure the progress of training. Should I assume that the natural logarithm of the frame accuracy on cv and train is equal to the corresponding numbers in the compute_prob log files? If so, then I am probably getting strange results, because after 20 epochs the accuracy on train is only about 45%, while the WER is not bad (7% relative worse than the SGMM result).

What hyper-parameters would you recommend for a 10-hour training set with 1 speaker in it for nnet-cpu? And should I expect that the nnet will be better than the SGMM for any LVCSR test, as it is for Switchboard and WSJ?

Regards,
Valentin
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-24 17:50:35
|
Hi,

you might want to look at the script egs/wsj/s5/steps/lmrescore.sh.
The trouble might be that you don't call fstproject on the G fsts.

y.

On Tue, Sep 24, 2013 at 11:51 AM, Mailing list used for User
Communication and Updates <kal...@li...> wrote:
> In order to help debug the problem you could look at the word sequence
> involved (e.g. the best word sequence for that utterance; it doesn't
> matter which word sequence you take, as all word sequences were
> rejected from that utterance). You can get this from the decoder
> logging output, or from running lattice-best-path (output to text as
> ark,t:- and then apply int2sym.pl -f 2- data/lang/words.txt to view it
> as text).
>
> Then try to figure out why this word sequence might not be allowed in
> your grammar. If you want to use OpenFst tools for debugging, you
> could do lattice-to-fst and output to scp:foo, where foo would contain
> entries like
>   001-001 bar/001-001.fst
> (make the directory bar first).
> Then you can apply OpenFst tools on that fst file, e.g. try to compose
> it with G.fst.
> If you have changed your lexicon or words.txt, make sure you properly
> cleaned up, e.g. data/lang/tmp, or exp/tri4b/graph or whatever.
>
> Dan
>
> > I have a problem concerning LM rescoring - when trying to obtain a
> > decoded utterance with the following pipeline:
> >
> > lattice-lmrescore --lm-scale=-1.0 ark:out.lat G_old.fst ark:- | \
> > lattice-lmrescore --lm-scale=1.0 ark:- G_new.fst ark:- | \
> > lattice-scale --inv-acoustic-scale=10 ark:- ark:- | \
> > lattice-best-path --word-symbol-table=words.txt ark:- ark,t:- | \
> > int2sym.pl -f 2- words.txt | \
> > sed 's/<UNK>//' > out.utt
> >
> > a warning displays:
> > > Empty lattice for utterance 001-001 (incompatible LM?)
> >
> > HCLG is of course built with G_old.fst, which is a unigram version
> > of G_new.fst. Both share the same lexicon. I get the expected
> > results when omitting the lmrescore part.
> > What is weird: when I try to substitute G_new.fst with G_old.fst
> > (to get "kind of" an identity transform), the same warning occurs.
> > Am I missing something?
> >
> > I don't know if it is relevant, but my AM is SGMM2 on top of
> > LDA+MLLT.
> >
> > Marcin
> > tv...@gm...
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-24 17:29:46
|
Thanks.
Karel-- you might want to modify the script to check that the train
and cv sets have disjoint utterance-ids.
Dan
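For example, such a disjointness check might look like this (a sketch; the .scp file names and contents below are made up, and in a real setup the ids would come from e.g. data/train/feats.scp and data/cv/feats.scp):

```shell
# The first whitespace-separated field of a Kaldi .scp line is the
# utterance-id; the sets are disjoint iff `comm -12` on the two sorted
# id lists prints nothing.
printf '101-11 trn/a.ark:12\n101-12 trn/b.ark:34\n' > train.scp
printf '101-11 cv/c.ark:56\n202-01 cv/d.ark:78\n'   > cv.scp
cut -d' ' -f1 train.scp | sort -u > train.ids
cut -d' ' -f1 cv.scp    | sort -u > cv.ids
# comm -12 prints only lines common to both sorted inputs.
overlap=$(comm -12 train.ids cv.ids | wc -l | tr -d ' ')
echo "shared utterance-ids: $overlap"   # 101-11 is shared here, so 1
```

A non-zero count means the two sets overlap; the training script could simply exit with an error in that case.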
On Tue, Sep 24, 2013 at 1:27 PM, Mailing list used for User
Communication and Updates <kal...@li...> wrote:
> Hi,
>
> In case someone gets the same errors:
> I had identical utterance-ids in the train and cv sets, while the
> sound files were in different directories. That caused confusion in
> some feature preparation steps where the sets had been processed
> together.
>
> Regards,
> Valentin
>
> On Sat, Sep 21, 2013 at 12:37 AM, Valentin Mendelev <vm...@gm...> wrote:
>> Hi.
>>
>> That really was a partial message. I pressed shift+enter accidentally
>> and then realized how to solve my problem.
>> But another one has emerged.
>>
>> I'm trying to train a dnn on my own small corpus (1 speaker, about
>> 10 hrs, split into utterances about 7 words long and under 10 s
>> each) using egs/swbd/s5b/local/run_dnn.sh with appropriate
>> alterations (no feature-transform, paths).
>>
>> I run this
>>
>> $cuda_cmd $dir/_pretrain_dbn.log \
>> steps/pretrain_dbn.sh --hid_dim 2048 --train_utts 15000 --cmvn_utts
>> 1000 $t $dir || exit 1
>> <set proper paths>
>>
>> and this
>>
>> $cuda_cmd $dir/_train_nnet.log \
>> steps/train_nnet.sh --dbn $dbn --hid-layers 0 --learn-rate 0.008 \
>> $t $cv $lang $ali $ali_cv $dir || exit 1;
>>
>> Pre-training is OK now, but MLP training fails.
>> In prerun.log there are a lot of messages like this
>>
>> WARNING (nnet-train-xent-hardlab-frmshuff:main():nnet-train-xent-hardlab-frmshuf
>> f.cc:148) Alignment has wrong length, ali 258 vs. feats 334, utt 101-11
>> and finally
>> KALDI_ASSERT: at
>> nnet-train-xent-hardlab-frmshuff:CloseInternal:util/kaldi-table-inl.h:1546,
>> failed: holder_ == NULL
>> Stack trace is:
>> kaldi::KaldiGetStackTrace()
>> kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
>> kaldi::RandomAccessTableReaderArchiveImplBase<kaldi::BasicVectorHolder<int>
>>>::CloseInternal()
>>
>> In _train_nnet.log (last stage):
>>
>> # RUNNING THE NN-TRAINING SCHEDULER
>> steps/train_nnet_scheduler.sh --feature-transform
>> exp/tri3b2_pretrain-dbn73_dnn/tr_splice5-1_cmvn-g.nnet --learn-rate
>> 0.008 --seed 777 exp/tri3b2_pretrain-dbn73_dnn/nnet_6.dbn_dnn.init
>> ark:copy-feats scp:exp/tri3b2_pretrain-dbn73_dnn/train.scp ark:- |
>> ark:copy-feats scp:exp/tri3b2_pretrain-dbn73_dnn/cv.scp ark:- |
>> ark:ali-to-pdf exp/tri3b2_ali/final.mdl "ark:gunzip -c
>> exp/tri3b2_ali/ali.*.gz exp/tri3b2_ali_cvseg/ali.*.gz |" ark:- |
>> exp/tri3b2_pretrain-dbn73_dnn
>> steps/train_nnet_scheduler.sh: line 78: 5525 Aborted
>> (core dumped) $train_tool --cross-validate=true
>> --bunchsize=$bunch_size --cachesize=$cache_size --verbose=$verbose
>> ${feature_transform:+ --feature-transform=$feature_transform}
>> ${use_gpu_id:+ --use-gpu-id=$use_gpu_id} $mlp_best "$feats_cv"
>> "$labels" 2> $dir/log/prerun.log
>>
>> It’s not a list sort problem because I can train simple triphone
>> models on the same alignment and decode the cv set.
>>
>> I'm using default feature settings, so I suppose it should be plain
>> MFCC with 5 frames of context.
>> Could you tell where to look to make this work?
>>
>> I run Ubuntu 12.10 64-bit and my video card is a GTX 580 with 1.5 GB RAM.
>>
>> Regards,
>> Valentin
>
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-24 17:27:51
|
Hi,
In case someone gets the same errors:
I had identical utterance-ids in the train and cv sets, while the sound
files were in different directories. That caused confusion in some
feature preparation steps where the sets had been processed together.
Regards,
Valentin
On Sat, Sep 21, 2013 at 12:37 AM, Valentin Mendelev <vm...@gm...> wrote:
> Hi.
>
> That really was a partial message. I pressed shift+enter accidentally
> and then realized how to solve my problem.
> But another one has emerged.
>
> I'm trying to train a dnn on my own small corpus (1 speaker, about
> 10 hrs, split into utterances about 7 words long and under 10 s
> each) using egs/swbd/s5b/local/run_dnn.sh with appropriate
> alterations (no feature-transform, paths).
>
> I run this
>
> $cuda_cmd $dir/_pretrain_dbn.log \
> steps/pretrain_dbn.sh --hid_dim 2048 --train_utts 15000 --cmvn_utts
> 1000 $t $dir || exit 1
> <set proper paths>
>
> and this
>
> $cuda_cmd $dir/_train_nnet.log \
> steps/train_nnet.sh --dbn $dbn --hid-layers 0 --learn-rate 0.008 \
> $t $cv $lang $ali $ali_cv $dir || exit 1;
>
> Pre-training is OK now, but MLP training fails.
> In prerun.log there are a lot of messages like this
>
> WARNING (nnet-train-xent-hardlab-frmshuff:main():nnet-train-xent-hardlab-frmshuf
> f.cc:148) Alignment has wrong length, ali 258 vs. feats 334, utt 101-11
> and finally
> KALDI_ASSERT: at
> nnet-train-xent-hardlab-frmshuff:CloseInternal:util/kaldi-table-inl.h:1546,
> failed: holder_ == NULL
> Stack trace is:
> kaldi::KaldiGetStackTrace()
> kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
> kaldi::RandomAccessTableReaderArchiveImplBase<kaldi::BasicVectorHolder<int>
>>::CloseInternal()
>
> In _train_nnet.log (last stage):
>
> # RUNNING THE NN-TRAINING SCHEDULER
> steps/train_nnet_scheduler.sh --feature-transform
> exp/tri3b2_pretrain-dbn73_dnn/tr_splice5-1_cmvn-g.nnet --learn-rate
> 0.008 --seed 777 exp/tri3b2_pretrain-dbn73_dnn/nnet_6.dbn_dnn.init
> ark:copy-feats scp:exp/tri3b2_pretrain-dbn73_dnn/train.scp ark:- |
> ark:copy-feats scp:exp/tri3b2_pretrain-dbn73_dnn/cv.scp ark:- |
> ark:ali-to-pdf exp/tri3b2_ali/final.mdl "ark:gunzip -c
> exp/tri3b2_ali/ali.*.gz exp/tri3b2_ali_cvseg/ali.*.gz |" ark:- |
> exp/tri3b2_pretrain-dbn73_dnn
> steps/train_nnet_scheduler.sh: line 78: 5525 Aborted
> (core dumped) $train_tool --cross-validate=true
> --bunchsize=$bunch_size --cachesize=$cache_size --verbose=$verbose
> ${feature_transform:+ --feature-transform=$feature_transform}
> ${use_gpu_id:+ --use-gpu-id=$use_gpu_id} $mlp_best "$feats_cv"
> "$labels" 2> $dir/log/prerun.log
>
> It’s not a list sort problem because I can train simple triphone
> models on the same alignment and decode the cv set.
>
> I'm using default feature settings, so I suppose it should be plain
> MFCC with 5 frames of context.
> Could you tell where to look to make this work?
>
> I run Ubuntu 12.10 64-bit and my video card is a GTX 580 with 1.5 GB RAM.
>
> Regards,
> Valentin
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-24 15:51:30
|
In order to help debug the problem you could look at the word sequence
involved (e.g. the best word sequence for that utterance; it doesn't
matter which word sequence you take, as all word sequences were
rejected from that utterance). You can get this from the decoder
logging output, or from running lattice-best-path (output to text as
ark,t:- and then apply int2sym.pl -f 2- data/lang/words.txt to view it
as text).

Then try to figure out why this word sequence might not be allowed in
your grammar. If you want to use OpenFst tools for debugging, you could
do lattice-to-fst and output to scp:foo, where foo would contain
entries like
  001-001 bar/001-001.fst
(make the directory bar first).
Then you can apply OpenFst tools on that fst file, e.g. try to compose
it with G.fst.
If you have changed your lexicon or words.txt, make sure you properly
cleaned up, e.g. data/lang/tmp, or exp/tri4b/graph or whatever.

Dan

> I have a problem concerning LM rescoring - when trying to obtain a
> decoded utterance with the following pipeline:
>
> lattice-lmrescore --lm-scale=-1.0 ark:out.lat G_old.fst ark:- | \
> lattice-lmrescore --lm-scale=1.0 ark:- G_new.fst ark:- | \
> lattice-scale --inv-acoustic-scale=10 ark:- ark:- | \
> lattice-best-path --word-symbol-table=words.txt ark:- ark,t:- | \
> int2sym.pl -f 2- words.txt | \
> sed 's/<UNK>//' > out.utt
>
> a warning displays:
> > Empty lattice for utterance 001-001 (incompatible LM?)
>
> HCLG is of course built with G_old.fst, which is a unigram version of
> G_new.fst. Both share the same lexicon. I get the expected results
> when omitting the lmrescore part.
> What is weird: when I try to substitute G_new.fst with G_old.fst (to
> get "kind of" an identity transform), the same warning occurs.
> Am I missing something?
>
> I don't know if it is relevant, but my AM is SGMM2 on top of LDA+MLLT.
>
> Marcin
> tv...@gm...
|
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-09-24 15:22:04
|
Hi,

I have a problem concerning LM rescoring - when trying to obtain a
decoded utterance with the following pipeline:

lattice-lmrescore --lm-scale=-1.0 ark:out.lat G_old.fst ark:- | \
lattice-lmrescore --lm-scale=1.0 ark:- G_new.fst ark:- | \
lattice-scale --inv-acoustic-scale=10 ark:- ark:- | \
lattice-best-path --word-symbol-table=words.txt ark:- ark,t:- | \
int2sym.pl -f 2- words.txt | \
sed 's/<UNK>//' > out.utt

a warning displays:
> Empty lattice for utterance 001-001 (incompatible LM?)

HCLG is of course built with G_old.fst, which is a unigram version of
G_new.fst. Both share the same lexicon. I get the expected results when
omitting the lmrescore part.
What is weird: when I try to substitute G_new.fst with G_old.fst (to
get "kind of" an identity transform), the same warning occurs.
Am I missing something?

I don't know if it is relevant, but my AM is SGMM2 on top of LDA+MLLT.

Marcin
tv...@gm...
|
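One incidental note on the final sed step in the pipeline above: without the /g flag, sed replaces only the first <UNK> on each line, so a transcript containing several <UNK> tokens keeps all but the first. A small standalone illustration (the sample line is made up):

```shell
# sed's s command substitutes only the first match per line unless the
# /g flag is given; compare the two outputs below.
line='001-001 <UNK> hello <UNK> world'
first_only=$(echo "$line" | sed 's/<UNK> //')
all=$(echo "$line" | sed 's/<UNK> //g')
echo "$first_only"   # 001-001 hello <UNK> world
echo "$all"          # 001-001 hello world
```

If the intent is to strip every <UNK> from the output, 's/<UNK> //g' is the variant to use.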