From: Sarah F. S. J. <sar...@gm...> - 2015-06-11 15:05:37
Dear Kaldi users,

We would like to inform you about our recently published data for ASR, available on GitHub: https://github.com/sarahjuan/iban . The data contains speech in Iban, a language spoken in Borneo. We used the data in our study on under-resourced languages for ASR, and we built our systems using Kaldi. Thanks to the available recipes and the active forum, we learnt several techniques that were very useful for our research.

Feel free to download our data and the Kaldi scripts we used to build the ASR systems.

Best regards,
Sarah (sjs...@fi...) & Laurent (lau...@im...)
From: Vesely K. <ve...@gm...> - 2015-06-11 13:33:30
If you look into the LSTM code, you'll see that the last operation on the output is a multiplication by a linear transform; no activation function is applied.

K.

On 06/11/2015 05:35 AM, Xingyu Na wrote:
> Hi Karel,
>
> Thank you so much. The pre-training goes reasonably now.
> BTW, I might be asking a silly question, but why are "the outputs of the LSTM
> block more like Gaussian random variables"? Where could I find such an
> analysis? Does it affect convergence when I train the stack of 2 LSTM and
> 2 DNN layers (not RBM)? Should I add a sigmoid for that training as well?
>
> Thank you again and best regards,
> Xingyu
> [earlier quoted messages and training logs trimmed; the full text appears in the posts of 2015-06-10 12:38 and 2015-06-06 below]
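Karel's point can be illustrated with a small NumPy sketch (hypothetical dimensions and random weights; not Kaldi code): the recurrent output of an LSTMP layer is bounded by the tanh and gates, but the final projection is a plain matrix multiply, so the layer's output is an unbounded sum of many bounded terms and tends toward a bell-shaped distribution rather than the (0, 1) range a sigmoid layer would give.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the last step of an LSTMP layer: the recurrent output
# m_t = o_t * tanh(c_t) lies in (-1, 1), but the projection y_t = W_r m_t
# is a plain linear transform, so the final output is unbounded.
cell_dim, proj_dim, frames = 512, 128, 10000
m = rng.uniform(-1.0, 1.0, size=(frames, cell_dim))   # gated cell outputs, bounded
W_r = rng.normal(0.0, 1.0 / np.sqrt(cell_dim), size=(cell_dim, proj_dim))
y = m @ W_r                                           # projected output, unbounded

# Each output is a sum of 512 independent bounded terms: by the central limit
# theorem the projection looks roughly Gaussian, and nothing squashes it.
print("LSTMP output range:", float(y.min()), "to", float(y.max()))
print("mean ~", round(float(y.mean()), 3), " std ~", round(float(y.std()), 3))
```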
From: Xingyu Na <asr...@gm...> - 2015-06-11 03:36:11
Hi Karel,

Thank you so much. The pre-training goes reasonably now.

BTW, I might be asking a silly question, but why are "the outputs of the LSTM block more like Gaussian random variables"? Where could I find such an analysis? Does it affect convergence when I train the stack of 2 LSTM and 2 DNN layers (not RBM)? Should I add a sigmoid for that training as well?

Thank you again and best regards,
Xingyu

On 06/10/2015 08:38 PM, Vesely Karel wrote:
> [quoted message and training logs trimmed; the full text appears in the posts of 2015-06-10 12:38 and 2015-06-06 below]
From: Daniel P. <dp...@gm...> - 2015-06-10 19:04:35
Usually, cases like this where you see NaNs after a while are due to some kind of instability in the training, which causes the parameters to diverge. It could be due to too-high learning rates. It could also be that, if you apply LSTMs to long pieces of audio, as happens in the discriminative training code, there is some kind of gradient explosion. However, IIRC LSTMs were specifically designed to avoid the possibility of gradient explosion, so this would be surprising.

You could try smaller learning rates.

Dan

> [quoted message trimmed; the full text appears in the post of 2015-06-10 13:55 below]
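Dan's two suggestions (guard against exploding gradients, back off the learning rate when the objective diverges) can be sketched generically. This is an illustration of the idea only, not the actual Kaldi nnet1 training code; `clip_grad`, `sgd_step`, and `safe_update` are hypothetical helpers.

```python
import numpy as np

def clip_grad(grad, max_norm=5.0):
    # Rescale the gradient when its norm explodes (generic illustration,
    # not Kaldi's implementation).
    norm = float(np.linalg.norm(grad))
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def sgd_step(w, grad, lr):
    return w - lr * clip_grad(grad)

def safe_update(w, w_backup, loss, lr):
    # Divergence guard: if the objective goes NaN/inf, rewind to the last
    # good parameters and halve the learning rate instead of continuing.
    if not np.isfinite(loss):
        return w_backup, lr * 0.5
    return w, lr

w, lr = np.array([1.0, -2.0]), 0.1
w2 = sgd_step(w, np.array([1e6, -1e6]), lr)      # exploding gradient gets clipped
w3, lr2 = safe_update(w2, w, float("nan"), lr)   # NaN loss -> rewind, halve lr
print(w2, w3, lr2)
```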
From: Gupta V. <vis...@cr...> - 2015-06-10 13:55:24
Hi,

When I try to do discriminative LSTM training I get the following error.

If I use train_mpe.sh, it runs for a few thousand utterances and then I get the following error, and the program crashes:

ERROR (nnet-train-mpe-sequential:LatticeForwardBackwardMpeVariants():lattice-functions.cc:833) Total forward score over lattice = -nan, while total backward score = 0

If I use train_mmi.sh, then after a few thousand utterances I get logs with "nan"; however, the program keeps on running:

VLOG[1] (nnet-train-mmi-sequential:main():nnet-train-mmi-sequential.cc:346) Utterance 20080401_170000_bbcone_bbc_news_spk-0025_seg-0150897:0151494: Average MMI obj. value = nan over 595 frames. (Avg. den-posterior on ali -nan)

Is there a workaround for that?

Thanks,

Vishwa
From: Vesely K. <ve...@gm...> - 2015-06-10 13:40:30
Another option would be to add a Sigmoid component after the LSTM and pre-train the RBM with Bernoulli visible units.

K.

On 06/10/2015 02:38 PM, Vesely Karel wrote:
> [quoted message and training logs trimmed; the full text appears in the posts of 2015-06-10 12:38 and 2015-06-06 below]
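Karel's alternative, a Sigmoid component after the LSTM so that the RBM can use Bernoulli visible units, amounts to squashing the unbounded activations into (0, 1), where they can be read as activation probabilities. A toy NumPy sketch (the shapes and the Gaussian stand-in for LSTM outputs are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
lstm_out = rng.normal(0.0, 3.0, size=(1000, 64))  # unbounded, roughly Gaussian
vis = sigmoid(lstm_out)                           # squashed into (0, 1)

# Bernoulli visible units interpret each value as a firing probability,
# so after the sigmoid the RBM's modelling assumption is satisfied.
sample = (rng.random(vis.shape) < vis).astype(np.float64)  # stochastic binarization
print("visible range:", float(vis.min()), "to", float(vis.max()))
```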
From: Vesely K. <ve...@gm...> - 2015-06-10 12:38:57
Hi,

the outputs of the LSTM block are more like Gaussian random variables, so it makes more sense to use Gaussian visible units in the RBM. However, in the case of RBM training we assume that the individual input features are normalized to have zero mean and unit variance, which is not guaranteed for the LSTM output.

You can try running 'steps/nnet/pretrain_dbn.sh' with '--input-vis-type gauss' and see what happens; despite the wrong assumption, it may still train reasonably.

Best regards,
Karel.

On 06/06/2015 07:51 AM, Xingyu Na wrote:
> [quoted message and training logs trimmed; the full text appears in the post of 2015-06-06 below]
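The assumption Karel describes can be checked numerically: Gaussian visible units expect each input dimension to have zero mean and unit variance, which LSTM outputs need not satisfy. The sketch below fabricates features with arbitrary per-dimension statistics and standardizes them; it is an illustration of the idea, not Kaldi's actual normalization code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical LSTM-output features: per-dimension mean and variance are
# arbitrary, violating the zero-mean/unit-variance assumption.
scale = rng.uniform(0.2, 4.0, 40)
shift = rng.uniform(-2.0, 2.0, 40)
feats = rng.normal(0.0, 1.0, size=(5000, 40)) * scale + shift

mean = feats.mean(axis=0)
std = feats.std(axis=0)
print("assumption violated:",
      bool(np.abs(mean).max() > 0.1 or np.abs(std - 1.0).max() > 0.1))

# Global standardization (the kind of shift-and-rescale a normalization
# transform would provide before Gaussian-Bernoulli RBM training):
norm = (feats - mean) / std
print("per-dim mean after:", float(np.abs(norm.mean(axis=0)).max()))
print("per-dim std after:", float(np.abs(norm.std(axis=0) - 1.0).max()))
```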
From: Xingyu Na <asr...@gm...> - 2015-06-10 01:12:09
Thanks Dan. I did exactly the same thing, so you confirmed that I didn't make some stupid move :-P

X.

On 06/10/2015 01:22 AM, Daniel Povey wrote:
> [quoted message trimmed; the full text appears in the post of 2015-06-09 17:23 below]
From: Daniel P. <dp...@gm...> - 2015-06-09 17:23:08
Yes, some tools link against the CUDA library even if they can't use it. The way I solve this is to make a directory where I copy those libraries to, and add it to the LD_LIBRARY_PATH. You can figure out the libraries involved by running "ldd" on a binary on a machine where they exist.

Dan

On Tue, Jun 9, 2015 at 6:44 AM, Xingyu Na <asr...@gm...> wrote:
> [quoted message trimmed; the full text appears in the post of 2015-06-09 10:44 below]
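Dan's recipe can be scripted. Below is a small, hypothetical helper that parses `ldd` output to find the CUDA-related shared libraries to copy into the LD_LIBRARY_PATH directory; the sample `ldd` output is made up for illustration, and the library names/paths on a real machine will differ.

```python
import re

def find_cuda_libs(ldd_output):
    # Pick out CUDA-related shared libraries (and their resolved paths)
    # from the output of `ldd some-kaldi-binary`.
    libs = {}
    for line in ldd_output.splitlines():
        m = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if m and ("cuda" in m.group(1) or "cublas" in m.group(1)):
            libs[m.group(1)] = m.group(2)
    return libs

# Hypothetical `ldd nnet-align-compiled` output on a GPU machine:
sample = """\
        linux-vdso.so.1 =>  (0x00007fff)
        libcudart.so.7.0 => /usr/local/cuda/lib64/libcudart.so.7.0 (0x00007f1a)
        libcublas.so.7.0 => /usr/local/cuda/lib64/libcublas.so.7.0 (0x00007f1b)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1c)
"""
for name, path in find_cuda_libs(sample).items():
    print(name, "->", path)  # the files to copy into the LD_LIBRARY_PATH dir
```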
From: Xingyu Na <asr...@gm...> - 2015-06-09 10:44:44
Hi,

Does nnet-latgen-faster(-parallel) use the GPU or not? It seems not... I compiled Kaldi with CUDA; however, some tools link the CUDA libs without actually using them. If I want to run align or make_denlats on nodes without CUDA/GPU, it seems that the only option is to recompile them without CUDA, right?

The reason for this question is that I'm maintaining a GE cluster in which some nodes have GPUs and some do not. And I notice that even if I run nnet-align-compiled with --use-gpu no, I need to run it on a machine which has the CUDA libs...

How should I manage this?

Best,
Xingyu
From: Xingyu Na <asr...@gm...> - 2015-06-06 05:51:30
|
Hi, I trained 2 layers of LSTM, with 2 hidden layers on top of that. The decoding performance on eval92 is reasonable. Now I want to do RBM pre-training. The straightforward way is to remove the hidden layers, and use the LSTM layers as feature transform, just as the way in Karel's cnn pre-train recipe. However, no matter how small the learn rate is, the first RBM seems not converging, log is pasted below: ================================================ LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777 LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2 LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 218.955 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 222.583 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 220.827 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 221.531 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h ....... 
================================================ Mse does not decrease. However, after 1.rbm is trained, and concatenated with LSTM, (now the transform become LSTM+RBM), the training of 2.rbm seems converging.... ================================================ LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777 LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2 LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 56.9416 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 39.1901 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 34.2891 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 30.5311 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 5h]: 29.2614 (Mse) VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.3e-06 after 
processing 5.55556h ....... =============================================== I am quite confused about this. I believe further fine-tuning of the weights based on these RBMs does not make sense. What am I missing? Best, Xingyu |
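Karel's reply earlier in this thread suggests Gaussian visible units for the first RBM via `--input-vis-type gauss`. A minimal sketch of what that invocation might look like follows; the data/experiment paths are placeholders, and the option names other than `--input-vis-type` should be checked against `steps/nnet/pretrain_dbn.sh` in your Kaldi version:

```shell
# Sketch only: RBM pre-training stacked on an LSTM feature transform.
# --input-vis-type gauss selects Gaussian visible units for the first RBM
# (per Karel's suggestion); all paths below are illustrative assumptions.
steps/nnet/pretrain_dbn.sh \
  --input-vis-type gauss \
  --feature-transform exp/lstm/final.feature_transform \
  --nn-depth 2 \
  data/train exp/dbn_on_lstm
```

Note that because the LSTM output is not guaranteed to be zero-mean/unit-variance, this still violates the usual RBM normalization assumption; it may nonetheless train reasonably, as Karel notes.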
From: Daniel P. <dp...@gm...> - 2015-06-04 06:11:29
|
Paper is here http://www.danielpovey.com/files/2015_interspeech_silprob.pdf It affects decoding and will make very little to no difference in training. Dan On Thu, Jun 4, 2015 at 2:05 AM, Kirill Katsnelson <kir...@sm...> wrote: > Does the use of lexicon with silprobs affect decoding, or is it advantageous only for training? > > Generally, I do not understand this part; is there a reference to a paper that explains it? > > -kkm > ------------------------------------------------------------------------------ > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
From: Kirill K. <kir...@sm...> - 2015-06-04 06:05:44
|
Does the use of lexicon with silprobs affect decoding, or is it advantageous only for training? Generally, I do not understand this part; is there a reference to a paper that explains it? -kkm |
From: Yannis C. <ha...@ho...> - 2015-06-02 07:25:11
|
Hello, I am training a decision tree using the standard wsj/s5 recipe and I am trying to acquire its structure using GetTreeStructure(). However, I get a warning that there are repeated leaves in the tree, even though I have allowed clustering (the clustering threshold is a non-zero value, either -1 or a value greater than zero). Could this be a problem with the threshold value I use, given that I get the same warning using -1 as the clustering threshold? If not, how could I overcome this issue? Furthermore, is there a way to get the initial tree given only the outcome of GetTreeStructureInternal() (parents of all nodes in the tree and EventMaps of all internal nodes)? I would like to thank you for the help, Yannis |
From: Daniel P. <dp...@gm...> - 2015-06-01 18:08:03
|
It is based on the ideas in this paper: "Minimum Bayes Risk decoding and system combination based on a recursion for edit distance", Haihua Xu, Daniel Povey, Lidia Mangu and Jie Zhu, Computer Speech and Language, 2011 It's like confusion network combination but derived in a more rigorous way. Dan On Mon, Jun 1, 2015 at 3:25 AM, karthick <cre...@gm...> wrote: > Hi Friends, > > Recently I noted that when I used local/score_combine.sh from TIMIT task to > combine the lattices of the same dnn model it gave reduced error rate. > > The actual DNN performance was 18.4 % PER, the combination of the same model > gave a % PER of 16.9. Since, I am new to this lattice_combination approach > can someone suggest me how it works ? and whether the result what i got is > correct ? > > ------------------------------------------------------------------------------ > > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > |
From: karthick <cre...@gm...> - 2015-06-01 07:26:08
|
Hi Friends, Recently I noticed that when I used local/score_combine.sh from the TIMIT task to combine the lattices of the same DNN model, it gave a reduced error rate. The actual DNN performance was 18.4% PER; the combination of the same model gave a PER of 16.9%. Since I am new to this lattice combination approach, can someone suggest how it works, and whether the result I got is correct? |
From: Xingyu Na <asr...@gm...> - 2015-05-30 07:38:09
|
Thank you Dan. My setup is for ~2000hrs speech, so I should take that into my note.... :) On 05/30/2015 04:06 AM, Daniel Povey wrote: > I never created GPU versions of those because they did not seem to be helpful. > From what I hear, dropout is tricky to get working in speech > recognition, and requires things like dropout-probability schedules, > and maybe only applying it to certain layers of the network. I don't > think it's even that useful for larger databases. > Dan > > > On Fri, May 29, 2015 at 2:54 AM, Xingyu Na <asr...@gm...> wrote: >> Hi, >> >> I want to regularize the network by perturbing the parameters, not the >> samples. >> I found two components in nnet2, namely "DropoutComponent" and >> "AdditiveNoiseComponent", but they seem to work only in CPU training. >> There is also a "kDropout" in nnet1. In the egs scripts, the only >> utilization of dropout is in the >> utils/nnetc-cpu/make_nnet_config_preconditioned.pl, but not used in any >> recipe. >> Has anyone got a dropout or additive-noise recipe working, using GPU? >> >> Best, >> X. >> >> ------------------------------------------------------------------------------ >> _______________________________________________ >> Kaldi-users mailing list >> Kal...@li... >> https://lists.sourceforge.net/lists/listinfo/kaldi-users |
From: Sean T. <se...@se...> - 2015-05-30 01:12:44
|
Many of the use cases are not going to have helpfully bracketing silence. Take for instance, addresses from a directory listing. -- Sean On Fri, May 29, 2015 at 4:27 PM, Daniel Povey <dp...@gm...> wrote: > Yes, I think there might be various reasons why that > dynamic-composition approach would not work in Kaldi, but I don't have > time right at this second to figure it out. > It might be easier to handle things that were dynamic like that if you > got rid of context dependency by (say) assuming the dynamic part could > only come before/after silence. Certainly it would require a lot of > coding. > Dan > > > On Fri, May 29, 2015 at 4:16 PM, Kirill Katsnelson > <kir...@sm...> wrote: > > Not sure if this is exactly a kaldi question, but how dynamic classes > could be handled in wfst HCLG composite transducer? There are more tricky > cases when the lexicon also needs extension. I want an optimized approach, > much faster than the whole HCLG composition from the ground up. > > > > The Holy Book paper suggests the lazy composition, but I am not sure how > to fit the determinization and adding self-loops into the picture (they did > not explicitly use self-loops in the $H$ and instead assumed them in the > decoder, but kaldi does, if I understand it correctly). > > > > -kkm > > > > > ------------------------------------------------------------------------------ > > _______________________________________________ > > Kaldi-users mailing list > > Kal...@li... > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > > ------------------------------------------------------------------------------ > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > |
From: Daniel P. <dp...@gm...> - 2015-05-29 20:27:14
|
Yes, I think there might be various reasons why that dynamic-composition approach would not work in Kaldi, but I don't have time right at this second to figure it out. It might be easier to handle things that were dynamic like that if you got rid of context dependency by (say) assuming the dynamic part could only come before/after silence. Certainly it would require a lot of coding. Dan On Fri, May 29, 2015 at 4:16 PM, Kirill Katsnelson <kir...@sm...> wrote: > Not sure if this is exactly a kaldi question, but how dynamic classes could be handled in wfst HCLG composite transducer? There are more tricky cases when the lexicon also needs extension. I want an optimized approach, much faster than the whole HCLG composition from the ground up. > > The Holy Book paper suggests the lazy composition, but I am not sure how to fit the determinization and adding self-loops into the picture (they did not explicitly use self-loops in the $H$ and instead assumed them in the decoder, but kaldi does, if I understand it correctly). > > -kkm > > ------------------------------------------------------------------------------ > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
From: Kirill K. <kir...@sm...> - 2015-05-29 20:16:40
|
Not sure if this is exactly a Kaldi question, but how could dynamic classes be handled in a WFST HCLG composite transducer? There are trickier cases where the lexicon also needs extension. I want an optimized approach, much faster than recomposing the whole HCLG from the ground up. The Holy Book paper suggests lazy composition, but I am not sure how to fit the determinization and the addition of self-loops into the picture (they did not explicitly use self-loops in the $H$ and instead assumed them in the decoder, but Kaldi does, if I understand it correctly). -kkm |
From: Daniel P. <dp...@gm...> - 2015-05-29 20:06:58
|
I never created GPU versions of those because they did not seem to be helpful. From what I hear, dropout is tricky to get working in speech recognition, and requires things like dropout-probability schedules, and maybe only applying it to certain layers of the network. I don't think it's even that useful for larger databases. Dan On Fri, May 29, 2015 at 2:54 AM, Xingyu Na <asr...@gm...> wrote: > Hi, > > I want to regularize the network by perturbing the parameters, not the > samples. > I found two components in nnet2, namely "DropoutComponent" and > "AdditiveNoiseComponent", but they seem to work only in CPU training. > There is also a "kDropout" in nnet1. In the egs scripts, the only > utilization of dropout is in the > utils/nnetc-cpu/make_nnet_config_preconditioned.pl, but not used in any > recipe. > Has anyone got a dropout or additive-noise recipe working, using GPU? > > Best, > X. > > ------------------------------------------------------------------------------ > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
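The "dropout-probability schedule" Dan mentions can be made concrete with a small illustration. This is not a Kaldi API; it is a sketch of the scheduling idea only, with the function name, parameters, and default values invented for the example (a linear ramp-up that then holds constant, one common variant):

```python
def dropout_schedule(iter_num, num_iters, p_start=0.0, p_max=0.5, ramp_frac=0.5):
    """Illustrative dropout-probability schedule (not a Kaldi API).

    Ramps the dropout probability linearly from p_start to p_max over
    the first ramp_frac of training iterations, then holds it constant.
    """
    ramp_iters = max(1, int(num_iters * ramp_frac))
    if iter_num >= ramp_iters:
        return p_max
    # Linear interpolation during the ramp-up phase.
    return p_start + (iter_num / ramp_iters) * (p_max - p_start)
```

The per-iteration value would then be applied to whichever layers are chosen for dropout; per Dan's comment, applying it uniformly to all layers with a fixed probability is reportedly what tends not to work.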
From: Xingyu Na <asr...@gm...> - 2015-05-29 06:54:43
|
Hi, I want to regularize the network by perturbing the parameters, not the samples. I found two components in nnet2, namely "DropoutComponent" and "AdditiveNoiseComponent", but they seem to work only in CPU training. There is also a "kDropout" in nnet1. In the egs scripts, the only utilization of dropout is in the utils/nnetc-cpu/make_nnet_config_preconditioned.pl, but not used in any recipe. Has anyone got a dropout or additive-noise recipe working, using GPU? Best, X. |
From: Kirill K. <kir...@sm...> - 2015-05-26 18:11:21
|
Speaking about data set preprocessing only, will Stanford NLP POS tagger pull the trick? -kkm > -----Original Message----- > From: Nagendra Goel [mailto:nag...@go...] > Sent: 2015-05-24 1511 > To: Matthew Aylett > Cc: Dimitris Vassos; kal...@li... > Subject: Re: [Kaldi-users] LM grafting > > A systematic way for identifying special elements in text will be very > useful. Currently NSW-EXPAND from festival conflicts with this sub- > grammar approach although otherwise it's a good lm pre-processing step. > > Nagendra Kumar Goel > > On May 24, 2015 4:45 PM, "Matthew Aylett" <mat...@gm...> > wrote: > > > Not sure if this is relevant to this thread. But in the speech > synthesis system branch we have a very early text normaliser which > (when > complete) will detect things like phone numbers addresses, currencies > etc. The output form this could then be used to inform language model > building. Currently it deals with symbols and tokenisations in English. > > Potentially `(although I wasn't currently planning on this), the > text normaliser could be written in thrax - based on openfst - authored > by Richard Sproat I believe). However if this approach would benefit > ASR as well then it might be worth doing it this way rather than my > plan of a simple greedy normaliser. > > > v best > > Matthew Aylett > > > On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos > <dva...@gm...> wrote: > > > We have access to several corpora and we are trying to put > together something appropriate. > > In the next couple of days, we will also volunteer a server > to set it all up and run the tests. > > Dimitris > > > On 24 Μαΐ 2015, at 02:06, Daniel Povey <dp...@gm...> > wrote: > > > > One possibility is to use a completely open-source setup, > e.g. > > Voxforge, and forget about the "has a clear advantage" > requirement. > > E.g. target anything that looks like a year, and make a > grammar for > > years. 
> > Dan
> >
> > On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel
> > <nag...@go...> wrote:
> >> Since I cannot volunteer my enviornment, do you recommend another
> >> enviornment where this can be prototyped and where you can check in some
> >> class lm recipe that has advantage.
> >>
> >> Nagendra Kumar Goel
> >>
> >>> On May 21, 2015 11:01 PM, "Dimitris Vassos" <dva...@gm...> wrote:
> >>>
> >>> +1 for the class-based LMs. I have also been interested in this
> >>> functionality for some time now, so will be more than happy to try out the
> >>> current implementation, if possible.
> >>>
> >>> Thanks
> >>> Dimitris
> >>>
> >>>> From: Daniel Povey <dp...@gm...>
> >>>> Subject: Re: [Kaldi-users] LM grafting
> >>>>
> >>>> The general approach is to create an FST for the little language
> >>>> model, and then to use fstreplace to replace instances of a particular
> >>>> symbol in the top-level language model, with that FST.
> >>>> The tricky part is ensuring that the result is determinizable after
> >>>> composing with the lexicon.  In general our solution is to add special
> >>>> disambiguation symbols at the beginning and end of each of the
> >>>> sub-FSTs, and of course making sure that the sub-FSTs are themselves
> >>>> determinizable.
> >>>> Dan
> >>>>
> >>>>> Nagendra Goel has worked on some example scripts for this type of
> >>>>> thing, and with Hainan we were working on trying to get it cleaned up
> >>>>> and checked in, but he's going for an internship so it will have to
> >>>>> wait.  But Nagendra might be willing to share it with you.
> >>>>> Dan
> >>>>>
> >>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson
> >>>>>> <kir...@sm...> wrote:
> >>>>>>> Suppose I have a language model where one token (a "word") is a
> >>>>>>> pointer to a whole another LM. This is a practical case when you
> >>>>>>> expect an abrupt change in model, a clear example being "my phone
> >>>>>>> number is..." and then you'd expect them rattling a string of
> >>>>>>> digits. Is there any support in kaldi for this?
> >>>>>>> -kkm
> >>>>
> >>>> From: Kirill Katsnelson <kir...@sm...>
> >>>>
> >>>> Also, from the practical standpoint, backoff/discounting weights usually
> >>>> need to be massaged. Otherwise when the grafted LM is small and the main LM
> >>>> is large, the little model will tend to shoehorn an utterance into itself
> >>>> rather than let go of it. In my phone number example, everything becomes
> >>>> digits once the phone number starts.
> >>>> -kkm
> >>>>
> >>>> From: Hainan Xu <hai...@gm...>
> >>>>
> >>>> There is a paper in ICASSP 2015 that described some very similar idea:
> >>>> Improved recognition of contact names in voice commands
> >>>>
> >>>> From: Sean True <se...@se...>
> >>>>
> >>>> That's a subject of some general interest. Is there a discussion of the
> >>>> general approach that was taken somewhere?
> >>>> -- Sean |
From: Nagendra G. <nag...@go...> - 2015-05-24 22:42:28
|
A systematic way for identifying special elements in text will be very useful. Currently NSW-EXPAND from festival conflicts with this sub-grammar approach although otherwise it's a good lm pre-processing step. Nagendra Kumar Goel On May 24, 2015 4:45 PM, "Matthew Aylett" <mat...@gm...> wrote: > Not sure if this is relevant to this thread. But in the speech synthesis > system branch we have a very early text normaliser which (when complete) > will detect things like phone numbers addresses, currencies etc. The output > form this could then be used to inform language model building. Currently > it deals with symbols and tokenisations in English. > > Potentially `(although I wasn't currently planning on this), the text > normaliser could be written in thrax - based on openfst - authored by > Richard Sproat I believe). However if this approach would benefit ASR as > well then it might be worth doing it this way rather than my plan of a > simple greedy normaliser. > > v best > > Matthew Aylett > > > On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos <dva...@gm...> > wrote: > >> We have access to several corpora and we are trying to put together >> something appropriate. >> >> In the next couple of days, we will also volunteer a server to set it all >> up and run the tests. >> >> Dimitris >> >> > On 24 Μαΐ 2015, at 02:06, Daniel Povey <dp...@gm...> wrote: >> > >> > One possibility is to use a completely open-source setup, e.g. >> > Voxforge, and forget about the "has a clear advantage" requirement. >> > E.g. target anything that looks like a year, and make a grammar for >> > years. >> > Dan >> > >> > >> > On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel >> > <nag...@go...> wrote: >> >> Since I cannot volunteer my enviornment, do you recommend another >> >> enviornment where this can be prototyped and where you can check in >> some >> >> class lm recipe that has advantage. 
>> >> >> >> Nagendra >> >> >> >> Nagendra Kumar Goel >> >> >> >>> On May 21, 2015 11:01 PM, "Dimitris Vassos" <dva...@gm...> >> wrote: >> >>> >> >>> +1 for the class-based LMs. I have also been interested in this >> >>> functionality for some time now, so will be more than happy to try >> out the >> >>> current implementation, if possible. >> >>> >> >>> Thanks >> >>> Dimitris >> >>> >> >>>> On 22 Μαΐ 2015, at 01:34, kal...@li... >> >>>> wrote: >> >>>> >> >>>> Send Kaldi-users mailing list submissions to >> >>>> kal...@li... >> >>>> >> >>>> To subscribe or unsubscribe via the World Wide Web, visit >> >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> or, via email, send a message with subject or body 'help' to >> >>>> kal...@li... >> >>>> >> >>>> You can reach the person managing the list at >> >>>> kal...@li... >> >>>> >> >>>> When replying, please edit your Subject line so it is more specific >> >>>> than "Re: Contents of Kaldi-users digest..." >> >>>> >> >>>> >> >>>> Today's Topics: >> >>>> >> >>>> 1. Re: LM grafting (Daniel Povey) >> >>>> 2. Re: LM grafting (Kirill Katsnelson) >> >>>> 3. Re: LM grafting (Hainan Xu) >> >>>> 4. Re: LM grafting (Sean True) >> >>>> >> >>>> >> >>>> >> ---------------------------------------------------------------------- >> >>>> >> >>>> Message: 1 >> >>>> Date: Thu, 21 May 2015 15:04:04 -0400 >> >>>> From: Daniel Povey <dp...@gm...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: Sean True <se...@se...> >> >>>> Cc: Hainan Xu <hai...@gm...>, >> >>>> "kal...@li..." >> >>>> <kal...@li...>, Kirill Katsnelson >> >>>> <kir...@sm...> >> >>>> Message-ID: >> >>>> <CAEWAuySHaXwdNJZAoL6CanzHth= >> k4Y...@ma...> >> >>>> Content-Type: text/plain; charset=UTF-8 >> >>>> >> >>>> The general approach is to create an FST for the little language >> >>>> model, and then to use fstreplace to replace instances of a >> particular >> >>>> symbol in the top-level language model, with that FST. 
>> >>>> The tricky part is ensuring that the result is determinizable after >> >>>> composing with the lexicon. In general our solution is to add >> special >> >>>> disambiguation symbols at the beginning and end of each of the >> >>>> sub-FSTs, and of course making sure that the sub-FSTs are themselves >> >>>> determinizable. >> >>>> Dan >> >>>> >> >>>> >> >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True < >> se...@se...> >> >>>>> wrote: >> >>>>> That's a subject of some general interest. Is there a discussion of >> the >> >>>>> general approach that was taken somewhere? >> >>>>> >> >>>>> -- Sean >> >>>>> >> >>>>> Sean True >> >>>>> Semantic Machines >> >>>>> >> >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> >> >>>>>> wrote: >> >>>>>> >> >>>>>> Nagendra Goel has worked on some example scripts for this type of >> >>>>>> thing, and with Hainan we were working on trying to get it cleaned >> up >> >>>>>> and checked in, but he's going for an internship so it will have to >> >>>>>> wait. But Nagendra might be willing to share it with you. >> >>>>>> Dan >> >>>>>> >> >>>>>> >> >>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson >> >>>>>> <kir...@sm...> wrote: >> >>>>>>> Suppose I have a language model where one token (a "word") is a >> >>>>>>> pointer >> >>>>>>> to a whole another LM. This is a practical case when you expect an >> >>>>>>> abrupt >> >>>>>>> change in model, a clear example being "my phone number is..." and >> >>>>>>> then >> >>>>>>> you'd expect them rattling a string of digits. Is there any >> support >> >>>>>>> in kaldi >> >>>>>>> for this? 
>> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> >> >>>>>>> -kkm >> >>>>>>> >> >>>>>>> >> >>>>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> One dashboard for servers and applications across >> >>>>>>> Physical-Virtual-Cloud >> >>>>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>>>> Performance metrics, stats and reports that give you Actionable >> >>>>>>> Insights >> >>>>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>>> _______________________________________________ >> >>>>>>> Kaldi-users mailing list >> >>>>>>> Kal...@li... >> >>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> ------------------------------------------------------------------------------ >> >>>>>> One dashboard for servers and applications across >> >>>>>> Physical-Virtual-Cloud >> >>>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>>> Performance metrics, stats and reports that give you Actionable >> >>>>>> Insights >> >>>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>> _______________________________________________ >> >>>>>> Kaldi-users mailing list >> >>>>>> Kal...@li... >> >>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> >> >>>> >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> Message: 2 >> >>>> Date: Thu, 21 May 2015 19:24:38 +0000 >> >>>> From: Kirill Katsnelson <kir...@sm...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: "dp...@gm..." <dp...@gm...>, Sean True >> >>>> <se...@se...> >> >>>> Cc: Hainan Xu <hai...@gm...>, >> >>>> "kal...@li..." >> >>>> <kal...@li...> >> >>>> Message-ID: >> >>>> >> >>>> < >> CY1...@CY... 
>> > >> >>>> >> >>>> Content-Type: text/plain; charset="utf-8" >> >>>> >> >>>> Also, from the practical standpoint, backoff/discounting weights >> usually >> >>>> need to be massaged. Otherwise when the grafted LM is small and the >> main LM >> >>>> is large, the little model will tend to shoehorn an utterance into >> itself >> >>>> rather than let go of it. In my phone number example, everything >> becomes >> >>>> digits once the phone number starts. >> >>>> >> >>>> -kkm >> >>>> >> >>>>> -----Original Message----- >> >>>>> From: Daniel Povey [mailto:dp...@gm...] >> >>>>> Sent: 2015-05-21 1204 >> >>>>> To: Sean True >> >>>>> Cc: Kirill Katsnelson; Nagendra Goel; Hainan Xu; kaldi- >> >>>>> us...@li... >> >>>>> Subject: Re: [Kaldi-users] LM grafting >> >>>>> >> >>>>> The general approach is to create an FST for the little language >> model, >> >>>>> and then to use fstreplace to replace instances of a particular >> symbol >> >>>>> in the top-level language model, with that FST. >> >>>>> The tricky part is ensuring that the result is determinizable after >> >>>>> composing with the lexicon. In general our solution is to add >> special >> >>>>> disambiguation symbols at the beginning and end of each of the sub- >> >>>>> FSTs, and of course making sure that the sub-FSTs are themselves >> >>>>> determinizable. >> >>>>> Dan >> >>>>> >> >>>>> >> >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True < >> se...@se...> >> >>>>> wrote: >> >>>>>> That's a subject of some general interest. Is there a discussion of >> >>>>>> the general approach that was taken somewhere? 
>> >>>>>> >> >>>>>> -- Sean >> >>>>>> >> >>>>>> Sean True >> >>>>>> Semantic Machines >> >>>>>> >> >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> >> >>>>> wrote: >> >>>>>>> >> >>>>>>> Nagendra Goel has worked on some example scripts for this type of >> >>>>>>> thing, and with Hainan we were working on trying to get it cleaned >> >>>>> up >> >>>>>>> and checked in, but he's going for an internship so it will have >> to >> >>>>>>> wait. But Nagendra might be willing to share it with you. >> >>>>>>> Dan >> >>>>>>> >> >>>>>>> >> >>>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson >> >>>>>>> <kir...@sm...> wrote: >> >>>>>>>> Suppose I have a language model where one token (a "word") is a >> >>>>>>>> pointer to a whole another LM. This is a practical case when you >> >>>>>>>> expect an abrupt change in model, a clear example being "my phone >> >>>>>>>> number is..." and then you'd expect them rattling a string of >> >>>>>>>> digits. Is there any support in kaldi for this? >> >>>>>>>> >> >>>>>>>> Thanks, >> >>>>>>>> >> >>>>>>>> -kkm >> >>>>>>>> >> >>>>>>>> >> ------------------------------------------------------------------ >> >>>>> - >> >>>>>>>> ----------- One dashboard for servers and applications across >> >>>>>>>> Physical-Virtual-Cloud Widest out-of-the-box monitoring support >> >>>>>>>> with 50+ applications Performance metrics, stats and reports that >> >>>>>>>> give you Actionable Insights Deep dive visibility with >> transaction >> >>>>>>>> tracing using APM Insight. >> >>>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>>>> _______________________________________________ >> >>>>>>>> Kaldi-users mailing list >> >>>>>>>> Kal...@li... 
>> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>>>>> >> >>>>>>> >> >>>>>>> >> -------------------------------------------------------------------- >> >>>>> - >> >>>>>>> --------- One dashboard for servers and applications across >> >>>>>>> Physical-Virtual-Cloud Widest out-of-the-box monitoring support >> with >> >>>>>>> 50+ applications Performance metrics, stats and reports that give >> >>>>> you >> >>>>>>> Actionable Insights Deep dive visibility with transaction tracing >> >>>>>>> using APM Insight. >> >>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>>> _______________________________________________ >> >>>>>>> Kaldi-users mailing list >> >>>>>>> Kal...@li... >> >>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> Message: 3 >> >>>> Date: Thu, 21 May 2015 15:29:54 -0400 >> >>>> From: Hainan Xu <hai...@gm...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: Daniel Povey <dp...@gm...> >> >>>> Cc: Sean True <se...@se...>, >> >>>> "kal...@li..." >> >>>> <kal...@li...>, Kirill Katsnelson >> >>>> <kir...@sm...> >> >>>> Message-ID: >> >>>> <CALP+BDZvJP-2cZ+fEJEXaMaVWzgy63mtc= >> J1E...@ma...> >> >>>> Content-Type: text/plain; charset="utf-8" >> >>>> >> >>>> There is a paper in ICASSP 2015 that described some very similar >> idea: >> >>>> >> >>>> Improved recognition of contact names in voice commands >> >>>> >> >>>>> On Thu, May 21, 2015 at 3:04 PM, Daniel Povey <dp...@gm...> >> wrote: >> >>>>> >> >>>>> The general approach is to create an FST for the little language >> >>>>> model, and then to use fstreplace to replace instances of a >> particular >> >>>>> symbol in the top-level language model, with that FST. >> >>>>> The tricky part is ensuring that the result is determinizable after >> >>>>> composing with the lexicon. 
In general our solution is to add >> special >> >>>>> disambiguation symbols at the beginning and end of each of the >> >>>>> sub-FSTs, and of course making sure that the sub-FSTs are themselves >> >>>>> determinizable. >> >>>>> Dan >> >>>>> >> >>>>> >> >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True < >> se...@se...> >> >>>>> wrote: >> >>>>>> That's a subject of some general interest. Is there a discussion of >> >>>>>> the >> >>>>>> general approach that was taken somewhere? >> >>>>>> >> >>>>>> -- Sean >> >>>>>> >> >>>>>> Sean True >> >>>>>> Semantic Machines >> >>>>>> >> >>>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> >> >>>>>>> wrote: >> >>>>>>> >> >>>>>>> Nagendra Goel has worked on some example scripts for this type of >> >>>>>>> thing, and with Hainan we were working on trying to get it >> cleaned up >> >>>>>>> and checked in, but he's going for an internship so it will have >> to >> >>>>>>> wait. But Nagendra might be willing to share it with you. >> >>>>>>> Dan >> >>>>>>> >> >>>>>>> >> >>>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson >> >>>>>>> <kir...@sm...> wrote: >> >>>>>>>> Suppose I have a language model where one token (a "word") is a >> >>>>> pointer >> >>>>>>>> to a whole another LM. This is a practical case when you expect >> an >> >>>>> abrupt >> >>>>>>>> change in model, a clear example being "my phone number is..." >> and >> >>>>> then >> >>>>>>>> you'd expect them rattling a string of digits. Is there any >> support >> >>>>> in kaldi >> >>>>>>>> for this? 
>> >>>>>>>> >> >>>>>>>> Thanks, >> >>>>>>>> >> >>>>>>>> -kkm >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>>>>> One dashboard for servers and applications across >> >>>>> Physical-Virtual-Cloud >> >>>>>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>>>>> Performance metrics, stats and reports that give you Actionable >> >>>>> Insights >> >>>>>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>>>> _______________________________________________ >> >>>>>>>> Kaldi-users mailing list >> >>>>>>>> Kal...@li... >> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> One dashboard for servers and applications across >> >>>>>>> Physical-Virtual-Cloud >> >>>>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>>>> Performance metrics, stats and reports that give you Actionable >> >>>>>>> Insights >> >>>>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>>> _______________________________________________ >> >>>>>>> Kaldi-users mailing list >> >>>>>>> Kal...@li... >> >>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> - Hainan >> >>>> -------------- next part -------------- >> >>>> An HTML attachment was scrubbed... >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> Message: 4 >> >>>> Date: Thu, 21 May 2015 15:01:51 -0400 >> >>>> From: Sean True <se...@se...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: Daniel Povey <dp...@gm...> >> >>>> Cc: Hainan Xu <hai...@gm...>, >> >>>> "kal...@li..." 
>> >>>> <kal...@li...>, Kirill Katsnelson >> >>>> <kir...@sm...> >> >>>> Message-ID: >> >>>> <CALtEaHntdAcmO_Ji5dxsPnT8i9M_LVuGnY0UjkJUPp= >> pY...@ma...> >> >>>> Content-Type: text/plain; charset="utf-8" >> >>>> >> >>>> That's a subject of some general interest. Is there a discussion of >> the >> >>>> general approach that was taken somewhere? >> >>>> >> >>>> -- Sean >> >>>> >> >>>> Sean True >> >>>> Semantic Machines >> >>>> >> >>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> >> wrote: >> >>>>> >> >>>>> Nagendra Goel has worked on some example scripts for this type of >> >>>>> thing, and with Hainan we were working on trying to get it cleaned >> up >> >>>>> and checked in, but he's going for an internship so it will have to >> >>>>> wait. But Nagendra might be willing to share it with you. >> >>>>> Dan >> >>>>> >> >>>>> >> >>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson >> >>>>> <kir...@sm...> wrote: >> >>>>>> Suppose I have a language model where one token (a "word") is a >> >>>>>> pointer >> >>>>> to a whole another LM. This is a practical case when you expect an >> >>>>> abrupt >> >>>>> change in model, a clear example being "my phone number is..." and >> then >> >>>>> you'd expect them rattling a string of digits. Is there any support >> in >> >>>>> kaldi for this? >> >>>>>> >> >>>>>> Thanks, >> >>>>>> >> >>>>>> -kkm >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>>> One dashboard for servers and applications across >> >>>>>> Physical-Virtual-Cloud >> >>>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>>> Performance metrics, stats and reports that give you Actionable >> >>>>>> Insights >> >>>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>> _______________________________________________ >> >>>>>> Kaldi-users mailing list >> >>>>>> Kal...@li... 
>> >>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>> One dashboard for servers and applications across >> >>>>> Physical-Virtual-Cloud >> >>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>> Performance metrics, stats and reports that give you Actionable >> >>>>> Insights >> >>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>> _______________________________________________ >> >>>>> Kaldi-users mailing list >> >>>>> Kal...@li... >> >>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> -------------- next part -------------- >> >>>> An HTML attachment was scrubbed... >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> >> >>>> >> ------------------------------------------------------------------------------ >> >>>> One dashboard for servers and applications across >> Physical-Virtual-Cloud >> >>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>> Performance metrics, stats and reports that give you Actionable >> Insights >> >>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> _______________________________________________ >> >>>> Kaldi-users mailing list >> >>>> Kal...@li... 
>> >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> >> >>>> >> >>>> End of Kaldi-users Digest, Vol 29, Issue 15 >> >>>> ******************************************* >> >>> >> >>> >> >>> >> ------------------------------------------------------------------------------ >> >>> One dashboard for servers and applications across >> Physical-Virtual-Cloud >> >>> Widest out-of-the-box monitoring support with 50+ applications >> >>> Performance metrics, stats and reports that give you Actionable >> Insights >> >>> Deep dive visibility with transaction tracing using APM Insight. >> >>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>> _______________________________________________ >> >>> Kaldi-users mailing list >> >>> Kal...@li... >> >>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> One dashboard for servers and applications across >> Physical-Virtual-Cloud >> >> Widest out-of-the-box monitoring support with 50+ applications >> >> Performance metrics, stats and reports that give you Actionable >> Insights >> >> Deep dive visibility with transaction tracing using APM Insight. >> >> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >> _______________________________________________ >> >> Kaldi-users mailing list >> >> Kal...@li... >> >> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >> >> >> >> ------------------------------------------------------------------------------ >> One dashboard for servers and applications across Physical-Virtual-Cloud >> Widest out-of-the-box monitoring support with 50+ applications >> Performance metrics, stats and reports that give you Actionable Insights >> Deep dive visibility with transaction tracing using APM Insight. >> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> _______________________________________________ >> Kaldi-users mailing list >> Kal...@li... 
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> > > > > ------------------------------------------------------------------------------ > One dashboard for servers and applications across Physical-Virtual-Cloud > Widest out-of-the-box monitoring support with 50+ applications > Performance metrics, stats and reports that give you Actionable Insights > Deep dive visibility with transaction tracing using APM Insight. > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > |
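[Editorial note: the fstreplace-based grafting described in the quoted thread above can be illustrated with a small, self-contained sketch. This is a toy model under stated assumptions — plain Python dicts stand in for OpenFst FSTs, and the `#PHONE#` and `#nonterm_*` labels are invented for illustration; it is not the actual Kaldi recipe.]

```python
# Toy illustration of LM "grafting": a top-level grammar contains a
# class token (#PHONE#, hypothetical name) whose arcs are replaced by a
# small sub-grammar, the way OpenFst's fstreplace splices a sub-FST
# into a root FST. An FST here is {state: [(in_label, out_label,
# next_state), ...]}; state 0 is the start, sub states are 0..N-1.

def graft(root, root_finals, sub, sub_finals, nonterminal):
    """Replace every arc labelled `nonterminal` in `root` with a fresh
    copy of `sub`, entered and exited via disambiguation symbols so the
    sub-grammar boundaries stay visible (cf. the special disambiguation
    symbols described in the thread)."""
    out = {s: [] for s in root}
    next_state = max(root) + 1
    for state, arcs in root.items():
        for (ilab, olab, dest) in arcs:
            if ilab != nonterminal:
                out[state].append((ilab, olab, dest))
                continue
            # Copy the sub-grammar with fresh state ids.
            remap = {s: next_state + s for s in sub}
            next_state += len(sub)
            for s, sub_arcs in sub.items():
                out[remap[s]] = [(i, o, remap[d]) for (i, o, d) in sub_arcs]
            # Disambiguation symbols mark entry and exit of the sub-LM.
            out[state].append(("#nonterm_begin", "<eps>", remap[0]))
            for f in sub_finals:
                out[remap[f]].append(("#nonterm_end", "<eps>", dest))
    return out, set(root_finals)

# "my phone number is #PHONE#", with a one-or-more digit-word loop.
root = {0: [("my", "my", 1)], 1: [("phone", "phone", 2)],
        2: [("number", "number", 3)], 3: [("is", "is", 4)],
        4: [("#PHONE#", "#PHONE#", 5)], 5: []}
sub = {0: [("five", "five", 1)], 1: [("five", "five", 1)]}
grafted, finals = graft(root, {5}, sub, {1}, "#PHONE#")
```

In a real setup the sub-grammar would be compiled with OpenFst tools and spliced with `fstreplace`, keeping the disambiguation symbols so that the result remains determinizable after composition with the lexicon, as the thread explains.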
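[Editorial note: the point raised above about identifying special elements in text before LM training can be made concrete with a minimal tagger. This is a sketch only — the class names and regexes are invented for illustration; a production normaliser (Festival's NSW-EXPAND, or a Thrax grammar as discussed in this thread) would be far more thorough.]

```python
import re

# Minimal class tagger: map spans that look like class members to class
# tokens before LM training, so that a sub-grammar can later be grafted
# in at those tokens. Class names and patterns are illustrative only.
CLASS_PATTERNS = [
    ("#PHONE#", re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b")),
    ("#YEAR#", re.compile(r"\b(?:1[89]\d{2}|20\d{2})\b")),
]

def tag_classes(text):
    """Return `text` with class-member spans replaced by class tokens."""
    for token, pattern in CLASS_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Training an LM on the tagged text then yields a model over words plus class tokens, with each class token backed by its own small grammar.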
From: Matthew A. <mat...@gm...> - 2015-05-24 20:44:23
|
Not sure if this is relevant to this thread. But in the speech synthesis system branch we have a very early text normaliser which (when complete) will detect things like phone numbers, addresses, currencies, etc. The output from this could then be used to inform language model building. Currently it deals with symbols and tokenisations in English. Potentially (although I wasn't currently planning on this), the text normaliser could be written in Thrax (based on OpenFst, authored by Richard Sproat, I believe). However, if this approach would benefit ASR as well then it might be worth doing it this way rather than my plan of a simple greedy normaliser. v best Matthew Aylett On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos <dva...@gm...> wrote: > We have access to several corpora and we are trying to put together > something appropriate. > > In the next couple of days, we will also volunteer a server to set it all > up and run the tests. > > Dimitris > > > On 24 Μαΐ 2015, at 02:06, Daniel Povey <dp...@gm...> wrote: > > > > One possibility is to use a completely open-source setup, e.g. > > Voxforge, and forget about the "has a clear advantage" requirement. > > E.g. target anything that looks like a year, and make a grammar for > > years. > > Dan > > > > > > On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel > > <nag...@go...> wrote: > >> Since I cannot volunteer my enviornment, do you recommend another > >> enviornment where this can be prototyped and where you can check in > some > >> class lm recipe that has advantage. > >> > >> Nagendra > >> > >> Nagendra Kumar Goel > >> > >>> On May 21, 2015 11:01 PM, "Dimitris Vassos" <dva...@gm...> > wrote: > >>> > >>> +1 for the class-based LMs. I have also been interested in this > >>> functionality for some time now, so will be more than happy to try out > the > >>> current implementation, if possible. > >>> > >>> Thanks > >>> Dimitris > >>> > >>>> On 22 Μαΐ 2015, at 01:34, kal...@li... 
> >>>> wrote: > >>>> > >>>> Send Kaldi-users mailing list submissions to > >>>> kal...@li... > >>>> > >>>> To subscribe or unsubscribe via the World Wide Web, visit > >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users > >>>> or, via email, send a message with subject or body 'help' to > >>>> kal...@li... > >>>> > >>>> You can reach the person managing the list at > >>>> kal...@li... > >>>> > >>>> When replying, please edit your Subject line so it is more specific > >>>> than "Re: Contents of Kaldi-users digest..." > >>>> > >>>> > >>>> Today's Topics: > >>>> > >>>> 1. Re: LM grafting (Daniel Povey) > >>>> 2. Re: LM grafting (Kirill Katsnelson) > >>>> 3. Re: LM grafting (Hainan Xu) > >>>> 4. Re: LM grafting (Sean True) > >>>> > >>>> > >>>> ---------------------------------------------------------------------- > >>>> > >>>> Message: 1 > >>>> Date: Thu, 21 May 2015 15:04:04 -0400 > >>>> From: Daniel Povey <dp...@gm...> > >>>> Subject: Re: [Kaldi-users] LM grafting > >>>> To: Sean True <se...@se...> > >>>> Cc: Hainan Xu <hai...@gm...>, > >>>> "kal...@li..." > >>>> <kal...@li...>, Kirill Katsnelson > >>>> <kir...@sm...> > >>>> Message-ID: > >>>> <CAEWAuySHaXwdNJZAoL6CanzHth=k4Y...@ma... > > > >>>> Content-Type: text/plain; charset=UTF-8 > >>>> > >>>> The general approach is to create an FST for the little language > >>>> model, and then to use fstreplace to replace instances of a particular > >>>> symbol in the top-level language model, with that FST. > >>>> The tricky part is ensuring that the result is determinizable after > >>>> composing with the lexicon. In general our solution is to add special > >>>> disambiguation symbols at the beginning and end of each of the > >>>> sub-FSTs, and of course making sure that the sub-FSTs are themselves > >>>> determinizable. > >>>> Dan > >>>> > >>>> > >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True < > se...@se...> > >>>>> wrote: > >>>>> That's a subject of some general interest. 
Is there a discussion of > the > >>>>> general approach that was taken somewhere? > >>>>> > >>>>> -- Sean > >>>>> > >>>>> Sean True > >>>>> Semantic Machines > >>>>> > >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> > >>>>>> wrote: > >>>>>> > >>>>>> Nagendra Goel has worked on some example scripts for this type of > >>>>>> thing, and with Hainan we were working on trying to get it cleaned > up > >>>>>> and checked in, but he's going for an internship so it will have to > >>>>>> wait. But Nagendra might be willing to share it with you. > >>>>>> Dan > >>>>>> > >>>>>> > >>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson > >>>>>> <kir...@sm...> wrote: > >>>>>>> Suppose I have a language model where one token (a "word") is a > >>>>>>> pointer > >>>>>>> to a whole another LM. This is a practical case when you expect an > >>>>>>> abrupt > >>>>>>> change in model, a clear example being "my phone number is..." and > >>>>>>> then > >>>>>>> you'd expect them rattling a string of digits. Is there any support > >>>>>>> in kaldi > >>>>>>> for this? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> > >>>>>>> -kkm > >>>>>>> > >>>>>>> > >>>>>>> > ------------------------------------------------------------------------------ > >>>>>>> One dashboard for servers and applications across > >>>>>>> Physical-Virtual-Cloud > >>>>>>> Widest out-of-the-box monitoring support with 50+ applications > >>>>>>> Performance metrics, stats and reports that give you Actionable > >>>>>>> Insights > >>>>>>> Deep dive visibility with transaction tracing using APM Insight. > >>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > >>>>>>> _______________________________________________ > >>>>>>> Kaldi-users mailing list > >>>>>>> Kal...@li... 
> >>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users > >>>>>> > >>>>>> > >>>>>> > >>>>>> > ------------------------------------------------------------------------------ > >>>>>> One dashboard for servers and applications across > >>>>>> Physical-Virtual-Cloud > >>>>>> Widest out-of-the-box monitoring support with 50+ applications > >>>>>> Performance metrics, stats and reports that give you Actionable > >>>>>> Insights > >>>>>> Deep dive visibility with transaction tracing using APM Insight. > >>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > >>>>>> _______________________________________________ > >>>>>> Kaldi-users mailing list > >>>>>> Kal...@li... > >>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users > >>>> > >>>> > >>>> > >>>> ------------------------------ > >>>> > >>>> Message: 2 > >>>> Date: Thu, 21 May 2015 19:24:38 +0000 > >>>> From: Kirill Katsnelson <kir...@sm...> > >>>> Subject: Re: [Kaldi-users] LM grafting > >>>> To: "dp...@gm..." <dp...@gm...>, Sean True > >>>> <se...@se...> > >>>> Cc: Hainan Xu <hai...@gm...>, > >>>> "kal...@li..." > >>>> <kal...@li...> > >>>> Message-ID: > >>>> > >>>> < > CY1...@CY... > > > >>>> > >>>> Content-Type: text/plain; charset="utf-8" > >>>> > >>>> Also, from the practical standpoint, backoff/discounting weights > usually > >>>> need to be massaged. Otherwise when the grafted LM is small and the > main LM > >>>> is large, the little model will tend to shoehorn an utterance into > itself > >>>> rather than let go of it. In my phone number example, everything > becomes > >>>> digits once the phone number starts. > >>>> > >>>> -kkm > >>>> > >>>>> -----Original Message----- > >>>>> From: Daniel Povey [mailto:dp...@gm...] > >>>>> Sent: 2015-05-21 1204 > >>>>> To: Sean True > >>>>> Cc: Kirill Katsnelson; Nagendra Goel; Hainan Xu; kaldi- > >>>>> us...@li... 
> >>>>> Subject: Re: [Kaldi-users] LM grafting > >>>>> > >>>>> The general approach is to create an FST for the little language > model, > >>>>> and then to use fstreplace to replace instances of a particular > symbol > >>>>> in the top-level language model, with that FST. > >>>>> The tricky part is ensuring that the result is determinizable after > >>>>> composing with the lexicon. In general our solution is to add > special > >>>>> disambiguation symbols at the beginning and end of each of the sub- > >>>>> FSTs, and of course making sure that the sub-FSTs are themselves > >>>>> determinizable. > >>>>> Dan > >>>>> > >>>>> > >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True < > se...@se...> > >>>>> wrote: > >>>>>> That's a subject of some general interest. Is there a discussion of > >>>>>> the general approach that was taken somewhere? > >>>>>> > >>>>>> -- Sean > >>>>>> > >>>>>> Sean True > >>>>>> Semantic Machines > >>>>>> > >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> > >>>>> wrote: > >>>>>>> > >>>>>>> Nagendra Goel has worked on some example scripts for this type of > >>>>>>> thing, and with Hainan we were working on trying to get it cleaned > >>>>> up > >>>>>>> and checked in, but he's going for an internship so it will have to > >>>>>>> wait. But Nagendra might be willing to share it with you. > >>>>>>> Dan > >>>>>>> > >>>>>>> > >>>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson > >>>>>>> <kir...@sm...> wrote: > >>>>>>>> Suppose I have a language model where one token (a "word") is a > >>>>>>>> pointer to a whole another LM. This is a practical case when you > >>>>>>>> expect an abrupt change in model, a clear example being "my phone > >>>>>>>> number is..." and then you'd expect them rattling a string of > >>>>>>>> digits. Is there any support in kaldi for this? 
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> -kkm
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 3
> >>>> Date: Thu, 21 May 2015 15:29:54 -0400
> >>>> From: Hainan Xu <hai...@gm...>
> >>>> Subject: Re: [Kaldi-users] LM grafting
> >>>>
> >>>> There is a paper in ICASSP 2015 that describes a very similar idea:
> >>>> "Improved recognition of contact names in voice commands".
> >>>>
> >>>> --
> >>>> - Hainan
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> Message: 4
> >>>> Date: Thu, 21 May 2015 15:01:51 -0400
> >>>> From: Sean True <se...@se...>
> >>>> Subject: Re: [Kaldi-users] LM grafting
> >>>>
> >>>> That's a subject of some general interest. Is there a discussion of
> >>>> the general approach that was taken somewhere?
> >>>>
> >>>> -- Sean
> >>>> Semantic Machines
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> End of Kaldi-users Digest, Vol 29, Issue 15
> >>>> *******************************************
>
> _______________________________________________
> Kaldi-users mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-users
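[Editor's note] The fstreplace approach Dan describes in the thread above can be illustrated with a toy sketch. This is NOT Kaldi or OpenFst code: `replace_nonterminal`, the dict-based FST representation, and the `#begin`/`#end` symbol names are all invented for illustration. In practice you would build the sub-grammar as a real FST and splice it in with OpenFst's `fstreplace`; the sketch only shows the mechanics of replacing a nonterminal arc and why the entry/exit disambiguation symbols are added.

```python
# Toy sketch (hypothetical helper, not Kaldi/OpenFst code): replace every arc
# carrying a nonterminal symbol in a top-level LM FST with a copy of a
# sub-grammar FST, bracketed by disambiguation symbols so the result can stay
# determinizable after composition with the lexicon.

def replace_nonterminal(top, sub, nonterm, enter='#begin', leave='#end'):
    """FSTs are dicts: {'start': state, 'finals': set, 'arcs': [(src, label, dst)]}.
    Every `nonterm` arc in `top` is replaced by a fresh copy of `sub`,
    entered through an `enter` arc and exited through `leave` arcs."""
    arcs = []
    copies = 0
    for src, label, dst in top['arcs']:
        if label != nonterm:
            arcs.append((src, label, dst))
            continue
        prefix = 'sub%d:' % copies   # rename sub states so copies don't clash
        copies += 1
        # Disambiguation symbol on entry (Dan's point above).
        arcs.append((src, enter, prefix + str(sub['start'])))
        for s, l, d in sub['arcs']:
            arcs.append((prefix + str(s), l, prefix + str(d)))
        # Disambiguation symbol on exit, returning to the original destination.
        for f in sub['finals']:
            arcs.append((prefix + str(f), leave, dst))
    return {'start': top['start'], 'finals': set(top['finals']), 'arcs': arcs}

# Top-level LM accepting "my number is $DIGITS":
top = {'start': 0, 'finals': {4},
       'arcs': [(0, 'my', 1), (1, 'number', 2), (2, 'is', 3), (3, '$DIGITS', 4)]}
# Sub-grammar: one or more digits (just two digit words here).
digits = {'start': 0, 'finals': {1},
          'arcs': [(0, 'five', 1), (0, 'nine', 1), (1, 'five', 1), (1, 'nine', 1)]}

grafted = replace_nonterminal(top, digits, '$DIGITS')
```

The disambiguation symbols would then also need entries in the symbol table, and (per Kirill's follow-up in Message 2) the sub-grammar's weights may need rescaling so a small grafted LM does not capture the whole utterance.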