From: Vassil P. <vas...@gm...> - 2014-03-25 16:30:52

BTW, I seem to remember that this data set was freely available some time ago (I may be wrong, though). As you have probably noticed, the download page now says:

--------------------
*Download procedure :*
Step 1. Download, fill, and sign the End User License Agreement ( EULA<https://www.idiap.ch/dataset/ami/eula.pdf>). You need to be a *representative for your organization* (students are not accepted) and use your official email address in the organization in order to request for the Database. After filling the form, please scan it and return the whole document to "data-manager at idiap.ch"
--------------------

Vassil

From: Daniel P. <dp...@gm...> - 2014-03-25 15:18:59

As far as I know, we don't have example scripts for the AMI data (Petr, are there any you could easily add?), and I think this user is new to speech recognition, so he might have trouble creating such scripts.

Dan

From: Vesely K. <ive...@fi...> - 2014-03-25 14:57:29

I think the AMI data can be downloaded *FREELY* for non-commercial purposes: https://www.idiap.ch/dataset/ami/

It is a good candidate for creating a "free" recipe, as it is challenging spontaneous/meeting speech.

I'm not sure what's in the package, or whether there is a lexicon and STM too; there should be different microphones: headsets (easy) and distant mics (hard)...

K.

From: Daniel P. <dp...@gm...> - 2014-03-25 14:30:39

The Resource Management data unfortunately has to be paid for; it comes from the LDC. I think there are two example scripts that use free data: "yesno" (which is very tiny and not suitable for research, only for testing scripts) and "voxforge". In the future we plan to add more scripts for "free" data.

Dan

From: Margusja <ma...@ro...> - 2014-03-25 13:37:55

... and found the answer too: https://sites.google.com/site/dpovey/kaldi-lectures

Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE "(serialNumber=37303140314)"

From: Margusja <ma...@ro...> - 2014-03-25 13:35:49

Hi,

I am taking my first steps in Kaldi and the speech recognition world. I started with http://kaldi.sourceforge.net/tutorial_running.html and found this part:

...
The best case is that there is some directory on your system, say /export/corpora5/LDC/LDC93S3A/rm_comp, that contains three subdirectories; call them rm1_audio1, rm1_audio2 and rm2_audio
...

a little unclear. Can I download those RM files? Or can I use my own? What is the format of those files?

--
Best regards, Margus (Margusja) Roo

From: Daniel P. <dp...@gm...> - 2014-03-24 15:52:59

There is no fundamental reason why they couldn't be used for alignment, but in general the alignments are not too strongly dependent on the system as long as it is reasonable, so there is probably not much point in using the very best system you have to align the data.

Dan

From: Valentin M. <doc...@gm...> - 2014-03-24 11:08:30

Hi all,

I wonder why no one seems to use discriminatively trained models for alignment purposes in Kaldi? At least I haven't seen such usage in the standard recipes.

Regards,
Valentin

From: Daniel P. <dp...@gm...> - 2014-03-20 19:56:16

Hi everyone,

I think it would be nice to have tools that convert back and forth between my and Karel's neural nets, for testing purposes. [Note: it might not just be a question of converting the network itself, since I think we may use different conventions for how the splicing is done.] But converting the network would be a start. Does anyone want to help with this? Covering the "common cases", or at least the easy cases, would be sufficient; the two versions don't support exactly the same set of nonlinearities.

Dan

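Dan's remark about splicing conventions is easier to see with one concrete convention written out. The sketch below is illustrative only, not code from either setup: `splice_frames` is a hypothetical helper that concatenates each frame with `left` preceding and `right` following frames, and it assumes edges are padded by repeating the first or last frame. A converter would need to confirm the actual context widths, edge handling, and ordering each setup uses.

```python
from typing import List

Frame = List[float]


def splice_frames(feats: List[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with `left` previous and `right` next frames.

    Out-of-range context indices are clamped to the first/last frame, i.e.
    utterance edges are padded by repetition (an assumed convention).
    """
    num_frames = len(feats)
    spliced: List[Frame] = []
    for t in range(num_frames):
        row: Frame = []
        for offset in range(-left, right + 1):
            # Clamp the context index into [0, num_frames - 1].
            src = min(max(t + offset, 0), num_frames - 1)
            row.extend(feats[src])
        spliced.append(row)
    return spliced


if __name__ == "__main__":
    for row in splice_frames([[1.0], [2.0], [3.0]], left=1, right=1):
        print(row)
```

If two networks were trained with different context widths, edge padding, or orderings inside the spliced vector, they would receive different inputs for identical features. That is why copying the weights alone may not be enough.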
From: Vesely K. <ive...@fi...> - 2014-03-20 13:17:52

Hi Lahiru,

not directly. What you can do is slice the nnet using
'nnet-copy --remove-last-layers=N nn_in nn_out'
and use several calls of nnet-forward.

K.

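Karel's workaround relies on the fact that forwarding through a network with its last N layers removed returns the output of an earlier layer. The pure-Python sketch below demonstrates this equivalence on a toy feed-forward net. The layer list, the weights, and all helper names are hypothetical and are not Kaldi's nnet1 API. Each `forward` call on a truncated net corresponds to one `nnet-copy --remove-last-layers=N` plus `nnet-forward` run.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def affine(weights: List[List[float]], bias: Vector) -> Layer:
    """Toy affine layer computing W x + b."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply


def sigmoid(x: Vector) -> Vector:
    """Element-wise logistic nonlinearity."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def forward(layers: List[Layer], x: Vector) -> Vector:
    """Run x through every layer in order (the role nnet-forward plays)."""
    for layer in layers:
        x = layer(x)
    return x


def remove_last_layers(layers: List[Layer], n: int) -> List[Layer]:
    """Drop the final n layers, analogous to slicing the net before forwarding."""
    if not 0 <= n <= len(layers):
        raise ValueError("n must be between 0 and the number of layers")
    return layers[:len(layers) - n]


def all_activations(layers: List[Layer], x: Vector) -> List[Vector]:
    """A single pass that records the output of every layer."""
    outputs: List[Vector] = []
    for layer in layers:
        x = layer(x)
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    net = [affine([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]), sigmoid,
           affine([[2.0, 0.0]], [-1.0])]
    x = [0.3, 0.7]
    acts = all_activations(net, x)
    # Removing k layers and forwarding reproduces activation len(net)-1-k.
    for k in range(len(net)):
        assert forward(remove_last_layers(net, k), x) == acts[len(net) - 1 - k]
    print(acts)
```

Slicing requires one forward run per layer of interest, whereas recording activations during a single pass (as `all_activations` does) would need support inside the tool itself.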
From: Lahiru S. <lah...@gm...> - 2014-03-20 12:48:06

Hi all,

For a given frame, is there a way to get the hidden activations of all the hidden layers? AFAIK, nnet-forward only gives the values of the output units.

I am using Karel's setup.

Thank you,

Best Regards,
Lahiru

From: Daniel P. <dp...@gm...> - 2014-03-17 20:30:26

There's no fundamental limitation; that's just the way it was coded for demo purposes. Going forward, that code will probably become deprecated. In ^/sandbox/online I am working on a new setup for online decoding; see the example script in egs/rm/s5/local/run_online_decoding.sh. In that version, the sampling rate is configurable from the command line. However, it won't do the subsampling itself (I haven't coded that yet), so you'll have to take care of that yourself. Features from different sampling rates are not comparable unless you first subsample.

Dan

From: Simon K. <sim...@gm...> - 2014-03-17 20:09:20

Hi,

I had a closer look at the online decoders, and I saw that for both online-wav-gmm-decode-faster and online-gmm-decode-faster the sampling rate is fixed at 16k. As a first try, I removed the check in the first one; the binary still ran, reading my wav files and producing results no worse than when using downsampled recordings (still very bad so far... most probably related to my earlier post).

Is there a limitation to 16k in the underlying libraries for these two recognizers, or is it just there to make them work with the example models provided?

All the best,

Simon

From: Daniel P. <dp...@gm...> - 2014-03-17 19:58:28

If you have high-sampling-rate recordings and want to recognize them using a lower-sampling-rate model, you can subsample; probably the easiest way is some sox command. You can't really do it the opposite way, because there will be zero or close to zero energy in the frequencies that were not in your original recording, and the models won't like it. If you just give Kaldi a recording with the wrong sampling rate, it will give you garbage, because the models expect input at a certain sampling rate. In fact, programs like compute-mfcc-feats will just crash unless you edit the sampling-rate option to match the input, but doing that without retraining the models is a bad idea.

Dan

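Dan's rule of thumb (downsample to the model's rate, never upsample) can be written as a small pre-check to run before feature extraction. The sketch below is a hypothetical helper, not a Kaldi tool. It refuses to upsample and otherwise returns the reduced up/down factors a rational (polyphase) resampler would use. The resampling itself would still be done by something like sox, as Dan suggests.

```python
from math import gcd
from typing import Tuple


def resample_plan(input_hz: int, model_hz: int) -> Tuple[int, int]:
    """Return (up, down) such that input_hz * up / down == model_hz.

    Raises ValueError when the model expects a higher rate than the
    recording has: upsampling cannot restore the missing high band, which
    would hold (near-)zero energy that the models never saw in training.
    """
    if input_hz <= 0 or model_hz <= 0:
        raise ValueError("sampling rates must be positive")
    if input_hz < model_hz:
        raise ValueError(
            f"recording is {input_hz} Hz but the model expects {model_hz} Hz; "
            "upsampling will not give the model usable input")
    g = gcd(input_hz, model_hz)
    return model_hz // g, input_hz // g


if __name__ == "__main__":
    # 44.1 kHz recording, 16 kHz model.
    print(resample_plan(44100, 16000))  # -> (160, 441)
```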
From: Simon K. <sim...@gm...> - 2014-03-17 19:52:27

Hi,

I was wondering about a general issue regarding the sampling rate of recordings, and I hope you can help me with it.

If a model is trained on recordings sampled at, e.g., 16k and is then used to recognize recordings sampled at, e.g., 44.1k, will this go well? Or, vice versa, will that go well?

I am sure this is a very basic question, and, as you might have guessed, I tried this and the results did not look too good so far. So I wondered whether I just had a few option flags set wrongly, or whether I am heading down a dead end.

All the best,

Simon

From: Vesely K. <ive...@fi...> - 2014-03-13 15:54:47

Hi,

maybe Dan's answer would be more accurate, but I'll still try to answer. I believe there is no reliable conversion tool to re-save Kaldi models as HTK models. When designing Kaldi, we chose not to take HTK compatibility into account, as that gave us more freedom in designing the code.

In Kaldi, the transition probabilities are stored in the "TransitionModel", not in the form of a matrix but as a vector, where each transition-id has a corresponding transition weight.

Best,
Karel.

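Karel's point about storage layout can be illustrated with a toy conversion. Suppose, hypothetically, that each transition-id is known to map to a (source state, destination state) pair within a single HMM. The per-transition-id probabilities can then be scattered into a per-HMM matrix of the kind HTK uses. This sketch only illustrates the layout difference. It does not read Kaldi's TransitionModel, and it ignores how Kaldi maps transition-ids to phones and pdfs. HTK's own matrices also include non-emitting entry and exit states, which are simplified away here.

```python
from typing import Dict, List, Tuple


def to_transition_matrix(num_states: int,
                         arcs: Dict[int, Tuple[int, int]],
                         probs: Dict[int, float]) -> List[List[float]]:
    """Scatter per-transition-id probabilities into a per-HMM matrix.

    arcs[tid] = (src, dst), with 0 <= src < num_states; dst == num_states
    means "leave the HMM". probs[tid] is the probability of that arc.
    Row src, column dst of the result holds P(dst | src).
    """
    matrix = [[0.0] * (num_states + 1) for _ in range(num_states)]
    for tid, (src, dst) in arcs.items():
        matrix[src][dst] += probs[tid]
    return matrix


if __name__ == "__main__":
    # Toy 3-state left-to-right HMM; ids start at 1 (0 is unused here).
    arcs = {1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2), 5: (2, 2), 6: (2, 3)}
    probs = {tid: 0.75 if src == dst else 0.25
             for tid, (src, dst) in arcs.items()}
    for row in to_transition_matrix(3, arcs, probs):
        print(row)
```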
From: Matthew A. <mat...@ce...> - 2014-03-13 11:57:15

Hi Kaldi Users,

The alpha text-processing front end for Idlak for US English is complete, and I'm running a test to see how it compares with a standard HTS demo. That's the problem with speech synthesis: you have to actually listen to it :-)

I would be very grateful if native and near-native English speakers could help me evaluate the front end. It is a 15-minute listening test requiring headphones and is at:
http://homepages.inf.ed.ac.uk/cgi/matthewa/audioexperiment.cgi

Q&A:

Is this Kaldi synthesis?
No. The Kaldi element only generates the full-context models, which give phonetic, prosodic and linguistic information for each phone. The actual synthesis for all stimuli is from the non-STRAIGHT HTS demo, using hts_engine to synthesise:
http://hts.sp.nitech.ac.jp/archives/2.3alpha/HTS-demo_CMU-ARCTIC-SLT.tar.bz2
The samples you hear come from the HTS demo system as downloaded, and from the HTS demo system using the Kaldi front end and full-context models.

Why doesn't it sound natural?
This is parametric synthesis, which typically contains vocoder and modelling errors that degrade speech quality. The HTS demo baseline is NOT the best HTS can do, but it is a good starting point for testing a new Kaldi system. If you want to listen to state-of-the-art unit selection synthesis (which is generally excellent :-) ), there are online demos (e.g. www.cereproc.com).

I'm interested in Kaldi synthesis; can I get involved?
Yes. There's a lot of scope for improving the language front end (it still lacks full normalisation rules), and I am currently working on building decision trees and models using Kaldi, so that Kaldi can carry out the model-building part of the process.

Can I reproduce the results?
Yes. Download and build sandbox/idlak and follow the instructions in the Kaldi documentation (Testing the Idlak Front End with HTS).

From: dophist <do...@gm...> - 2014-03-13 08:22:28

Hi,

Recently I tried training a GMM model in Kaldi and using it in an HTK-compatible decoder, and I found a significant inconsistency that I couldn't find a reasonable explanation for. The problem is:

GMM models trained in Kaldi work pretty well in the Kaldi decoder, as expected. However, these Kaldi GMM models work quite poorly in an HTK-model-compatible decoder (WER is 14% absolute worse than in the Kaldi decoder).

Transition probabilities are not a factor, because I didn't even use the transition probabilities in the Kaldi model.

Transitions aside, the GMM parameters in a Kaldi GMM model should work just as well in another decoder, but in my case they don't. Is there any factor that makes Kaldi GMM models work only in the Kaldi decoder?

regards,
Jiayu

From: Vassil P. <vas...@gm...> - 2014-03-13 06:19:49

Hi,

this is probably due to the reordering: http://kaldi.sourceforge.net/hmm.html#hmm_reorder

Vassil

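Vassil's pointer accounts for Jiayu's sequence. With reordering, the transition-id of the arc leaving a state is emitted on that state's first frame, and its self-loops follow, so `2 1 1 1` still means four frames in state 1. The sketch below undoes that ordering for the toy topology in Jiayu's message, where tids 1, 3 and 5 are self-loops and 2, 4 and 6 are forward arcs. The lookup tables and the function name are hypothetical. In Kaldi itself, this information comes from the TransitionModel.

```python
from typing import Dict, List

# Toy tables for the 3-state topology in the question (hypothetical values;
# a real Kaldi model provides this mapping through its TransitionModel).
STATE_OF: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
IS_SELF_LOOP: Dict[int, bool] = {1: True, 2: False, 3: True,
                                 4: False, 5: True, 6: False}


def unreorder(tids: List[int],
              state_of: Dict[int, int] = STATE_OF,
              is_self_loop: Dict[int, bool] = IS_SELF_LOOP) -> List[int]:
    """Turn per-state runs of (forward, loop, loop, ...) into
    (loop, loop, ..., forward), the non-reordered order."""
    out: List[int] = []
    i = 0
    while i < len(tids):
        tid = tids[i]
        if is_self_loop[tid]:
            # Not preceded by its state's forward arc; keep it in place.
            out.append(tid)
            i += 1
            continue
        # Collect the self-loops of the same HMM state that follow.
        j = i + 1
        while (j < len(tids) and is_self_loop[tids[j]]
               and state_of[tids[j]] == state_of[tid]):
            j += 1
        out.extend(tids[i + 1:j])  # self-loops first...
        out.append(tid)            # ...then the forward transition
        i = j
    return out


if __name__ == "__main__":
    print(unreorder([2, 1, 1, 1, 4, 3]))  # -> [1, 1, 1, 2, 3, 4]
```

In both orders, state 1 occupies four frames. Only the position of the forward transition-id within the run changes.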
From: dophist <do...@gm...> - 2014-03-13 06:08:18

Hi, Dan,

Say I have a silence model with 3 states, each with 2 arcs (one self-loop and one to the next state). The transition-ids associated with these arcs seem to be:

tid=0 : reserved for eps
tid=1 : state1 -> state1
tid=2 : state1 -> state2
tid=3 : state2 -> state2
tid=4 : state2 -> state3
...
and so on.

What confuses me is that recognition gives the best tid path as:

some_key 2 1 1 1 4 3 ....

If tid 2 represents the "out-going" arc from state1 to state2, how come tid 1 (which means it is still "looping" in state1) follows tid 2? It seems that tid 2 is actually the arc going INTO state1, but this is inconsistent with the transition-id description in the "HMMTopology" section of the Kaldi homepage.

regards,
Jiayu

From: Daniel P. <dp...@gm...> - 2014-03-12 05:25:21

cc'ing kaldi-users, as this answer may be of interest to others too.

If you set up the system with position_dependent_phones=false, then lattice-align-words cannot run. However, there is an alternative program called lattice-align-words-lexicon, which you can use instead. This is not generally done in the example scripts, because we assume you use word-position-dependent phones. However, the default scripts do generate the files that lattice-align-words-lexicon needs to run; these are present inside data/lang/phones/. See the usage message of that program to find out what they are.

Dan

On Wed, Mar 12, 2014 at 1:18 AM, Xinglong Gao <gao...@gm...> wrote:
> Hi, Dan,
> I have done some training with position_dependent_phones=false, so the
> file named "word_boundary.txt" is not created when executing prepare_lang.sh.
> But there are some tools, such as lattice-to-align, which need this file
> as a parameter.
> I don't know what this file contains or how to create it.
> Thanks.
>
> xinglong gao

From: Daniel P. <dp...@gm...> - 2014-03-11 14:29:51

I'm assuming you mean that you have filterbank features and want to get the per-state alignment with the fMLLR model. You should compute features from the same data that are compatible with the fMLLR model, i.e. MFCC features in this case, and run steps/align_fmllr.sh on them. The alignments produced (ali.*.gz) are independent of the feature type.

Dan

From: Lahiru S. <lah...@gm...> - 2014-03-11 08:45:43
|
Hi all,

I ran the wsj/s5 setup in Kaldi and currently I have the tri4b model (fMLLR). Is there a way to use the tri4b model to align filterbanks?

Thank you,

Best Regards,
Lahiru
|
|
From: Simon K. <sim...@gm...> - 2014-03-06 19:23:12
|
Thanks Dan, instant help, as usual!

So I guess that will be the way to go for us for now. When my colleague asked me about this, I feared I would have to deal with endianness and what-not when writing the files in binary from within Matlab. But I now see that if we print the coefficients with (over-)sufficient precision, we will lose no relevant accuracy. So this turned out easier than expected.

All the best,
Simon
|
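Simon's point about precision can be checked directly. Assuming features end up as 32-bit floats (Kaldi's BaseFloat in a default build, to our understanding) and the reader rounds the decimal text to the nearest float32, printing each value with 9 significant digits is enough to recover it bit for bit, while fewer digits can lose information. The helper names below are illustrative only; the check uses Python's struct module to emulate float32.

```python
import random
import struct


def to_float32(x):
    """Round a Python float (a C double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def roundtrips(x, digits=9):
    """True if float32(x), printed with `digits` significant digits and
    parsed back, gives exactly the same float32 bit pattern."""
    f = to_float32(x)
    text = f"{f:.{digits}g}"          # what a text export would contain
    back = to_float32(float(text))     # what a float32 reader would recover
    return struct.pack("<f", back) == struct.pack("<f", f)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.uniform(-1e4, 1e4) for _ in range(10000)]
    print(all(roundtrips(x) for x in samples))   # 9 significant digits
    print(roundtrips(1.0 / 3.0, digits=6))       # 6 digits is not enough here
```

In Matlab terms this corresponds to a format such as %.9g when writing the text archive.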
|
From: Daniel P. <dp...@gm...> - 2014-03-06 19:15:17
|
It's actually trivial when you know how. The text version of the archive format is just the utterance-id, then, starting on the same line, the Matlab form of the matrix, then a newline. For instance:

utt1 [ 0 2 3
1 3 4 ]
utt2 [ 9 8 7
6 4 2 ]
etc.

So just put them in a file foo and read them with ark:foo. You can then put them in binary format with an associated scp by doing

copy-feats ark:foo ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp

and you can copy /some/dir/my_features.scp as data/<something>/feats.scp and use them.

Or, as a pipe, you can do

<matlab script> | copy-feats ark:- ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp

Dan

On Thu, Mar 6, 2014 at 2:11 PM, Simon Klüpfel <sim...@gm...> wrote:
> Hi,
>
> A colleague of mine experimented with some 'exotic' feature vectors
> using Matlab, and now we would like to see how the pretty great Kaldi
> tools might be used to train some model using them.
>
> I believe the clean way to do it would be to write a routine that
> creates these features using the Kaldi libraries and then writes them
> to an archive. However, I fear this will involve quite some work, and as
> we do not know if it will be an endeavor worth the effort, we would like
> to start off by exporting the features in a Kaldi-readable format from
> Matlab. So far this seemed the smaller effort.
>
> I tried to find out how those files are structured, but got
> lost somewhere on the way.
>
> Looking into compute-mfcc-feats.cc, I saw that there is:
>
>   BaseFloatMatrixWriter kaldi_writer;
>
> which is later used to write the archive:
>
>   kaldi_writer.Write(utt, features);
>
> Trying to find what this call actually does, I got lost.
>
> I found this:
>
> http://kaldi.sourceforge.net/group__table__types.html#gaa9b0c000a2d8bbf1a7df386024110883
>
> and from there this:
>
> http://kaldi.sourceforge.net/table-types_8h_source.html#l00036
>
> and then eventually this:
>
> http://kaldi.sourceforge.net/classkaldi_1_1TableWriter.html
>
> However, I could not yet find anything I could use to understand the
> particular format of the archive file of feature vectors.
>
> The scp file should be straightforward, but I hope one of you could
> point me to the right resource to learn how to write the matrices of a
> set of features in the correct archive format.
>
> Perhaps a detour through non-binary files might be a way to get
> there, but this surely would be very unfavorable.
>
> Thanks a lot,
>
> Simon
|
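For anyone doing the same from a script rather than Matlab, here is a minimal Python sketch of a writer for the text archive layout described in the message above: utterance id, "[" on the same line, one row per line, "]" after the last row. It sorts utterance ids with Python's default string order (which matches C-locale sorting for ASCII ids, as Kaldi data directories expect) and prints values with 9 significant digits. The function names are hypothetical; the output is meant to be fed to copy-feats ark:- ark,scp:... exactly as in Dan's pipe example.

```python
import sys


def format_matrix_entry(utt_id, rows, digits=9):
    """One Kaldi text-archive entry: the utterance id, then on the same
    line '[', one matrix row per line, and ']' after the last row."""
    if not utt_id or any(ch.isspace() for ch in utt_id):
        raise ValueError("utterance ids must be non-empty with no whitespace")
    if not rows or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError(f"{utt_id}: need a non-empty matrix with equal-length rows")
    lines = [" ".join(f"{v:.{digits}g}" for v in row) for row in rows]
    return f"{utt_id} [ " + "\n".join(lines) + " ]\n"


def archive_text(features, digits=9):
    """Concatenate entries for {utt_id: rows}, sorted by utterance id."""
    return "".join(format_matrix_entry(u, features[u], digits)
                   for u in sorted(features))


if __name__ == "__main__":
    # Toy data reproducing the example in the message; in practice this
    # output would be piped into: copy-feats ark:- ark,scp:<ark>,<scp>
    sys.stdout.write(archive_text({"utt1": [[0, 2, 3], [1, 3, 4]],
                                   "utt2": [[9, 8, 7], [6, 4, 2]]}))
```

Rejecting ragged rows up front gives a clearer error than a later failure inside copy-feats.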