From: Jan C. <jan...@gm...> - 2014-09-26 16:17:30
|
Hello, first of all let me thank you for bringing cutting-edge speech recognition to the mortals! I am using Kaldi to jump-start training of recurrent neural networks for phoneme recognition on TIMIT and to compare results between the Kaldi decoders and the recurrent-net-based ones. The s5 recipe for TIMIT ships with two scorers: sclite and basic. Sclite tends to compute lower error rates, which I attribute to different scoring of errors relating to the silence token. However, for scoring it requires not only the decoded phoneme sequence but also the timing of each phoneme. Since my decoder doesn't align the decoded phones precisely in time, I have been using the basic scoring script. I have two questions: 1. Am I correct in attributing the difference between the two scorers' error rates to different handling of the silence token? I rescored models obtained using the standard recipe and they get consistently higher error rates with the basic scorer. 2. Do you have any intuitions on how precise the phone timing information needs to be for the sclite scorer to work? Is the timing quality part of the score, or is it only used to save on computation? Sincerely, Jan Chorowski |
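To make the first question concrete, here is a minimal sketch of a phone error rate computed purely from label sequences by Levenshtein alignment, with no timing information. It is not the code of either scorer in the recipe; the function names, the `sil` label and the option to drop it before scoring are assumptions, included only to show how the treatment of a silence token alone can change the computed error rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (all edits cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def phone_error_rate(ref, hyp, ignore=()):
    """Errors divided by reference length, after dropping labels in `ignore`."""
    ref = [p for p in ref if p not in ignore]
    hyp = [p for p in hyp if p not in ignore]
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the hypothesis contains one spurious silence.
ref = "sil dh ax k ae t sil".split()
hyp = "sil dh ax k ae sil t sil".split()
per_with_sil = phone_error_rate(ref, hyp)                      # silence counted
per_without_sil = phone_error_rate(ref, hyp, ignore={"sil"})   # silence dropped
```

In this toy case the only error is an inserted silence, so the rate is non-zero when silence is scored and zero when it is ignored.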
From: <jen...@a2...> - 2014-09-25 17:11:35
|
Kaldi - Build # 752 - Failure: See the build log in attachment for the details. |
From: <jen...@a2...> - 2014-09-25 16:41:32
|
Kaldi - Build # 753 - Fixed: See the build log in attachment for the details. |
From: Daniel P. <dp...@gm...> - 2014-09-16 04:45:35
|
Hey, Someone from the SLT2014 organizing committee asked me if I knew of any cool Kaldi-based demos that people might want to show for the demo session. Let me know. Dan |
From: Daniel P. <dp...@gm...> - 2014-09-09 17:43:48
|
There are a lot of ways to do this but they would all require some kind of coding or scripting. Something to be aware of is the program gmm-align-compiled-fsts, which takes FSTs (instead of just text) as the input. You could use this with FSTs that allow some number of words at the beginning or end of the utterance to be skipped. But you would have to construct those yourself using some kind of script. The text FST archive format for Kaldi is the same as the text FST format in OpenFst, except with a blank line as termination. E.g.

utterance_1
1 2 250 250 0.0
2 3 492 492 0.0
3 0.0

utterance_2
1 2 1924 1924 0.0
... etc.

Dan |
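A script along the following lines could generate such FSTs. This is only a sketch under assumptions: the function names and the skip limits are hypothetical, label 0 is assumed to be epsilon (as in a typical words.txt), and each key is assumed to sit on its own line before its FST. Words are skipped at the start with epsilon arcs out of state 0 and at the end by making earlier states final.

```python
import io


def skip_fst_lines(word_ids, max_skip_start=0, max_skip_end=0):
    """Text-format arcs for a linear acceptor over word_ids that may skip
    up to max_skip_start leading and max_skip_end trailing words."""
    n = len(word_ids)
    if n == 0:
        raise ValueError("need at least one word")
    # Main path; the first line starts at state 0, which makes 0 the start state.
    lines = [f"{i} {i + 1} {w} {w} 0.0" for i, w in enumerate(word_ids)]
    # Epsilon arcs (label 0) that jump over the first k words.
    for k in range(1, min(max_skip_start, n - 1) + 1):
        lines.append(f"0 {k} 0 0 0.0")
    # Making earlier states final lets the path stop before the last words.
    first_final = n - min(max_skip_end, n - 1)
    for state in range(first_final, n + 1):
        lines.append(f"{state} 0.0")
    return lines


def write_fst_archive(utterances, out):
    """Write (key, lines) pairs; a blank line terminates each FST."""
    for key, lines in utterances:
        out.write(key + "\n")
        out.write("\n".join(lines) + "\n")
        out.write("\n")


buf = io.StringIO()
write_fst_archive(
    [("utterance_1", skip_fst_lines([250, 492], max_skip_start=1, max_skip_end=1))],
    buf,
)
archive_text = buf.getvalue()
```

The word IDs 250 and 492 are taken from the example above; everything else is illustrative.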
From: Saman M. <smo...@gm...> - 2014-09-09 10:56:18
|
Hi everybody, I have trained a model for alignment and I want to use it to align a very long audio file (say, one hour). Is there any way to speed up the process, such as parallelization? I cannot divide the audio file because I don't know how to divide the text (if I split it into 60 one-minute audio files, I do not have 60 text files matching each segment). I use gmm-align-compiled (more specifically, the script align_si.sh). Is the decoder used in gmm-align-compiled computationally efficient? Best regards Saman |
From: Daniel P. <dp...@gm...> - 2014-09-05 02:38:24
|
Hm. You could try stepping through the program line by line in a debugger to see specifically where the problem occurs. My experience with Visual Studio is that it does not always deal well with templates, and contains both deviations from the C++ standard and outright compiler bugs. Dan |
From: K <chi...@gm...> - 2014-09-05 02:02:32
|
Hi Dan,

In fact, I have solved most of the compilation problems you would expect and made it work on Windows Phone (ARM) and Windows!

As for this problem, strangely it runs successfully on Windows Phone when I inline the implementation of SetProperties(DeleteAllStatesProperties(Properties(), kStaticProperties)) instead of calling the functions (e.g. DeleteAllStatesProperties, Properties(), SetProperties) directly. For example:

void DeleteStates() {
  BaseImpl::DeleteStates();
  // SetProperties(DeleteAllStatesProperties(Properties(), kStaticProperties));  // crash
  uint64 inprops = Properties();
  uint64 outprops = inprops & kError;
  uint64 props = outprops | kNullProperties | kStaticProperties;
  properties_ &= kError;  // kError can't be cleared
  properties_ |= props;
}

I know that the function is not doing anything unusual. However, it crashes only when a function (e.g. DeleteArc) contains a call to SetProperties(), so I had to make the same change there.

I would like to know whether doing this might cause other problems. What do you think?

K

From: Daniel Povey [mailto:dp...@gm...]
Sent: Friday, September 5, 2014 4:25 AM
To: K
Subject: Re: [Kaldi-developers] runtime error in windows phone, unhandled exception at 0x6930BD79 : 0xC000001D: Illegal Instruction.

> I'm impressed that you were able to compile it for ARM.
>
> I'm not sure why it's crashing in DeleteStates; that function is not doing anything unusual that should use strange instructions. I would expect either a compilation problem (e.g. using object files from the wrong architecture, although the linker should detect that), or a compiler bug.
>
> See if you can successfully run the test programs in fstext/.
>
> Dan |
From: Daniel P. <dp...@gm...> - 2014-09-04 20:19:53
|
> Thanks for this fantastic toolkit. I have two quick questions:
>
> (1) Is there a way to control the number of points in an FFT (as from compute-spectrogram-feats or other command-line tools) separately from specifying the window duration? It is typical in phonetics to zero-pad FFTs taken over short windows, and it would be nice to be able to replicate this type of analysis with Kaldi.

Hi, Currently it is not possible; you would have to modify the code to do this.

> (2) Do you have recommendations for reference acoustic models (possibly also language models) trained on English that are appropriate for Kaldi? The philosophy of Kaldi is probably to roll one's own, but it would be good to know if there are solid pretrained models available too.

The website kaldi-asr.org has been created so that people can upload their completed builds, and we will be gradually uploading the results of running all the example scripts so people can obtain the models online. Dan |
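For readers unfamiliar with the analysis asked about in (1), the following is a small standalone sketch of a zero-padded spectrum in which the number of FFT points is chosen independently of the window length. It is not Kaldi code and does not describe any Kaldi option; the function name, the Hamming window, the 16 kHz sampling rate and the 5 ms frame are all assumptions for illustration, and a naive DFT is used so that only the standard library is needed.

```python
import cmath
import math


def zero_padded_power_spectrum(frame, n_fft):
    """Power spectrum (bins 0..n_fft/2) of a short frame zero-padded to n_fft."""
    n = len(frame)
    if n < 2 or n_fft < n:
        raise ValueError("need 2 <= len(frame) <= n_fft")
    # The window covers only the real samples, not the padding.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    padded = windowed + [0.0] * (n_fft - n)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n_fft)
                  for t, x in enumerate(padded))
        spectrum.append(abs(acc) ** 2)
    return spectrum


sample_rate = 16000                      # assumed sampling rate in Hz
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(80)]  # 5 ms
spectrum = zero_padded_power_spectrum(frame, n_fft=512)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * sample_rate / 512   # frequency of the strongest bin
```

Zero-padding does not add resolution, but it samples the spectrum of the 80-sample window on a much finer frequency grid (31.25 Hz per bin here instead of 200 Hz).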
From: K <chi...@gm...> - 2014-09-03 10:12:41
|
From: K <chi...@gm...> - 2014-09-03 10:08:17
|
From: K <chi...@gm...> - 2014-09-02 08:13:19
|
Hi all,

Question description: Recently, I successfully compiled Kaldi and ran it on x86/x86_64/ARM platforms, but now a runtime error occurs when I run it on Windows Phone 8.

Runtime error: unhandled exception at 0x6930BD79 : 0xC000001D: Illegal Instruction.

Call stack:

TrainingGraphCompiler::TrainingGraphCompiler in training-graph-compiler.cc
--> TrainingGraphCompiler::CompileGraph(const fst::VectorFst<fst::StdArc> &word_fst, fst::VectorFst<fst::StdArc> *out_fst) in training-graph-compiler.cc
--> void DeterminizeStarInLog(VectorFst<StdArc> *fst, float delta, bool *debug_ptr, int max_states) in fstext-utils-inl.h
--> bool DeterminizeStar(Fst<Arc> &ifst, MutableFst<Arc> *ofst, float delta, bool *debug_ptr, int max_states, bool allow_partial) in determinize-star-inl.h
--> void Output(MutableFst<Arc> *ofst, bool destroy = true) in determinize-star-inl.h
--> virtual void DeleteStates() in mutable-fst.h
--> void DeleteStates() in vector-fst.h

Finally, it crashed in DeleteStates(). Any suggestions? Thanks! |
From: Tony R. <to...@ca...> - 2014-08-25 09:19:58
|
I have no relationship with Kaldi (or Apache Software Foundation) except as a user who respects all the good work that has been done. I do know something about building speech recognition systems and we could talk on an informal basis for us to discuss what resources you'd need to do the speech recognition part. Let's email to set up a Skype time. Tony On 09/08/2014 21:43, Steve Graham wrote: > Dear Sir, > > I am a university lecturer from northeastern Thailand and I am also > part of a charitable foundation called Udon Education Foundation > (UEF). We have been involved in educational projects in this area for > about seven years and are currently working on speech recognition > software as part of an ongoing project to help primary school children > in the region learn English. > > Would it be possible to speak to someone involved with The Apache > Software Foundation or KALDI as we are looking at replacing purchased > software with something 'home-grown'? We are working on speech > synthesis, speech recognition and a student teacher reporting process > and believe that KALDI could be ideal in the next stage of development. > > I would appreciate it if you could contact me with a view to starting > some kind of dialogue to hopefully get the ball rolling. > > Many thanks, > > Steven Graham > > Khon Kaen University International College (KKUIC) > KKUIC Email: st...@kk... <mailto:st...@kk...> > KKUIC Web: http://home.kku.ac.th/steven > > Personal Email: st...@st... > <mailto:st...@st...> > Personal Web: www.steves-english-zone.com > <http://www.steves-english-zone.com/> > Personal Skype: shed_chelsea -- ** Cantab is hiring: www.cantabResearch.com/openings ** Dr A J Robinson, Founder, Cantab Research Ltd Phone direct: 01223 778240 office: 01223 794497 Company reg no GB 05697423, VAT reg no 925606030 51 Canterbury Street, Cambridge, CB4 3QG, UK |
From: Daniel P. <dp...@gm...> - 2014-08-24 17:56:12
|
It should be there in the latest version: do "svn up" Dan |
From: Daniel P. <dp...@gm...> - 2014-08-20 19:56:10
|
Regarding compressed HTK features: you have to use HCopy, from the HTK tools, to remove the compression. I don't recall the exact options or config file that you need. Regarding the mismatch in length: there is an option to paste-feats, something like --length-mismatch-tolerance, that you can use to make it tolerate a small difference (it will output the length of the shortest input). Dan |
From: Tony R. <to...@ca...> - 2014-08-20 19:29:23
|
Just to throw in some thoughts here. I definitely agree with Dan's advice that you have to be consistent, and in practice I also use the "best HMM system". However, there is an alternative view which says that chances are you trained your HMMs on one domain and want to align something completely different. Alignment is not a hard task; you can often get decent alignment using only GMM monophones, and many people do just that. It's only hard when something different happens. E.g. back in 1999 I built the first automated subtitling (closed caption) system: getting the speech/text alignment right was easy, but the problem was that a lot of the audio wasn't speech, so there was a mismatch between the non-speech model used in training and that used at deployment time. So if your audio for alignment is from a different domain than training, you might find that a simpler model works better. Of course you could be asking a completely different question, that is, how do you evaluate which model produces the best alignment? In that case you've got to ask what "best" means. In the case of subtitling, the few-frame noise that ASR systems produce isn't noticeable, but gross errors are. So RMS difference against manual timings is not a good measure; it's better to define what counts as a "gross error" and measure the error rate. In summary, we probably need a better idea of why you are choosing one alignment over another in order to help. Tony |
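The two measures contrasted above can be sketched as follows, assuming manual reference timings are available for the same boundaries. The function name and the 0.5 second threshold for a "gross error" are assumptions chosen for illustration; the appropriate threshold depends on the application.

```python
import math


def alignment_errors(ref_times, hyp_times, gross_threshold=0.5):
    """Return (RMS difference, gross error rate) between two lists of
    boundary times in seconds, compared position by position."""
    if not ref_times or len(ref_times) != len(hyp_times):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(r - h) for r, h in zip(ref_times, hyp_times)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    gross_rate = sum(d > gross_threshold for d in diffs) / len(diffs)
    return rms, gross_rate


# Hypothetical timings: one system is slightly noisy everywhere,
# the other is exact except for a single badly misplaced boundary.
manual = [0.0, 1.0, 2.0, 3.0]
jittery = [0.02, 0.98, 2.03, 2.99]
one_gross = [0.0, 1.0, 2.0, 5.0]
rms_jitter, gross_jitter = alignment_errors(manual, jittery)
rms_gross, gross_gross = alignment_errors(manual, one_gross)
```

Counting gross errors separates the two cases directly, whereas the RMS figure mixes small jitter and large misplacements into one number.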
From: Daniel P. <dp...@gm...> - 2014-08-20 18:54:32
|
Usually I think you want to be consistent in which system you use alignments from- I'd recommend to always just use the best HMM system you have to get the alignments. In principle two different systems could align the data in systematically different ways. So I don't think it makes sense to pick and choose alignments on an utterance-by-utterance basis. If you have data that you're not sure whether it's aligning correctly at all (e.g. the transcripts may be wrong), see the script steps/cleanup/find_bad_utts.sh. Dan |
From: Korbinian R. <kor...@gm...> - 2014-08-20 16:22:29
|
Jo Arif,

On Wed, Aug 20, 2014 at 6:15 PM, Arif Khan <ife...@gm...> wrote:
> Thanks Korbinian for your useful reply, I used the bin/paste-feats module

Ok.

> but it gives me an error like:
>
> "Code to read HTK features does not support compressed features, or features with VQ.". I read on the Kaldi mailing list (from Dan) that we need to uncompress it, but I don't know which utility to use.
>
> I must mention that I am not using HTK or Sphinx, but the EST (Edinburgh Speech Tools) utility to extract features (which has an option for HTK format).

If you're not using HTK features you shouldn't see this error. Sure you're working with the right tools? I might be wrong, but I believe there is a difference between the Kaldi and HTK feature compression.

> module but there I got a mismatch in the lengths of the two files in the number of rows. Is there any workaround to fix the number of rows for the two files?

There is an option to ignore this error, but most likely the difference in length (rows) is due to the handling of the border conditions at the beginning and end of the file. HTK/Kaldi produce one feature for each complete frame (and extrapolate on begin/end); other tools might work differently. As you're importing from a third-party module anyway, assuming that you do it via a text archive, you can just extend your processing in that conversion script to make sure the boundary conditions match Kaldi's behavior.

Korbinian. |
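One way a conversion script could handle a small row-count mismatch is to truncate both matrices to the shorter one before concatenating the columns. The sketch below shows that idea only; it is not Kaldi's implementation, and the function name and the `length_tolerance` parameter are assumptions.

```python
def paste_feats(a, b, length_tolerance=0):
    """Concatenate two per-frame feature matrices (lists of rows) column-wise,
    truncating to the shorter one if the lengths differ by at most
    length_tolerance frames."""
    if abs(len(a) - len(b)) > length_tolerance:
        raise ValueError(
            f"length mismatch {len(a)} vs {len(b)} exceeds tolerance {length_tolerance}")
    n = min(len(a), len(b))
    return [list(row_a) + list(row_b) for row_a, row_b in zip(a[:n], b[:n])]


# Hypothetical inputs: 5 frames of 2-dim features and 4 frames of 3-dim features.
mfcc = [[float(t), float(t) + 0.5] for t in range(5)]
other = [[1.0, 2.0, 3.0] for _ in range(4)]
pasted = paste_feats(mfcc, other, length_tolerance=2)
```

Truncation only hides the mismatch; if the two tools place their first frame differently, the rows are still offset by a frame or so, which is why matching the boundary handling in the conversion script is the cleaner fix.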
From: Arif K. <ife...@gm...> - 2014-08-20 16:15:15
|
Hi, my email has two parts, both about solving the same problem, i.e. a mismatch during feature concatenation. 1st part: Thanks Korbinian for your useful reply. I used the bin/paste-feats module but it gives me an error like: "Code to read HTK features does not support compressed features, or features with VQ.". I read on the Kaldi mailing list (from Dan) that we need to uncompress it, but I don't know which utility to use. I must mention that I am not using HTK or Sphinx, but the EST (Edinburgh Speech Tools) utility to extract features (which has an option for HTK format). Here is the link: http://www.cstr.ed.ac.uk/projects/speech_tools/manual-1.2.0/x737.htm . 2nd part: I also tried to compute the MFCC features with Kaldi, then convert the other set of features to Kaldi archive format and use the "paste-feats" module, but there I got a mismatch in the lengths of the two files in the number of rows. Is there any workaround to fix the number of rows for the two files? Best regards, Arif |
From: Korbinian R. <kor...@gm...> - 2014-08-20 10:20:02
|
Hi, it's easiest to use bin/copy-feats. If your feature program supports HTK or Sphinx format, then use --htk-in or --sphinx-in, otherwise parse from ascii using ark,t and some script to produce the proper kaldi archive format turn = [ [ 0 0 0 ... ] [ 0 0 0 ... ] ] Korbinian. |
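The ASCII route could look roughly like the sketch below, which writes matrices in a text layout of the kind read with an `ark,t:` specifier. The layout used here (utterance key, an opening bracket, one frame per line, a closing bracket after the last row) is an assumption based on the usual Kaldi text matrix format, and the function name and utterance ID are hypothetical.

```python
import io


def write_text_archive(feats_by_utt, out):
    """Write {utterance_id: list of rows} as text matrices, one per utterance."""
    for utt_id, rows in feats_by_utt.items():
        if not rows:
            raise ValueError(f"no frames for utterance {utt_id}")
        out.write(f"{utt_id}  [\n")
        for i, row in enumerate(rows):
            # The closing bracket goes on the same line as the last row.
            end = " ]\n" if i == len(rows) - 1 else "\n"
            out.write("  " + " ".join(repr(float(v)) for v in row) + end)


buf = io.StringIO()
write_text_archive({"utt1": [[0.0, 1.5], [2.0, 3.25]]}, buf)
archive_text = buf.getvalue()
```

A file written this way would then be passed to copy-feats as something like `ark,t:feats.txt` to convert it to a binary archive; the file name is illustrative.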
From: Arif K. <ife...@gm...> - 2014-08-20 10:04:27
|
Dear Kaldi authors, I want to use an external feature extraction program (EST, the Edinburgh Speech Tools) to extract features. The output is just a matrix of size N x M, where N is the number of frames and M is the dimension of the feature vector (48 features in my case; I am doing some multi-modal feature fusion experiments). How can I transform this feature matrix so that it is usable with Kaldi, i.e. a matrix in Kaldi format? Best regards, Arif |
From: Saman M. <smo...@gm...> - 2014-08-20 09:07:51
|
Hi everybody, I have a question about alignment. Suppose we have two alignments of a sentence and an audio file (e.g. obtained from two different models). How can I determine which of them is more appropriate? Any guidance will help me a lot. Best Saman |
From: Tang Y. <ho...@gm...> - 2014-08-18 14:14:24
|
Hi, everyone, I am trying to repeat the Kaldi NN training recipe (nnet2) on the RM data set. One script for the nnet2 build is missing (get_num_frames.sh). Where can I find this script? Thanks Yun |
From: <jen...@a2...> - 2014-08-16 07:53:42
|
Kaldi - Build # 679 - Fixed: See the build log in attachment for the details. |
From: <jen...@a2...> - 2014-08-15 19:32:16
|
Kaldi - Build # 677 - Still Failing: See the build log in attachment for the details. |