From: Kartik A. <aud...@us...> - 2013-03-11 16:02:44
Thanks. I understand the OpenFST format but was under the impression that
lattice-align-words will replace FST state IDs with the corresponding frame
numbers.

-- Kartik Audhkhasi
From: Daniel P. <dp...@gm...> - 2013-03-11 15:59:17
cc-ing kaldi-developers so it's archived, in case anyone has similar questions.

You need to understand the OpenFst format. This is the "acceptor" version of
the format, which is (start-state end-state symbol weight), and the weight is
(graph cost, acoustic cost, sequence of transition-ids). [the "normal",
non-acceptor format is start-state end-state input-symbol output-symbol
weight]. The time can be obtained by summing up the number of transition-ids
starting from the beginning of the lattice; in the code it's LatticeStateTimes.

Note that the state numbers are arbitrary in a sense, they contain no real
information. See www.openfst.org for more info on WFSTs.

On Mon, Mar 11, 2013 at 11:55 AM, Kartik Audhkhasi <aud...@us...> wrote:
> Thanks Dan. I have started using the new scripts. However the timing issue
> still remains. I used lattice-align-words to get times on the lattice nodes
> in the same way as is demonstrated in the run.sh script. I think I am not
> interpreting the times correctly. Do the IDs on both start and end nodes
> represent frame numbers? E.g. the first line of my lattice is:
>
> 0 2337 44870 17.0063,2553.48,9468_9482_9492_9491_9491_9491_9491_9910_9909_9909_9916_9924_9194_9210_9242_9241_9241_9241
>
> Does this say that word 44870 goes from frame 0 to frame 2337? The
> transition ID sequence however shows only 18 frames.
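To make the arithmetic concrete, the following is a minimal sketch (not from
the original thread; the archive name "lat.1.gz" and the use of lattice-copy
on an already word-aligned lattice are assumptions) of dumping a lattice in
the text acceptor format Dan describes and counting the transition-ids on each
arc. Every transition-id accounts for one frame, so the arc Kartik quotes
spans 18 frames, and the state numbers 0 and 2337 carry no timing information
by themselves.

    # Sketch only: dump word-aligned lattices in text form and count the
    # transition-ids per arc (field 4 is graph-cost,acoustic-cost,tid_tid_...).
    lattice-copy "ark:gunzip -c lat.1.gz|" ark,t:- \
      | awk 'NF==4 { split($4, w, ","); print "word-id", $3, "spans", split(w[3], t, "_"), "frames" }' \
      | head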
From: Daniel P. <dp...@gm...> - 2013-03-10 16:26:13
And RE how to debug it-- before and after lattice-align-words, you could run
something like lattice-to-post; this program will crash if there are
inconsistent times in the lattice, i.e. the lengths of input-symbol sequences
are not all the same. I suspect you actually mixed something up.
Dan
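As a concrete version of that check, the sketch below (paths and the acoustic
scale are placeholders, not taken from the thread) pipes the lattices through
lattice-to-post and throws the posteriors away; it aborts with an error if any
lattice has inconsistent path lengths.

    # Hypothetical sanity check; run it on the lattices before and after
    # lattice-align-words, and adjust --acoustic-scale to your setup.
    lattice-to-post --acoustic-scale=0.083333 \
      "ark:gunzip -c exp/tri2b/decode/lat.1.gz|" ark:/dev/null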
From: Daniel P. <dp...@gm...> - 2013-03-10 16:24:18
It's a shame that you're using the older versions of the script. Currently the
"s5" scripts are the canonical ones. Your issue with times greater than the
length of the file is very unexpected. This is not the kind of error I would
expect to ever arise.

RE getting the N-best or 1-best sequences-- the programs lattice-nbest and
lattice-1best are relevant here; they output stuff in the regular lattice
format, and you can then put them through lattice-word-align (old scripts) or
lattice-align-words (new scripts), and convert the output to, say, ctm
format-- you can check the scripts for how to convert to ctm format, it's
something like lattice-to-ctm, but there are scripts such as get_ctm.sh and
get_train_ctm.sh in s5.
Dan
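Put together, the pipeline Dan outlines looks roughly like the sketch below
(the directory names and acoustic scale are assumptions, and nbest-to-ctm is
used in place of the "something like lattice-to-ctm" he mentions; in the s5
scripts this is what get_ctm.sh wraps).

    # Hypothetical paths: 1-best path -> word-aligned lattice -> CTM with word names.
    lattice-1best --acoustic-scale=0.083333 \
        "ark:gunzip -c exp/tri2b/decode/lat.1.gz|" ark:- \
      | lattice-align-words data/lang/phones/word_boundary.int exp/tri2b/final.mdl ark:- ark:- \
      | nbest-to-ctm ark:- - \
      | utils/int2sym.pl -f 5 data/lang/words.txt > 1best.ctm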
From: Kartik A. <aud...@us...> - 2013-03-10 02:36:39
Hello,

I am working with Kaldi lattices and used the walign_lats.sh script to get
times on nodes. I am using position dependent phones and believe that all
required files are in place. However, I see that some frame numbers (in units
of 10ms) exceed the total length of the file. Do you have any suggestions as
to what could be going wrong?

Also: Is there an easy way to get the N-best or 1-best sequences with word
boundaries?

Thanks,
Kartik

-- Kartik Audhkhasi
From: Nathan D. <nd...@ca...> - 2013-02-26 21:32:44
Great. I will start there then. I really appreciate your help.

Nathan
From: Daniel P. <dp...@gm...> - 2013-02-26 21:31:25
I don't think you can ever do without a language model, but you could start
off with a simple phone bigram trained on phone sequences extracted from
words.
Dan
From: Nathan D. <nd...@ca...> - 2013-02-26 21:30:19
I'll definitely need children's corpuses regardless.

However, I'm wondering if I can skip the decoding step such that I will not
need a language / word model as I am only trying to match phonemes — not
actual words. Is there a way to do that with this system? My intuition is "no"
and if there was I would probably be losing valuable statistics data, but I
would be more happy to be wrong.

Thanks,

Nathan
From: Daniel P. <dp...@gm...> - 2013-02-26 21:18:56
It seems to me that what you need to do is to create a suitable language model
for sequences of phones. E.g. get examples of the kind of phone sequences that
children doing these exercises will typically produce, and build a language
model on those the same way you would for word sequences. You could accomplish
this using a lexicon that was trivial, with one word for each phone.

It will be difficult to get good results without matched training data, as
children's speech is quite different from adults'.
Dan
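A minimal sketch of that setup follows (the file names and the use of SRILM's
ngram-count are assumptions, not something specified in the thread): the
lexicon maps a one-phone "word" to itself, and the phone-sequence transcripts
are treated as ordinary LM training text.

    # Hypothetical file names: one word per phone, pronounced as that phone.
    awk '{print $1, $1}' phone_list.txt > data/local/dict/lexicon.txt
    # Train a phone bigram on phone-sequence "sentences" (assumes SRILM is installed).
    ngram-count -order 2 -text phone_sequences.txt -lm phone_bigram.arpa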
From: Nathan D. <nd...@ca...> - 2013-02-26 21:16:56
I'm trying to create a tool to recognize spoken phonemes for children's
reading comprehension, i.e., children speaking phonemes only, not the words
and of course not a sentence.

After looking a bit more, it looks like there are a couple of good options:

1 - (thanks Dan) Create a lexicon consisting of just phones, that you can use
at test time - removing the word-position-dependency
2 - Extract phones directly from transitions prior to word alignment (i.e.,
directly from the acoustic model).

For #2 - I would worry that the lack of information might be problematic. The
advantage is that I only need enough data for the acoustic model. Anyway, I
would be very happy to share whatever I do come up with.

Any thoughts on this would be helpful.

Thanks,

Nathan Dunn, PhD.
541-221-2418
CAS Scientific Programmer
http://blogs.uoregon.edu/casspr/
nd...@ca...
From: Arnab G. <ar...@gm...> - 2013-02-21 13:11:30
Yes, I agree that keeping things simple when we first started was a good idea.
And it hasn't been an issue since the executables are generally quite atomic.
But as more people use Kaldi, they may want to put things together in
different ways to suit their need, which may be different from what the
executables are currently providing. In fact, that is how I discovered that
these two configs are registering the same switch.

Things don't crash when this happens. The option that is registered first is
used, and all others are ignored with a warning message.

As I said, it's not a big deal currently. It is easy to change, and I am not
even using that particular switch. But I see this as a potential source of
pain later on.

-Arnab
From: Daniel P. <dp...@gm...> - 2013-02-20 18:26:17
Arnab-- there was a bit of a discussion about this in the early days of Kaldi.
Some (e.g. Karel) felt that we should have some kind of name-qualification for
options. But my feeling was it was more in the spirit of what I had in mind
with Kaldi, to keep things simple and try to ensure that a single program
never had more than one option with the same name. I kind of still think this
is the right way to go. In the particular case you mention, it seems like the
option should always have the same value anyway, so it might be possible to
make it so it sets both to the same value. Is there a situation where you
envisage having both those configs used?

I think there may already be code there to detect that you registered two
configs with the same name (it will crash in this situation).

Dan
From: Arnab G. <ar...@gm...> - 2013-02-20 16:36:18
Hi all,

I just realized that it is possible for options of multiple modules to clash,
since there is no way to uniquely identify an option from a particular class.
For example, TrainingGraphCompilerOptions and WordBoundaryInfoNewOpts both
have a 'reorder' option, which means only one of the 'reorder' options will be
used in an executable that uses both.

While this particular problem is easy to fix, there is a deeper issue with
option parsing that I am mentioning here in case someone has time to fix this.
I couldn't figure out how to use the bug-tracker on SourceForge.

-Arnab
From: Daniel P. <dp...@gm...> - 2013-01-20 17:05:32
What OS are you running on? And what output is it producing?

BTW, there is another way to make the tools, by typing "make".
Dan
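For reference, the alternative Dan mentions amounts to something like the
sketch below (a sketch only; check tools/INSTALL for the exact targets and
options your checkout supports).

    cd kaldi-trunk/tools
    make    # builds the third-party tools; see tools/INSTALL before adding -j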
From: Arnab G. <ar...@gm...> - 2013-01-20 09:10:13
Yes, something is definitely wrong. It should not take more than some tens of
minutes. You have to check the log messages to see where it got stuck and why
it got stuck.
From: 蘇仲銘 <chu...@gm...> - 2013-01-20 06:19:19
Excuse me, how long does the command "./install.sh" take to run? It has been
running for 4 days and has not finished. I am wondering if there is something
wrong.
From: Ho Y. C. <ric...@gm...> - 2012-12-22 17:44:47
Hi, some findings from the kaldi-trunk compilation.

In the file kaldi-trunk/tools/INSTALL, line 150:

  If you get a bug about strncasecmp, you have to modify the file 'sctk-4.0/src/rfilter1/makefile'.

should be:

  If you get a bug about strncasecmp, you have to modify the file 'sctk-2.4.0/src/rfilter1/makefile'.

Compilation of sctk on CentOS is fine without modification, while compilation
with Cygwin requires the above modification.

Compilation of the current version fails in my Cygwin for the file
cudamatrix/cu-math-inl.h with the following error:

  g++ -msse -msse2 -Wall -I.. -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -DHAVE_CLAPACK -I ../../tools/CLAPACK_include/ -Wno-sign-compare -Winit-self -I ../../tools/CLAPACK_include/ -I /home/work/kaldi-trunk/tools/openfst/include -O3 -DNDEBUG -c -o nnet-nnet.o nnet-nnet.cc
  In file included from ../cudamatrix/cu-math.h:101:0,
                   from ../nnet/nnet-activation.h:23,
                   from nnet-nnet.cc:20:
  ../cudamatrix/cu-math-inl.h: In function 'void kaldi::cu::Tanh(const kaldi::CuMatrix<Real>&, kaldi::CuMatrix<Real>*) [with Real = float]':
  ../nnet/nnet-activation.h:98:21: instantiated from here
  ../cudamatrix/cu-math-inl.h:164:9: error: 'isinf' was not declared in this scope
  ../base/kaldi-math.h: At global scope:
  ../base/kaldi-math.h:84:17: warning: 'kaldi::kBaseLogZero' defined but not used
  ../base/kaldi-math.h:87:17: warning: 'kaldi::kBaseFloatMax' defined but not used
  <builtin>: recipe for target `nnet-nnet.o' failed
  make[1]: *** [nnet-nnet.o] Error 1
  make[1]: Leaving directory `/home/work/kaldi-trunk/src/nnet'
  Makefile:52: recipe for target `nnet' failed
  make: *** [nnet] Error 2

The modification below works:

  $ svn diff
  Index: cudamatrix/cu-math-inl.h
  ===================================================================
  --- cudamatrix/cu-math-inl.h   (revision 1701)
  +++ cudamatrix/cu-math-inl.h   (working copy)
  @@ -161,7 +161,7 @@
     for(MatrixIndexT r=0; r<x.NumRows(); r++) {
       for(MatrixIndexT c=0; c<x.NumCols(); c++) {
         Real exp_2x = exp(2.0*x(r, c));
  -      if(isinf(exp_2x)) {
  +      if(KALDI_ISINF(exp_2x)) {
           y(r, c) = 1.0;
         } else {
           y(r, c) = (exp_2x - 1.0) / (exp_2x + 1.0);
From: Arnab G. <ar...@gm...> - 2012-11-12 10:28:06
Hi Michael, according to http://www.ldc.upenn.edu/Catalog/docs/LDC93S3B/readme
the rm1_audio1/rm1_audio2 layout is what is expected. In this case, you will
have to write a custom version of local/rm_data_prep.sh. Maybe this list will
help you: http://www.ldc.upenn.edu/Catalog/docs/LDC93S3B/file.tbl

If not, I can send you the training and test utterance IDs.

-Arnab
From: Deisher, M. <mic...@in...> - 2012-11-09 18:30:20
Hi. The web site says:

  Contacting the Kaldi team
  If you would like to ask a question or make a comment about Kaldi or would
  like to be added to a Kaldi-users email list, please send an email to
  kal...@li...

Could you please add me to kaldi-users? Thanks!

--Mike
From: Deisher, M. <mic...@in...> - 2012-11-09 18:24:47
Hi. I'm working through the KALDI tutorial and am trying to get the RM system
up and running. Unfortunately, my RM CDs from LDC are quite old. When I run
the KALDI setup script data_prep/run.sh and point it at my RM directory
(copied from the CDs), it tells me that the directory I specify must have the
folders rm1_audio1 and rm1_audio2 in it. I see no such folders on the CDs.
Perhaps the corpus was re-organized at some point and I didn't get the memo?
(From http://www.ldc.upenn.edu/Catalog/docs/LDC93S3B/disc_1/readme_ind.txt it
seems this is not the case). Could someone please do an ls -laR on it and send
me the output? Thanks!

--Mike
From: Daniel P. <dp...@gm...> - 2012-11-06 15:59:32
Also, you can just add a self-loop at each state (or just at the unigram
backoff state) in the LM, for each noise type, with an appropriate cost. In
IBM I think we did it just at the unigram backoff state (this was for silence
modeling).
Dan
From: Arnab G. <ar...@gm...> - 2012-11-06 11:39:41
Of course you need those words in your LM if you need them in your output. You
can use a single <NOISE> tag like in the WSJ recipe, or use more fine-grained
noise tags like in the Switchboard recipe (or the particular tags you
suggest). Under normal circumstances, you are supposed to remove the noises
from the transcript and output before scoring.
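One way to act on both suggestions (a sketch under assumptions: the tags,
probabilities, and paths are placeholders and are not taken from the thread)
is to give each noise token a pronunciation in the lexicon and a unigram entry
in the ARPA LM before compiling G.fst, so the decoder can hypothesize it
anywhere via the backoff state.

    # Hypothetical lexicon entries, mapping noise tokens to noise phones:
    #   <NOISE>     NSN
    #   [LAUGHTER]  LAU
    # Hypothetical additions to the \1-grams: section of the ARPA file
    # (log10 probabilities are placeholders; update the "ngram 1=" count too):
    #   -3.5  <NOISE>     -0.5
    #   -4.0  [LAUGHTER]  -0.5
    # Recompile the grammar FST (option names vary across Kaldi versions):
    arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt \
      lm_with_noise.arpa data/lang/G.fst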
From: mark li <mar...@gm...> - 2012-11-05 15:39:08
Hi, guys:

I am using Kaldi to decode my data, which includes 5 percent noise words (e.g.
UM, UHMM, Laughter, ++BRTH++, ++Music++, etc.). Those words are not in the LM.
I found that Kaldi is really great except for those noise words. Whenever such
words occur, the decoder produces erroneous output. Is there a way to deal
with these noise words during decoding even if they are not in the LM? It is
almost not possible to model those noise words in the LM because they may
appear anywhere. I mean, is there a way to handle them like Sphinx deals with
filler words?

thanks

Mark
From: Vassil P. <vas...@gm...> - 2012-11-05 10:30:34
I don't know what you mean exactly by dictation application, but if you just
want online decoding you may want to have a look at src/online, src/onlinebin
and egs/voxforge/online_demo. As Arnab said, if you need something more
sophisticated this is probably doable too, but you will need to write it by
yourself.

Vassil

On Mon, Nov 5, 2012 at 11:56 AM, Arnab Ghoshal <ar...@gm...> wrote:
> In principle, yes. Most of the required components are there, but you have
> to write your own solution.
>
> On Fri, Nov 2, 2012 at 9:33 PM, Talat Tüfekçi <tal...@gm...> wrote:
>> Could I use kaldi for a dictation application ?
>>
>> Thanks in advance.
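For a quick start with online decoding, the usual route is something like the
sketch below (a sketch only: the script name and build steps are assumptions
based on the directories Vassil lists; the demo's README describes the exact
prerequisites, such as PortAudio for microphone input).

    # Build the online binaries, then run the VoxForge online demo,
    # which downloads pre-built models on first use.
    cd kaldi-trunk/src/onlinebin && make
    cd ../../egs/voxforge/online_demo && ./run.sh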
From: Shi Hu <fin...@gm...> - 2012-11-05 10:23:24
I got past that part thanks to Dan's help. Basically, gmm-latgen-fast is the
one that eats quite a bit of memory! For me, many decode loops spawn 6
processes (with six iterations), each having around 8 - 10 jobs just doing
gmm-latgen-fast, so the memory runs short quickly. So I made the code run
serially, and each iteration takes about 15 - 20 GB of RAM, but that is okay.

Thanks!

On Mon, Nov 5, 2012 at 2:02 AM, Arnab Ghoshal <ar...@gm...> wrote:
> That's odd! The only memory intensive part of UBM training is the clustering
> step (init-ubm) but to run out of 35G you will need a very very big model
> (not possible for WSJ). Gaussian selection uses fairly low resources. Could
> you send the exact error messages? -Arnab
>
> On Sun, Nov 4, 2012 at 7:57 AM, Shi Hu <fin...@gm...> wrote:
>> Hello
>>
>> I run local/run_mmi_tri2b.sh (this is a step in run.sh for WSJ) on a single
>> machine at Stanford clusters which has 35GB RAM, but I still run out of RAM
>> and swap memory when steps/train_ubm.sh at line 99 is called (doing
>> Gaussian selection).
>>
>> How do I solve this problem?
>>
>> Thanks!
>> Shi
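If you hit the same problem, a less drastic alternative to running everything
serially (a sketch; the paths are placeholders and option names vary across
script versions) is to lower the number of parallel decoding jobs and tighten
the beams so that each gmm-latgen-fast process needs less memory.

    # Hypothetical invocation: fewer parallel jobs and narrower beams.
    steps/decode.sh --nj 2 --beam 11.0 --lattice-beam 4.0 \
      exp/tri2b/graph data/test exp/tri2b/decode_test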