From: Alexander S. <aso...@gm...> - 2015-04-06 20:00:11
I would recommend sticking to SVN as long as possible, just because when the history is linear you can easily convert the repository from one VCS to another. Once the git branching hell comes, you'll have to stay with it forever.

On CMU Sphinx, when a pull request is received, it actually goes to SVN first and is then closed with the revision number in the comment, like this: https://github.com/cmusphinx/sphinxtrain/pull/4

On Mon, Apr 6, 2015 at 8:04 PM, Jan Trmal <af...@ce...> wrote:
> [...]

--
Sincerely, Alexander

_______________________________________________
Kaldi-developers mailing list
Kal...@li...
https://lists.sourceforge.net/lists/listinfo/kaldi-developers
From: Jan T. <af...@ce...> - 2015-04-06 19:04:17
Kirill and all,

first, thanks for your comments and encouragement. As it looks right now, we will have only a single branch, trunk, in the repository (we might even delete the stable branch).

During the past few days, I have imported the kaldi repository once more. The reason was that if you select the project's licence during repository creation on github, github will commit some files into the repository (a LICENCE file at least), and that messed up the commit history so that git svn rebase didn't work -- I had to do a recursive merge every time before I called rebase, and everything was getting out of my control. There might be an easy fix for this, but the most straightforward way for me was to run the conversion again. After that, git svn rebase works smoothly (I have a cron job taking care of it now).

During this week, I intend to do the conversion once more, and hopefully for the last time (i.e. after that, we will have an official kaldi git repository mirroring changes from SVN).

The current tentative plan for the kaldi transition is:
* first, for a few months, operate the github kaldi repo as an official mirror of svn. That means all commits will go to SVN, and the less adventurous users (and possibly companies with more inflexible workflows) can keep using SVN. This is to give us time to sort out the issues, write up the docs and get familiar with git.
* after that, we can make the git repository primary and, instead of mirroring SVN->GIT, we will mirror GIT->SVN (note: I still have to figure out if this is feasible -- but I think it is). This should -force- motivate all users to switch to GIT.
* after that (or right away, skipping the previous stage completely), keep only git and stop using and supporting SVN.

y.

On Mon, Apr 6, 2015 at 2:36 PM, Kirill Katsnelson <kir...@sm...> wrote:
> [...]
From: Daniel P. <dp...@gm...> - 2015-04-06 18:47:42
> The only tricky part here is merging upstream changes into this train of
> clones (analogous to "svn update") in such a way that the upstream changes
> would not show up in the resulting pull request.

Perhaps you could explain how this is done? It might help us make an informed decision if you could give us some details on the workflow that you use, at the level of specific commands.

Dan
From: Kirill K. <kir...@sm...> - 2015-04-06 18:36:50
Totally true. Depending on the type of workflow that Kaldi developers see, there are up to 3 cloned repositories involved. This is what happens in the project I maintain (https://github.com/fsprojects/FsLexYacc). We do not accept any other way to submit changes to the main repository (which is a bare repo in github) than using github's pull requests. Even people with write access to the repo must follow this path. Since the pull request may only be generated in github between 2 branches on repos necessarily stored on github (although the branches may be in different repos, having a common history fork in the past), the patch submitter must have cloned the main repo at some point. This is clone #2.

Since the submitter obviously has to do some hacking first, he clones his github clone #2 to a desktop, locally, and this is clone #3. He normally pushes into a branch or the master "trunk," and, come time to send the changes upstream, issues a pull request for a diff between clones #1 and #2 (or 2 points in history in #2).

The only tricky part here is merging upstream changes into this train of clones (analogous to "svn update") in such a way that the upstream changes would not show up in the resulting pull request.

Another workflow would be allowing changes into the main trunk branch "the svn way." To me, the main benefit of having the pending commits (as pull requests) inspected by the peers is lost in this workflow. But the decision is on Kaldi devs in any case.

It perhaps makes sense to write up a walk-through for first-time git (or github) users. I'd take on the task, but I need to play back the step-by-step as I write, so I need time for the task. A workflow decision must be made first, however. There might also be an existing guide on the 'Net for those migrating from svn to the selected git workflow.

-kkm

> -----Original Message-----
> From: Phil Garner [mailto:Phi...@id...]
> Sent: 2015-04-05 0622
> To: kal...@li...
> Subject: Re: [Kaldi-developers] Testing Github repository
> [...]
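[Editor's note] One common way to handle the "tricky part" above — keeping upstream changes out of the resulting pull request — is to rebase the feature branch onto the updated upstream tip before issuing the PR. The following is a self-contained local sketch of that flow, not Kirill's actual FsLexYacc setup; all repository names, file names, and identities are invented for illustration:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# "Clone #1": the main bare repository (stands in for the github repo)
git init -q --bare -b master upstream.git
git clone -q upstream.git maintainer && cd maintainer
git config user.email m@example.org && git config user.name Maintainer
git symbolic-ref HEAD refs/heads/master   # be explicit about the branch name
echo base > file.txt && git add file.txt && git commit -qm "initial"
git push -q origin master
cd "$tmp"

# "Clone #3": the contributor's local working clone
git clone -q upstream.git contrib && cd contrib
git config user.email c@example.org && git config user.name Contributor
git checkout -q -b feature
echo mine > mine.txt && git add mine.txt && git commit -qm "my change"

# Meanwhile, upstream moves on
cd "$tmp/maintainer"
echo more >> file.txt && git commit -qam "upstream change" && git push -q origin master

# Rebase the feature branch onto the new upstream tip; the pull-request
# diff (origin/master..feature) then contains only the contributor's work
cd "$tmp/contrib"
git fetch -q origin
git rebase -q origin/master feature
git log --oneline origin/master..feature
```

Because the feature branch is replayed on top of origin/master rather than merged with it, the range origin/master..feature — which is what a pull request diffs — never includes the upstream commits.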
From: Phil G. <Phi...@id...> - 2015-04-05 13:35:26
> So far, I pushed only the trunk and stable branch -- I'm not sure if
> import of all sandboxes makes sense.

If I understand the concept of sandboxes well, then in git they may better correspond to branches in user clones of the repository. That is, the person responsible for the sandbox clones the trunk repo and commits to a branch as they wish. Github makes this very easy.

--
Phil Garner
http://www.idiap.ch/~pgarner
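[Editor's note] Phil's mapping of SVN sandboxes onto branches in user clones can be sketched with stock git commands; the repository and branch names below are invented for the example:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# A stand-in for the main trunk repository
git init -q -b master trunk && cd trunk
git config user.email t@example.org && git config user.name Trunk
echo hello > README && git add README && git commit -qm "initial"
cd "$tmp"

# The sandbox owner clones the trunk repo...
git clone -q trunk my-clone && cd my-clone
git config user.email s@example.org && git config user.name Sandbox

# ...and does all sandbox work on a branch of that clone
git checkout -q -b my-sandbox
echo experiment > exp.txt && git add exp.txt && git commit -qm "sandbox work"
git branch --list
```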
From: Jan T. <jt...@gm...> - 2015-04-03 16:33:16
Guys, I've finished a "testing" push of the git repo to github and I'd like to hear your comments and opinions on whether the migration process could/should be improved -- I guess lots of you have more extensive experience with git than I have.

The Kaldi organization is here: https://github.com/kaldi-asr/ and the testing repository is https://github.com/kaldi-asr/test-migration-2.git

So far, I pushed only the trunk and stable branch -- I'm not sure if import of all sandboxes makes sense.

For conversion, I used svn2git:

  svn2git -v https://svn.code.sf.net/p/kaldi/code/ --trunk /trunk --branches /sandbox --notags --authors ../kaldi-devs-map-tmp.txt -m

After this finished, I added the line "fetch = stable:refs/remotes/svn/stable" into the .git/config file, so it looks like this:

  [core]
          repositoryformatversion = 0
          filemode = true
          bare = false
          logallrefupdates = true
  [svn-remote "svn"]
          url = https://svn.code.sf.net/p/kaldi/code
          fetch = trunk:refs/remotes/svn/trunk
          fetch = stable:refs/remotes/svn/stable
          branches = sandbox/*:refs/remotes/svn/*
  [svn]
          authorsfile = ../kaldi-devs-map-tmp.txt

Then I did:

  git svn fetch                                  # to get the stable branch fetched
  git checkout -b "stable" "remotes/svn/stable"  # to convert the remote branch to a local one
  git remote add origin https://github.com/kaldi-asr/test-migration-2.git   # add target of push
  git push -u origin master                      # to push the master to github
  git push -u origin stable                      # to push the "stable" branch to github

y.
From: Daniel P. <dp...@gm...> - 2015-04-01 21:31:02
> However, I am not so sure how to implement the Backward algorithm, since I
> must traverse the edges in the FST backwards (to do the backward pass in
> O(T * (V + E))), and OpenFST does not support this AFAIK. Also, I am not
> sure if simply transposing the FST would work, since I would have many
> initial states... Any suggestion on that?

It should be possible to handle this by arranging your code in the right way, e.g. first iterate over the start-state of the arc.

> > This would create a difficulty for converting alignments though (this happens
> > when bootstrapping later systems, e.g. starting tri2 from tri1). You would
> > probably have to just do Viterbi for that one stage.
>
> I am not sure what you mean. Could you extend your explanation or point to
> a recipe where you had to overcome this difficulty?

I mean the program convert-ali wouldn't work unless you had alignments.

Dan

> Many thanks for your help and advices,
>
> Joan Puigcerver.
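[Editor's note] Dan's "iterate over the start-state of the arc" suggestion can be made concrete: the backward recursion only ever needs the outgoing arcs of each state, which is exactly what OpenFST's per-state arc iteration provides, so no transposed FST is required. A sketch, in notation of my choosing (not taken from the Kaldi code):

```latex
% For a graph over transition-ids, let src(e), dst(e) be the source and
% destination states of arc e, w(e) its graph weight, \rho(s) the final
% weight of state s, and p(o_t \mid e) the acoustic likelihood of frame
% o_t under the pdf attached to e's transition-id.  Then:
\beta_T(s) = \rho(s), \qquad
\beta_t(s) = \sum_{e \,:\, \mathrm{src}(e) = s}
    w(e)\, p(o_{t+1} \mid e)\, \beta_{t+1}(\mathrm{dst}(e))
```

At each frame one loops over states and, for each state, over its outgoing arcs — the sum for β of a state gathers from the β values of its arcs' destinations, so the arcs are still traversed in their forward orientation.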
From: Joan P. <joa...@gm...> - 2015-04-01 21:19:08
> Personally I suspect that you would not get any improvement from doing E-M as
> opposed to Viterbi -- the posteriors tend to be pretty peaky anyway.

I understand, but precisely my intuition is that the posteriors are not so peaky for my particular application. Of course, I need to check this experimentally and see whether or not EM is worth it.

> If you are concerned about the randomness of initialization you could always
> duplicate your training examples several times, so several different random
> paths will be taken.

AFAIK the current implementation of Viterbi is deterministic, so even if I copy the training samples, I would get the same path always. Anyway, I think it is a good suggestion before implementing EM, or for validating it: instead of doing the Viterbi alignment, I could randomly sample a path with probability proportional to its likelihood and do the data replication trick. With many replicas of each training sample, this should be similar to EM.

> Also, E-M training would be at least ten times slower -- probably closer to 100
> times slower depending what tricks like pruning you know how to implement.

I am aware of this. The first thing after vanilla EM that I wanted to implement is beam search in EM, as HTK does. Anyhow, the data replication trick would also increase the computational cost.

> If you really did want to do E-M training, the way to do this would probably be to
> implement, instead of Viterbi, some kind of forward-backward algorithm that
> would directly output posteriors over transition-ids.

This is what I intended to do. In fact, I already sketched a Forward algorithm that does this. It needs more work (debugging, beam search, ...) but it seems to work with toy examples. However, I am not so sure how to implement the Backward algorithm, since I must traverse the edges in the FST backwards (to do the backward pass in O(T * (V + E))), and OpenFST does not support this AFAIK. Also, I am not sure if simply transposing the FST would work, since I would have many initial states... Any suggestion on that?

> This would create a difficulty for converting alignments though (this happens
> when bootstrapping later systems, e.g. starting tri2 from tri1). You would
> probably have to just do Viterbi for that one stage.

I am not sure what you mean. Could you extend your explanation or point to a recipe where you had to overcome this difficulty?

Many thanks for your help and advices,

Joan Puigcerver.
From: Daniel P. <dp...@gm...> - 2015-04-01 17:19:59
> I know Kaldi developers have been advocating for Viterbi training
> instead of Baum-Welch training for a long time. At least, that is what
> I get from the documentation and slides that I found on the Internet.

Kaldi training is based on Viterbi because it's more efficient, no worse than E-M (for speech applications), and much easier to integrate with FSTs.

Personally I suspect that you would not get any improvement from doing E-M as opposed to Viterbi -- the posteriors tend to be pretty peaky anyway. If you are concerned about the randomness of initialization you could always duplicate your training examples several times, so several different random paths will be taken. But I think this will make no difference. Also, E-M training would be at least ten times slower -- probably closer to 100 times slower, depending what tricks like pruning you know how to implement.

If you really did want to do E-M training, the way to do this would probably be to implement, instead of Viterbi, some kind of forward-backward algorithm that would directly output posteriors over transition-ids. This would create a difficulty for converting alignments though (this happens when bootstrapping later systems, e.g. starting tri2 from tri1). You would probably have to just do Viterbi for that one stage. You'd want to either store the posteriors on disk (maybe pruned a bit), or pipe them into stats-accumulation programs. I'm not aware that anyone has done this.

Dan

> However, I need EM training for a couple of reasons and I think it may
> still be useful for others in some cases.
> [...]
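[Editor's note] The "posteriors over transition-ids" Dan mentions are the standard forward-backward arc occupancies; for concreteness, in notation of my choosing (not from the Kaldi code):

```latex
% With forward variable \alpha_t(s) (probability of o_1..o_t, ending in
% state s) and backward variable \beta_t(s), the posterior of taking
% arc e (carrying one transition-id) at frame t+1 is:
\gamma_{t+1}(e) = \frac{\alpha_t(\mathrm{src}(e))\, w(e)\,
    p(o_{t+1} \mid e)\, \beta_{t+1}(\mathrm{dst}(e))}{P(O)},
\qquad
P(O) = \sum_{s} \alpha_T(s)\, \rho(s)
```

These γ values (possibly pruned) are what would be stored on disk or piped into the stats-accumulation programs.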
From: Joan P. <joa...@gm...> - 2015-04-01 14:44:51
|
Hi, I know the Kaldi developers have been advocating Viterbi training instead of Baum-Welch training for a long time. At least, that is what I gather from the documentation and slides I found on the Internet. However, I need EM training for a couple of reasons, and I think it may still be useful for others in some cases. I work on handwritten text recognition, and our HMMs are much simpler than those used in ASR (we don't use context-dependent models, for instance). Until today, I was training my models with HTK and then converting the HTK models to Kaldi's format. But this is a pain in the ass, because:

1. My script makes strong assumptions that work for me, but may not hold if I change the HMM topology.
2. It requires having both HTK and Kaldi installed, and it would be much nicer to have a single tool.
3. HTK does not support a feature that I need during training: using an FST as a "transcription". Kaldi seems to support this (see compile-train-graphs-fsts), although it does not support EM training, and I think that EM would make an important difference for my particular application (*).

I have been playing with the Kaldi source code for a while, and I thought I could implement EM training for Kaldi. But before starting to code anything, I wanted to know if somebody else has worked or is working on that, share my thoughts on how to do this, and hear some ideas or advice from the Kaldi pro developers.

First of all, the traditional Baum-Welch recipe for HMM EM training has to be adapted to work with transition-ids. I have not derived this formally, but the only thing I need is to compute the expected number of times each transition-id is traversed. Of course, this is quite easy to do in the case of Viterbi training, since we only consider the 1-best path and just need to count the number of times each transition-id is visited. If we could "expand" all possible paths, we would just need to average the count of a particular transition-id in each path, weighted by the posterior probability of that path. The forward-backward algorithm can do this without the need to "expand" the transition-id paths. Once the expected transition-id counts have been computed, updating the parameters of the model should be easy, since the pdf-id, state-id, etc. can be recovered from the transition-id. If I'm not wrong, the code in TransitionModel::MleUpdate and MleDiagGmmUpdate should work without any change.

(*) Why do I (think I) need EM? Well, in my application the supervised data is scarce and noisy. However, for each input line I have transcriptions coming from different humans. My idea is to encode these multiple transcriptions using an FST, and use the EM algorithm to train on this FST. I could use Viterbi training as well, but initialization is critical for Viterbi, and align-equal-compiled is likely to produce a bad alignment in this case. Using EM, I can take full advantage of the labeling (at least, that is what I hope).

In summary:
- Has anyone tried to implement EM in Kaldi? If so, is the code publicly available?
- What are your thoughts on this?

Cheers, Joan Puigcerver. |
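To make the forward-backward accumulation concrete, here is a toy sketch of computing expected transition-id counts over a small training graph. This is illustrative only: the arc representation and the function name are mine, not Kaldi's (a real implementation would run over Kaldi's compiled training graphs and feed the resulting counts to the MLE update code mentioned above).

```python
import math
from collections import defaultdict

def expected_tid_counts(arcs, num_states, start, final, loglikes):
    """Forward-backward over a small acceptor graph.

    arcs:     list of (src, dst, transition_id, pdf, transition_logprob);
              each arc consumes one frame.
    loglikes: loglikes[t][pdf] = acoustic log-likelihood at frame t.
    Returns {transition_id: expected number of traversals}.
    """
    T = len(loglikes)
    NEG_INF = float("-inf")

    def logadd(a, b):  # log(exp(a) + exp(b)), safe for -inf
        if a == NEG_INF: return b
        if b == NEG_INF: return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    alpha = [[NEG_INF] * num_states for _ in range(T + 1)]
    beta = [[NEG_INF] * num_states for _ in range(T + 1)]
    alpha[0][start] = 0.0
    beta[T][final] = 0.0
    for t in range(T):                      # forward pass
        for src, dst, tid, pdf, lp in arcs:
            score = alpha[t][src] + lp + loglikes[t][pdf]
            alpha[t + 1][dst] = logadd(alpha[t + 1][dst], score)
    for t in range(T - 1, -1, -1):          # backward pass
        for src, dst, tid, pdf, lp in arcs:
            score = lp + loglikes[t][pdf] + beta[t + 1][dst]
            beta[t][src] = logadd(beta[t][src], score)
    total = alpha[T][final]                 # total data log-likelihood
    counts = defaultdict(float)
    for t in range(T):                      # arc posteriors -> tid counts
        for src, dst, tid, pdf, lp in arcs:
            post = alpha[t][src] + lp + loglikes[t][pdf] + beta[t + 1][dst] - total
            counts[tid] += math.exp(post)
    return dict(counts)
```

The per-frame counts sum to the number of frames, and each transition-id's count is exactly the posterior-weighted average Joan describes, without ever enumerating paths.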
From: Jan T. <jt...@gm...> - 2015-03-25 17:07:09
|
arnab13@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 danielpovey@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 katakombi@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 mhannemann@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 ngoel17@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 vesis84@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 yanminqian@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 bouliagi@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 motlicek@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 sikoried@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 ijanda@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 cweng@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 ndjaitly@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 thangvu@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 vpanayotov@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 chenguoguo@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 yajiemiao@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 funous@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 londel@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 jtrmal@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 matthewaylett@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 evariani@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 hhx502@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 pegita@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 xiaohuizhang@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 gauravkumar87@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 agrawalneha@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 bbabaali@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 rickychanhoyin@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 oplatek@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 korzinek@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 rwilliamspankoi@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 sudheerkovela@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 talumae@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 david-snyder@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 edobashira@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 hainan-xv@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 er1k27@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 dimseng@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 nshmyrev@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 shi-wei@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 syq163@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 
vimalmanohar91@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 ahhentz@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 canbaz@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 ahmaksod1@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 akirkedal@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 kzmolikova@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 agospodinov@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 leixin@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 pswietojanski@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 vpeddin@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 sw005320@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 wendy722@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 qipeng@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 ilyaedrenkin@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 eitakvn@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 tomkocse@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 mbait@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 alex1glubshev@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8 |
From: Charles C. <cha...@nv...> - 2015-03-23 18:56:44
|
nVoq in Boulder, Colorado, has an opening for a summer intern (undergraduate or graduate) who is interested in working on applying Kaldi to some real-world problems. Previous experience with Kaldi is a definite plus, but we would also consider students with experience using other speech recognition toolkits. Please contact Sherin Tedeschi (she...@nv...) and attach a bio. |
From: Rouzbeh <rou...@bi...> - 2015-03-21 23:59:25
|
Hello dear admin, This is my email (rou...@bi...); please add me to the list. Thanks for your cool software and all the inspiration. Many thanks, Rouzbeh |
From: Etienne M. <et...@ac...> - 2015-03-14 00:06:51
|
We've been using dial tone removal here for a few months with good results. Doesn't implement Goertzel, but removes about 98% of dial-tone frames across a wide range of tone frequencies. It removes extremely little speech (< 0.1%). It could easily be extended to DTMF tones by allowing a second energy peak within 400-1100 Hz of the first and restricting both peaks to 600-1700 Hz. It's in Python, but someone who knows more C than I do could port it over. On Fri, Mar 13, 2015 at 3:55 PM, Daniel Povey <dp...@gm...> wrote: > Interesting info. > (I was getting ready to reply to Vijay's note and just saw your reply). > Nagendra Goel also told me that the Goertzel algorithm is standard. It's > good to know that DTMF will not overlap with speech. The timeframe is not > super urgent, maybe a month or two-- it depends who we can get to help on > this. If you can help it would be great- even just sketching a proposal > for an interface. Remember that we may sometimes want to do this in > realtime settings, so minimizing the added latency is a factor. > Dan > > > On Fri, Mar 13, 2015 at 6:51 PM, Kirill Katsnelson < > kir...@sm...> wrote: > >> An almost default way to detect DTMF tones is the Goertzel filter < >> https://en.wikipedia.org/w/index.php?title=Goertzel_algorithm&oldid=401512443> >> (somebody rewrote the article since then making it totally indigestible, so >> here's a 2010 version). Generally, one filter per DTMF frequency in the 4×4 grid >> (8 total) is run in parallel. Somewhat akin to the notch filterbank >> proposal by Vijayaditya, only much cheaper to compute, a 2nd-order IIR >> each. >> >> Be aware that there is no "speech on top" of DTMF tones. It's either one or >> the other. In telephony, only one is normally transmitted. A summation may >> somehow happen on an analog trunk, but in all digital protocols I am aware >> of, the payload can be only one of the two types, not both. So we can >> silence tones, but usually not restore speech interrupted by them. 
>> >> I'd love to help, but I am horrifically swamped in things currently. What >> is the timeframe that you have in mind? >> >> > -----Original Message----- >> > From: Daniel Povey [mailto:dp...@gm...] >> > Sent: 2015-03-12 2205 >> > To: kal...@li... >> > Subject: [Kaldi-developers] DTMF and dialtone detection >> > >> > Everyone, >> > >> > Something that is sometimes needed is code to detect DTMF tones and >> > dial tones. It would be good to have the option to remove them from >> > the audio in addition to detecting them (so we can correctly process >> > speech that occurs on top of DTMF tones). And this ideally should be >> > done in an algorithm which can be in principle applied online, as the >> > signal comes in. >> > Does anyone want to help with this? If so, you might want to draft an >> > interface for this. >> > >> > Dan |
From: Daniel P. <dp...@gm...> - 2015-03-13 22:55:56
|
Interesting info. (I was getting ready to reply to Vijay's note and just saw your reply). Nagendra Goel also told me that the Goertzel algorithm is standard. It's good to know that DTMF will not overlap with speech. The timeframe is not super urgent, maybe a month or two-- it depends who we can get to help on this. If you can help it would be great- even just sketching a proposal for an interface. Remember that we may sometimes want to do this in realtime settings, so minimizing the added latency is a factor. Dan On Fri, Mar 13, 2015 at 6:51 PM, Kirill Katsnelson < kir...@sm...> wrote: > An almost default way to detect DTMF tones is the Goertzel filter < > https://en.wikipedia.org/w/index.php?title=Goertzel_algorithm&oldid=401512443> > (somebody rewrote the article since then making it totally indigestible, so > here's a 2010 version). Generally, one filter per DTMF frequency in the 4×4 grid > (8 total) is run in parallel. Somewhat akin to the notch filterbank > proposal by Vijayaditya, only much cheaper to compute, a 2nd-order IIR > each. > > Be aware that there is no "speech on top" of DTMF tones. It's either one or > the other. In telephony, only one is normally transmitted. A summation may > somehow happen on an analog trunk, but in all digital protocols I am aware > of, the payload can be only one of the two types, not both. So we can > silence tones, but usually not restore speech interrupted by them. > > I'd love to help, but I am horrifically swamped in things currently. What > is the timeframe that you have in mind? > > > -----Original Message----- > > From: Daniel Povey [mailto:dp...@gm...] > > Sent: 2015-03-12 2205 > > To: kal...@li... > > Subject: [Kaldi-developers] DTMF and dialtone detection > > > > Everyone, > > > > Something that is sometimes needed is code to detect DTMF tones and > > dial tones. 
It would be good to have the option to remove them from > > the audio in addition to detecting them (so we can correctly process > > speech that occurs on top of DTMF tones). And this ideally should be > > done in an algorithm which can be in principle applied online, as the > > signal comes in. > > Does anyone want to help with this? If so, you might want to draft an > > interface for this. > > > > Dan > > |
From: Kirill K. <kir...@sm...> - 2015-03-13 22:51:24
|
An almost default way to detect DTMF tones is the Goertzel filter <https://en.wikipedia.org/w/index.php?title=Goertzel_algorithm&oldid=401512443> (somebody rewrote the article since then making it totally indigestible, so here's a 2010 version). Generally, one filter per DTMF frequency in the 4×4 grid (8 total) is run in parallel. Somewhat akin to the notch filterbank proposal by Vijayaditya, only much cheaper to compute, a 2nd-order IIR each. Be aware that there is no "speech on top" of DTMF tones. It's either one or the other. In telephony, only one is normally transmitted. A summation may somehow happen on an analog trunk, but in all digital protocols I am aware of, the payload can be only one of the two types, not both. So we can silence tones, but usually not restore speech interrupted by them. I'd love to help, but I am horrifically swamped in things currently. What is the timeframe that you have in mind? > -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-03-12 2205 > To: kal...@li... > Subject: [Kaldi-developers] DTMF and dialtone detection > > Everyone, > > Something that is sometimes needed is code to detect DTMF tones and > dial tones. It would be good to have the option to remove them from > the audio in addition to detecting them (so we can correctly process > speech that occurs on top of DTMF tones). And this ideally should be > done in an algorithm which can be in principle applied online, as the > signal comes in. > Does anyone want to help with this? If so, you might want to draft an > interface for this. > > Dan |
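For reference, the Goertzel filter Kirill describes is only a few lines per frequency. A minimal sketch (the function and parameter names are mine; the common telephony choice of N = 205 samples at 8 kHz places DFT bins close to the eight DTMF frequencies):

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Power near target_freq via the Goertzel algorithm (a 2nd-order IIR)."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest DFT bin to the target
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                          # one multiply-add per sample
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # squared magnitude of the k-th DFT bin, without computing a full FFT
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
```

A DTMF detector would run eight of these per block (one per row/column frequency) and report a digit when exactly one row and one column frequency dominate, which keeps the cost far below a full FFT when only these bins are needed.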
From: Vijayaditya P. <p.v...@gm...> - 2015-03-13 06:09:18
|
Are you looking for signal processing solutions or model-based solutions? For a signal processing solution that does not make a lot of assumptions about the duration of the tones, we can take the approach below. Given the DTMF frequency ranges (table below), and given that the requirement is to process speech which overlaps with DTMF tones, I think a good detector can be designed by:

1. measuring the energy at the pair of DTMF frequencies (for each digit);
2. comparing it with the energy in the total speech frequency range (this can be done after the FFT is generated during MFCC extraction, to avoid redundant computation);
3. detecting a digit if the energy difference is greater than a certain level.

With regards to elimination of DTMF tones, since we are just interested in clean MFCCs:

1. we can just subtract the detected DTMF energies from the corresponding mel bins, after scaling the DTMF energies with the mel weights of the corresponding frequencies, *or*
2. we can run the signal through a notch filter which has notches at the two frequencies corresponding to the digit, for the duration of the digit, and extract MFCCs from the filtered signal.

Spectrogram for 112163_112196_11#9632_##9696 (audio file attached).

DTMF keypad frequencies:

         1209 Hz  1336 Hz  1477 Hz  1633 Hz
697 Hz      1        2        3        A
770 Hz      4        5        6        B
852 Hz      7        8        9        C
941 Hz      *        0        #        D

On Fri, Mar 13, 2015 at 1:04 AM, Daniel Povey <dp...@gm...> wrote: > Everyone, > > Something that is sometimes needed is code to detect DTMF tones and dial > tones. It would be good to have the option to remove them from the audio > in addition to detecting them (so we can correctly process speech that > occurs on top of DTMF tones). And this ideally should be done in an > algorithm which can be in principle applied online, as the signal comes in. > Does anyone want to help with this? If so, you might want to draft an > interface for this. > > Dan |
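The three detection steps above can be sketched as follows, using a naive DFT for clarity. This is a toy illustration, not a tested design: the threshold, the one-bin window around each target frequency, and all names are my own choices, and in Kaldi the spectrum would come from the FFT already computed for MFCC extraction.

```python
import cmath
import math

ROWS = [697.0, 770.0, 852.0, 941.0]       # DTMF row frequencies, Hz
COLS = [1209.0, 1336.0, 1477.0, 1633.0]   # DTMF column frequencies, Hz
KEYS = ["123A", "456B", "789C", "*0#D"]   # keypad layout, rows x columns

def bin_energy(spectrum, sample_rate, n, freq, width=1):
    """Sum |X[k]|^2 over DFT bins within `width` bins of `freq`."""
    k = round(freq * n / sample_rate)
    return sum(abs(spectrum[j]) ** 2
               for j in range(max(0, k - width), k + width + 1))

def detect_digit(samples, sample_rate, ratio=0.5):
    """Return a DTMF digit if its tone pair dominates the frame, else None."""
    n = len(samples)
    # Naive DFT over the positive frequencies (an FFT would do in practice).
    spectrum = [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n // 2 + 1)]
    total = sum(abs(x) ** 2 for x in spectrum) + 1e-10
    row_e = [bin_energy(spectrum, sample_rate, n, f) for f in ROWS]
    col_e = [bin_energy(spectrum, sample_rate, n, f) for f in COLS]
    r = max(range(4), key=lambda i: row_e[i])
    c = max(range(4), key=lambda i: col_e[i])
    if (row_e[r] + col_e[c]) / total < ratio:
        return None   # the tone pair does not dominate -> treat as speech
    return KEYS[r][c]
```

The ratio test is the "energy difference" of step 3: for a frame that is mostly a DTMF pair the two target bins carry nearly all the energy, while for speech the energy is spread across the band and the frame is passed through untouched.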
From: Daniel P. <dp...@gm...> - 2015-03-13 05:04:48
|
Everyone, Something that is sometimes needed is code to detect DTMF tones and dial tones. It would be good to have the option to remove them from the audio in addition to detecting them (so we can correctly process speech that occurs on top of DTMF tones). And this ideally should be done in an algorithm which can be in principle applied online, as the signal comes in. Does anyone want to help with this? If so, you might want to draft an interface for this. Dan |
From: Daniel P. <dp...@gm...> - 2015-03-11 23:55:43
|
Hm. It looks to me from http://kaldi.sourceforge.net/data_prep.html like the decision was made not to support that option in the data-directory definition, for the segments file, in order to avoid having to support that option in too many scripts, and for other reasons too. So that is supported only at the command-line level, not the script level. The easiest way to accomplish what you want to do is to define your wav.scp in such a way that instead of just having a filename for each utterance-id, you have a command (e.g. a sox command) that selects the channel you want, followed by a pipe symbol. Or if it doesn't matter for your application, you could just ignore the data-validation error. Dan On Wed, Mar 11, 2015 at 7:23 PM, <Dan...@pa...> wrote: > Thanks for pointing out my error! I appended the channel number, 0 or > 1, to the end of each line in segments. I still get this as a result of > running steps/make_mfcc.sh: > > > > > steps/make_mfcc.sh --nj 4 --cmd "$train_cmd" data/train > exp/make_mfcc/train $mfccdir > > steps/make_mfcc.sh --nj 4 --cmd run.pl data/train exp/make_mfcc/train > /usr/share/capa/data/kaldi-scripts/TESCO/data/feats > > Bad line in segments file S00000R00001-000005-000049 R00001 0.5 4.9 0 > > utils/validate_data_dir.sh: badly formatted segments file > > > > I think the problem is in lines 129 – 135 of kaldi-trunk/egs/wsj/s5/utils/ > validate_data_dir.sh > > > > if [ -f $data/segments ]; then > > > > check_sorted_and_uniq $data/segments > > # We have a segments file -> interpret wav file as "recording-ids" not > utterance-ids. > > ! cat $data/segments | \ > > awk '{if (NF != 4 || ($4 <= $3 && $4 != -1)) { print "Bad line in > segments file", $0; exit(1); }}' && \ > > echo "$0: badly formatted segments file" && exit 1; > > > > The problem seems to be that appending the channel number to each segments > line boosts the number of fields to 5. Is there another way I should have > added the channel number? 
> > > > Thanks again, > > Dan > > *From:* Daniel Povey [mailto:dp...@gm...] > *Sent:* Tuesday, March 10, 2015 9:14 PM > *To:* Davies, Dan <Dan...@pa...> > *Cc:* kal...@li... > *Subject:* Re: [Kaldi-developers] Issue with dual channel recordings > > > > I see that the documentation in extract-segments is not quite right. The > channel is supposed to be 0 for left and 1 for right. > Dan > > > > > > On Wed, Mar 11, 2015 at 12:10 AM, Daniel Povey <dp...@gm...> wrote: > > It's not the number of channels you need in the segments file, but the > identity of the channel-- I think it's probably supposed to be 0 or 1, > depending which channel you want. Maybe that's why validate_data_dir.pl > is failing, because 2 is not an expected channel id. If you want to sum > the channels, then do that manually by having a command ending with a pipe > symbol in the wav.scp file. > > Dan > > > > > > On Wed, Mar 11, 2015 at 12:07 AM, <Dan...@pa...> wrote: > > Hi, > > > > Our .wav files have two channels. If I don’t do anything special, > src/featbin/extract-segments says I need to put the number of channels into > the segments file. So far as I can tell, this means appending a “2” after > the stop time in each line in segments. When I append the “2”, > validate_data_dir.sh complains that the segments file is malformed because > lines in segments have 5 fields instead of 4. All this happens as a result > of calling steps/make_mfcc.sh. > > > > Am I doing something screwy? > > > > Dan |
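For the channel-selection route Dan suggests, a wav.scp entry can hold a piped command instead of a plain filename. A hypothetical example (the recording-id and path are made up; sox's `remix 1` keeps the left channel, `remix 2` the right):

```
R00001 sox /path/to/R00001.wav -t wav - remix 1 |
```

The trailing pipe symbol tells Kaldi to run the command and read the WAV data from its stdout, so downstream tools see single-channel input and the segments file can keep its plain four-field form.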
From: <Dan...@pa...> - 2015-03-11 23:23:11
|
Thanks for pointing out my error! I appended the channel number, 0 or 1, to the end of each line in segments. I still get this as a result of running steps/make_mfcc.sh: > steps/make_mfcc.sh --nj 4 --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir steps/make_mfcc.sh --nj 4 --cmd run.pl data/train exp/make_mfcc/train /usr/share/capa/data/kaldi-scripts/TESCO/data/feats Bad line in segments file S00000R00001-000005-000049 R00001 0.5 4.9 0 utils/validate_data_dir.sh: badly formatted segments file I think the problem is in lines 129 – 135 of kaldi-trunk/egs/wsj/s5/utils/validate_data_dir.sh if [ -f $data/segments ]; then check_sorted_and_uniq $data/segments # We have a segments file -> interpret wav file as "recording-ids" not utterance-ids. ! cat $data/segments | \ awk '{if (NF != 4 || ($4 <= $3 && $4 != -1)) { print "Bad line in segments file", $0; exit(1); }}' && \ echo "$0: badly formatted segments file" && exit 1; The problem seems to be that appending the channel number to each segments line boosts the number of fields to 5. Is there another way I should have added the channel number? Thanks again, Dan From: Daniel Povey [mailto:dp...@gm...] Sent: Tuesday, March 10, 2015 9:14 PM To: Davies, Dan <Dan...@pa...> Cc: kal...@li... Subject: Re: [Kaldi-developers] Issue with dual channel recordings I see that the documentation in extract-segments is not quite right. The channel is supposed to be 0 for left and 1 for right. Dan On Wed, Mar 11, 2015 at 12:10 AM, Daniel Povey <dp...@gm...> wrote: It's not the number of channels you need in the segments file, but the identity of the channel-- I think it's probably supposed to be 0 or 1, depending which channel you want. Maybe that's why validate_data_dir.pl is failing, because 2 is not an expected channel id. If you want to sum the channels, then do that manually by having a command ending with a pipe symbol in the wav.scp file. Dan On Wed, Mar 11, 2015 at 12:07 AM, <Dan...@pa...> wrote: Hi, Our .wav files have two channels. If I don’t do anything special, src/featbin/extract-segments says I need to put the number of channels into the segments file. So far as I can tell, this means appending a “2” after the stop time in each line in segments. When I append the “2”, validate_data_dir.sh complains that the segments file is malformed because lines in segments have 5 fields instead of 4. All this happens as a result of calling steps/make_mfcc.sh. Am I doing something screwy? Dan |
From: Daniel P. <dp...@gm...> - 2015-03-11 21:24:48
|
Hi, I don't recall exactly how much memory that arpa conversion for the 4-gram takes, but I guess it must be more than 32G. Just skip the 4-gram models for now. Regarding the patches: please send them to me at dp...@gm.... After I look at them we'll discuss the best way to get them checked in. Guoguo, I just had a look at the code for the const-arpa LM conversion and I noticed a couple of things that can be improved in const-arpa-lm.cc, for both speed and memory consumption. Here: KALDI_ASSERT(seq_to_state_.find(hist) != seq_to_state_.end()); seq_to_state_[hist]->AddChild(std::make_pair(word, lm_state)); it would be better to re-use the iterator returned by find in order to avoid two consecutive lookups in the seq_to_state_ array. In addition, you could have a variable "std::vector<int32> cur_hist" equal to the previous history-state, and if hist is the same as cur_hist you can avoid the associative array lookup- this might be a little faster in the normal case (worth testing though). Secondly about memory consumption, I noticed that you create this: LmState *lm_state = new LmState(is_unigram, logprob, backoff_logprob); regardless of whether order == ngram_order_ or not. (note: I would rename the variable order to cur_order). If order == ngram_order_, there is no reason to allocate this or to insert it into the seq_to_state_ table. This is probably responsible for the bulk of the memory consumption. Dan On Wed, Mar 11, 2015 at 4:42 PM, Kirill Katsnelson < kir...@sm...> wrote: > I am running quite out of RAM in arpa-to-const-arpa in librispeech/s5 for > the 4-gram model. > > The input argument to arpa-to-const-arpa is the massaged data from > data/local/lm/lm_fglarge.arpa.gz (61M 4-grams additional). > > The 3-gram file passed with ~16G of peak commit size. The 4-gram crashed > with OOM overnight, running out of 32G available to it. > > What memory usage should I expect? 
> > On the windows port side: pipes fixed, build system works, and I have > advanced as far as decoding the unigram GMM model through the recipe. The > troubles I am getting are from Cygwin files. First, absolute paths do not > work, as files in Cygwin are essentially chrooted to a virtual root path. > Second of all, links do not work. I am tweaking the scripts so far to get > past the problems, but there is a general solution to handle the paths in > code. I want my experiments gone through first however, already spending a > lot of time on the technical stuff. > > Progress is here <https://github.com/kkm000/kaldi/compare/winbuild>, but > the history is messy, I'll itemize it. > > I fixed a weird bug that would not be caught with gcc because of > constructor elision, and also supported WAVEFORMATEXTENSIBLE in wave files > (my flac 1.3.1 sends this format to stdout). How can I send patches for > these changes? Let's start with windows-unrelated patches now. > > -kkm |
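The memory point Dan makes — that n-grams of the highest order never need their own state, since they can never serve as a history — is easy to see in a toy version of the data structure. A hypothetical sketch (not Kaldi's ConstArpaLm code; it assumes the n-grams arrive in increasing order of n, as they do in an ARPA file):

```python
def build_lm_states(ngrams, ngram_order):
    """Build {history tuple: {word: (logprob, backoff)}} from n-gram entries.

    ngrams must be sorted by increasing length, as in an ARPA file, so that
    every history has already been seen as a shorter n-gram.  An n-gram of
    the highest order can never be extended by another word, so allocating
    a state (child table) for it would only waste memory.
    """
    states = {(): {}}                         # empty history = unigram state
    for words, logprob, backoff in ngrams:
        hist, word = tuple(words[:-1]), words[-1]
        states[hist][word] = (logprob, backoff)
        if len(words) < ngram_order:          # skip max-order histories
            states.setdefault(tuple(words), {})
    return states
```

For a large 4-gram model the highest order typically holds the majority of the entries, so skipping those states cuts both the table size and the per-entry allocation overhead substantially.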
From: Kirill K. <kir...@sm...> - 2015-03-11 20:58:49
|
I am running quite out of RAM in arpa-to-const-arpa in librispeech/s5 for the 4-gram model. The input argument to arpa-to-const-arpa is the massaged data from data/local/lm/lm_fglarge.arpa.gz (61M 4-grams additional). The 3-gram file passed with ~16G of peak commit size. The 4-gram crashed with OOM overnight, running out of 32G available to it. What memory usage should I expect? On the windows port side: pipes fixed, build system works, and I have advanced as far as decoding the unigram GMM model through the recipe. The troubles I am getting are from Cygwin files. First, absolute paths do not work, as files in Cygwin are essentially chrooted to a virtual root path. Second of all, links do not work. I am tweaking the scripts so far to get past the problems, but there is a general solution to handle the paths in code. I want my experiments gone through first however, already spending a lot of time on the technical stuff. Progress is here <https://github.com/kkm000/kaldi/compare/winbuild>, but the history is messy, I'll itemize it. I fixed a weird bug that would not be caught with gcc because of constructor elision, and also supported WAVEFORMATEXTENSIBLE in wave files (my flac 1.3.1 sends this format to stdout). How can I send patches for these changes? Let's start with windows-unrelated patches now. -kkm |
From: Daniel P. <dp...@gm...> - 2015-03-11 04:19:21
|
I see that the documentation in extract-segments is not quite right. The channel is supposed to be 0 for left and 1 for right. Dan On Wed, Mar 11, 2015 at 12:10 AM, Daniel Povey <dp...@gm...> wrote: > It's not the number of channels you need in the segments file, but the > identity of the channel-- I think it's probably supposed to be 0 or 1, > depending which channel you want. Maybe that's why validate_data_dir.pl > is failing, because 2 is not an expected channel id. If you want to sum > the channels, then do that manually by having a command ending with a pipe > symbol in the wav.scp file. > Dan > > > On Wed, Mar 11, 2015 at 12:07 AM, <Dan...@pa...> wrote: > >> Hi, >> >> >> >> Our .wav files have two channels. If I don’t do anything special, >> src/featbin/extract-segments says I need to put the number of channels into >> the segments file. So far as I can tell, this means appending a “2” after >> the stop time in each line in segments. When I append the “2”, >> validate_data_dir.sh complains that the segments file is malformed because >> lines in segments have 5 fields instead of 4. All this happens as a result >> of calling steps/make_mfcc.sh. >> >> >> >> Am I doing something screwy? >> >> >> >> Dan > |
From: Daniel P. <dp...@gm...> - 2015-03-11 04:10:55
|
It's not the number of channels you need in the segments file, but the identity of the channel-- I think it's probably supposed to be 0 or 1, depending which channel you want. Maybe that's why validate_data_dir.pl is failing, because 2 is not an expected channel id. If you want to sum the channels, then do that manually by having a command ending with a pipe symbol in the wav.scp file. Dan On Wed, Mar 11, 2015 at 12:07 AM, <Dan...@pa...> wrote: > Hi, > > Our .wav files have two channels. If I don’t do anything special, > src/featbin/extract-segments says I need to put the number of channels into > the segments file. So far as I can tell, this means appending a “2” after > the stop time in each line in segments. When I append the “2”, > validate_data_dir.sh complains that the segments file is malformed because > lines in segments have 5 fields instead of 4. All this happens as a result > of calling steps/make_mfcc.sh. > > Am I doing something screwy? > > Dan |
From: <Dan...@pa...> - 2015-03-11 04:07:16
|
Hi, Our .wav files have two channels. If I don’t do anything special, src/featbin/extract-segments says I need to put the number of channels into the segments file. So far as I can tell, this means appending a “2” after the stop time in each line in segments. When I append the “2”, validate_data_dir.sh complains that the segments file is malformed because lines in segments have 5 fields instead of 4. All this happens as a result of calling steps/make_mfcc.sh. Am I doing something screwy? Dan |