From: Vassil P. <vas...@gm...> - 2012-06-15 12:07:33
Hi Robert,

I think the VoxForge recipe is a very good idea. I am not sure what language model should be used for the decoding part of the recipe, but since it will be just a demo system with free data, it doesn't matter that much, and just training a "cheating" LM on all transcripts is probably OK.

I agree with Arnab that it would be best to base your recipe on wsj/s5. In fact, I think it shouldn't be very difficult to modify it to use the VoxForge data: in egs/rm/s5, for example, only the data normalization scripts ("local/") are specific to RM, and the steps/ and utils/ directories are just symlinks to wsj/s5.

Vassil

On Fri, Jun 15, 2012 at 2:21 PM, Arnab Ghoshal <ar...@gm...> wrote:
> On Mon, Jun 11, 2012 at 3:09 PM, Robert Mullins <r_p...@ya...> wrote:
>> I downloaded and installed the Kaldi code on my machine, however I do not
>> have the data from the LDC RM discs. I was just wondering whether anyone has
>> used the voxforge speech corpus for training your system? And if not is
>> there a reason and would it be a worthwhile exercise?
>
> Hi Robert,
>
> you could use the egs/rm/s4 recipe that uses a free subset of RM. Here
> is a blog post about it:
> http://vpanayotov.blogspot.co.uk/2012/02/poor-mans-kaldi-recipe-setup.html
>
> There may be people on this list who have used voxforge. You are
> certainly welcome to contribute a recipe for it.
>
>> I would also be interested in seeing an example of how to use the kaldi
>> system as an end to end speech recognition system
>
> That's exactly what Kaldi is meant to be. Once you go through the
> steps of rm/s4 you will get a hang of it. If you want to start
> building a voxforge recipe, I will recommend that you start from
> egs/wsj/s5.
>
> -Arnab
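
To make the layout mentioned in the reply concrete (a corpus-specific local/ directory, with steps/ and utils/ as symlinks to wsj/s5), here is a minimal Python sketch of how a new recipe directory could be laid out. The function name make_recipe_skeleton, the directory name egs/voxforge/s5, and the relative link targets are assumptions for illustration only; nothing here is an existing Kaldi script.

```python
import os
from pathlib import Path


def make_recipe_skeleton(egs_root, corpus="voxforge", version="s5"):
    """Create <egs_root>/<corpus>/<version> with local/ plus symlinked steps/ and utils/.

    Hypothetical helper: the directory names are assumptions, not part of Kaldi.
    """
    recipe = Path(egs_root) / corpus / version
    # Corpus-specific data preparation scripts would live in local/.
    (recipe / "local").mkdir(parents=True, exist_ok=True)
    for name in ("steps", "utils"):
        link = recipe / name
        if link.is_symlink() or link.exists():
            continue  # leave anything that is already there untouched
        # Relative target (assumed layout: egs/wsj/s5 next to egs/<corpus>),
        # so the links keep working if the whole checkout is moved.
        target = os.path.join("..", "..", "wsj", "s5", name)
        os.symlink(target, link, target_is_directory=True)
    return recipe
```

Calling make_recipe_skeleton("egs") from the top of a checkout would leave only local/ to be filled in with the VoxForge-specific data preparation; everything under steps/ and utils/ would be shared with wsj/s5 through the links.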