From: Simon K. <sim...@gm...> - 2014-03-06 19:11:33
Hi,

A colleague of mine has been experimenting with some 'exotic' feature vectors in Matlab, and we would now like to see how the pretty great Kaldi tools could be used to train a model on them.

I believe the clean way would be to write a routine that creates these features using the Kaldi libraries and writes them to an archive. However, I fear this would involve quite some work, and since we do not yet know whether it is worth the effort, we would like to start by exporting the features from Matlab in a Kaldi-readable format. So far this seems the smaller effort.

I tried to find out how those files are structured, but got lost along the way.

Looking into compute-mfcc-feats.cc, I saw that there is:

    BaseFloatMatrixWriter kaldi_writer;

which is later used to write the archive:

    kaldi_writer.Write(utt, features);

Trying to find out what this call actually does, I got lost.

I found this:

http://kaldi.sourceforge.net/group__table__types.html#gaa9b0c000a2d8bbf1a7df386024110883

and from there this:

http://kaldi.sourceforge.net/table-types_8h_source.html#l00036

and then eventually this:

http://kaldi.sourceforge.net/classkaldi_1_1TableWriter.html

However, I have not yet found anything that explains the particular format of the archive file of feature vectors.

The scp file should be straightforward, but I hope one of you can point me to the right resource for learning how to write the matrices of a set of features in the correct archive format.

Perhaps a detour through non-binary files would be a way to get there, but this would surely be very unfavorable.

Thanks a lot,

Simon
From: Daniel P. <dp...@gm...> - 2014-03-06 19:15:17
It's actually trivial when you know how.
The text version of the archive format is just the utterance-id, then, starting on the same line, the Matlab form of the matrix, then a newline. For instance:

    utt1 [ 0 2 3
    1 3 4 ]
    utt2 [ 9 8 7
    6 4 2 ]
    etc.

So just put them in a file foo and read them with ark:foo
You can then put them in binary format with an associated scp by doing

    copy-feats ark:foo ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp

and you can copy /some/dir/my_features.scp as data/<something>/feats.scp and use them.

Or, as a pipe, you can do

    <matlab script> | copy-feats ark:- ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp

Dan
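For readers who want to try the text format described above, here is a minimal sketch of a writer. It is in Python rather than Matlab, and the function names (`format_text_ark_entry`, `write_text_ark`) and the choice of 9 significant digits are assumptions made for illustration, not anything taken from Kaldi itself. The layout follows the example in the message: utterance-id, the bracketed matrix starting on the same line, one frame per row, then a newline.

```python
import sys


def format_text_ark_entry(utt_id, rows):
    """Return one text-archive entry: utterance id, then the matrix in brackets."""
    if not utt_id or any(ch.isspace() for ch in utt_id):
        raise ValueError(f"utterance id must be non-empty without whitespace: {utt_id!r}")
    if len({len(row) for row in rows}) > 1:
        raise ValueError(f"{utt_id}: all frames must have the same dimension")
    # One frame per line; values are printed with 9 significant digits
    # (an assumed precision, chosen to be safe for single-precision floats).
    body = "\n".join(" ".join(f"{float(x):.9g}" for x in row) for row in rows)
    return f"{utt_id} [ {body} ]\n"


def write_text_ark(stream, features):
    """Write a {utt_id: rows} mapping to `stream` as a text archive."""
    for utt_id, rows in features.items():
        stream.write(format_text_ark_entry(utt_id, rows))


if __name__ == "__main__":
    # The two toy matrices from the message above.
    demo = {
        "utt1": [[0, 2, 3], [1, 3, 4]],
        "utt2": [[9, 8, 7], [6, 4, 2]],
    }
    write_text_ark(sys.stdout, demo)
```

Saved under a hypothetical name such as make_ark.py, the script could take the place of `<matlab script>` in the pipe above, i.e. `python make_ark.py | copy-feats ark:- ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp`. The same loop should port to Matlab with `fprintf`.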
From: Simon K. <sim...@gm...> - 2014-03-06 19:23:12
Thanks Dan,

an instant help, as usual! So I guess that will be the way to go for us for now.

When my colleague asked me about this, I feared having to deal with endianness and whatnot when writing the files in binary from within Matlab. But I now see that if we print the coefficients with (over-)sufficient precision, we will lose no relevant accuracy.

So this turned out easier than expected.

All the best,

Simon
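The precision point can be checked with a small sketch. It assumes the features end up stored as IEEE 754 single-precision values (which I believe is what Kaldi's BaseFloat is in a default build, but treat that as an assumption). Under that assumption, 9 significant decimal digits are enough for every value to survive the trip through text unchanged. The helper names below are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def survives_text_round_trip(x, digits=9):
    """Check whether a single-precision value is recovered exactly after
    being printed with `digits` significant digits and parsed again."""
    stored = to_float32(x)
    text = f"{stored:.{digits}g}"
    return to_float32(float(text)) == stored


if __name__ == "__main__":
    samples = [0.1, 1.0 / 3.0, 3.14159265, 1e-20, 12345.678, -2.5e10]
    for value in samples:
        print(value, survives_text_round_trip(value))
```

Printing with fewer digits can change the stored value: for example, 1/3 written with only 6 significant digits does not parse back to the same single-precision number, so it is safer to err on the side of too many digits.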