From: Simon K. <sim...@gm...> - 2014-03-06 19:23:12
Thanks Dan, an instant help, as usual! So I guess that will be the way to go for us for now.

When my colleague asked me about this, I feared I would have to deal with endianness and whatnot when writing the files in binary from within Matlab. But I now see that if we print the coefficients with (more than) sufficient precision, we will lose no relevant accuracy. So this turned out easier than expected.

All the best,

Simon

On 03/06/2014 07:15 PM, Daniel Povey wrote:
> It's actually trivial when you know how.
> The text version of the archive format is just the utterance-id, then,
> starting on the same line, the Matlab form of the matrix, then a
> newline. For instance:
>
> utt1 [ 0 2 3
> 1 3 4 ]
> utt2 [ 9 8 7
> 6 4 2 ]
> etc.
>
> So just put them in a file foo and read them with ark:foo.
> You can then put them in binary format with an associated scp by doing
>
> copy-feats ark:foo ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp
>
> and you can copy /some/dir/my_features.scp as data/<something>/feats.scp
> and use them.
>
> Or, as a pipe, you can do
>
> <matlab script> | copy-feats ark:- ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp
>
> Dan
>
> On Thu, Mar 6, 2014 at 2:11 PM, Simon Klüpfel <sim...@gm...
> <mailto:sim...@gm...>> wrote:
>
> > Hi,
> >
> > A colleague of mine experimented with some 'exotic' feature vectors
> > using Matlab, and now we would like to see how the pretty great Kaldi
> > tools might be used to train some model using them.
> >
> > I believe the clean way to do it would be to write a routine that
> > creates these features using the Kaldi libraries and then writes them
> > to an archive. However, I fear this will involve quite some work, and as
> > we do not know if it will be an endeavor worth the effort, we would like
> > to start by exporting the features from Matlab in a Kaldi-readable
> > format. So far this seemed the smaller effort.
> >
> > I tried to find out how those files are structured, but got lost
> > somewhere along the way.
> >
> > Looking into compute-mfcc-feats.cc, I saw that there is
> >
> >   BaseFloatMatrixWriter kaldi_writer;
> >
> > which is later used to write the archive:
> >
> >   kaldi_writer.Write(utt, features);
> >
> > Trying to find out what this call actually does, I got lost.
> >
> > I found this:
> >
> > http://kaldi.sourceforge.net/group__table__types.html#gaa9b0c000a2d8bbf1a7df386024110883
> >
> > and from there this:
> >
> > http://kaldi.sourceforge.net/table-types_8h_source.html#l00036
> >
> > and then eventually this:
> >
> > http://kaldi.sourceforge.net/classkaldi_1_1TableWriter.html
> >
> > However, I could not yet find anything that would help me understand
> > the particular format of an archive file of feature vectors.
> >
> > The scp file should be straightforward, but I hope one of you could
> > point me to the right resource to learn how to write the matrices of a
> > set of features in the correct archive format.
> >
> > Perhaps a detour through non-binary (text) files might be a way to get
> > there, but that would surely be very unfavorable.
> >
> > Thanks a lot,
> >
> > Simon
> >
> > _______________________________________________
> > Kaldi-users mailing list
> > Kal...@li...
> > <mailto:Kal...@li...>
> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
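The text archive format Dan describes (utterance-id, then on the same line a bracketed matrix with one row per line, then a newline) can be produced from any language that can print numbers. Below is a minimal sketch in Python rather than Matlab, assuming the format is exactly as described in the thread and that the reader is whitespace-insensitive between numbers. The function names `write_text_matrix` and `write_text_archive` are hypothetical helpers, not part of Kaldi. The 9-significant-digit choice reflects Simon's "(more than) sufficient precision" point: 9 digits reproduce any single-precision float exactly, assuming Kaldi's default single-precision `BaseFloat` build.

```python
import math
import sys
from typing import Iterable, Sequence, TextIO, Tuple

# A feature matrix: one row per frame, one column per feature dimension.
Matrix = Sequence[Sequence[float]]


def write_text_matrix(out: TextIO, utt_id: str, rows: Matrix) -> None:
    """Write one text-archive entry: the utterance-id, then on the same line
    the bracketed matrix, one row per line, closed by ' ]' and a newline."""
    if not utt_id or any(ch.isspace() for ch in utt_id):
        raise ValueError(f"bad utterance-id {utt_id!r}: must be non-empty, no whitespace")
    if not rows or not rows[0]:
        raise ValueError(f"{utt_id}: matrix must have at least one row and one column")
    num_cols = len(rows[0])
    row_strs = []
    for i, row in enumerate(rows):
        # Every frame must have the same dimension (a rectangular matrix).
        if len(row) != num_cols:
            raise ValueError(f"{utt_id}: row {i} has {len(row)} columns, expected {num_cols}")
        # Design choice of this sketch: reject NaN/inf rather than write them.
        if not all(math.isfinite(v) for v in row):
            raise ValueError(f"{utt_id}: row {i} contains NaN or inf")
        # '.9g' keeps 9 significant digits, enough to reproduce any
        # single-precision float exactly; use '.17g' for double precision.
        row_strs.append(" ".join(format(float(v), ".9g") for v in row))
    out.write(f"{utt_id} [ " + "\n  ".join(row_strs) + " ]\n")


def write_text_archive(out: TextIO, entries: Iterable[Tuple[str, Matrix]]) -> None:
    """Write (utterance-id, matrix) pairs, in order, as a text archive."""
    for utt_id, rows in entries:
        write_text_matrix(out, utt_id, rows)


if __name__ == "__main__":
    # The two toy utterances from Dan's example. Printing to stdout allows
    # piping into: copy-feats ark:- ark,scp:<ark path>,<scp path>
    write_text_archive(sys.stdout, [
        ("utt1", [[0, 2, 3], [1, 3, 4]]),
        ("utt2", [[9, 8, 7], [6, 4, 2]]),
    ])
```

Run as `python write_ark.py | copy-feats ark:- ark,scp:/some/dir/my_features.ark,/some/dir/my_features.scp` (the script name is hypothetical), which mirrors Dan's pipe variant with the Python script standing in for the Matlab one. Validating utterance-ids and row lengths up front keeps a malformed entry from surfacing later as a less obvious read error in `copy-feats`.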