From: Daniel P. <dp...@gm...> - 2014-07-10 19:45:32
You could try running fix_data_dir.sh, which will try to fix the sorting problems automatically. If this fails, you may have to make sure that the speaker-id is a prefix of the utterance-id, which helps ensure that the speakers and utterances can be sorted simultaneously. In your case, if all your utterances start with S it should not be a problem.

Dan

On Thu, Jul 10, 2014 at 3:37 PM, Zibo Meng <mzb...@gm...> wrote:
> Hi,
>
> I am preparing the data for DNN training using my own data set. I followed
> the instructions on http://kaldi.sourceforge.net/data_prep.html.
>
> I created the file "text"; its first 3 lines are:
> S002-U-000300-000470 OH
> S002-U-000470-000630 I'D
> S002-U-000630-000870 LIKE
>
> the wav.scp file:
> S002-U <path to the corresponding wav file>
> S002-O <path to the corresponding wav file>
> S003-U <path to the corresponding wav file>
>
> and the utt2spk file:
> S002-U-000300-000470 002-U
> S002-U-000470-000630 002-U
> S002-U-000630-000870 002-U
>
> Then I used utt2spk_to_spk2utt.pl to create the spk2utt file. Everything
> went well until I tried to use make_mfcc.sh to create the feats.scp file,
> where I got an error message like:
>
> utils/validate_data_dir.sh: file data/utt2spk is not in sorted order or
> has duplicates
>
> It seems that my utt2spk file could not pass the validation.
>
> Can anybody help me out here? Thank you so much.
>
> Best,
>
> Zibo
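
As an illustration of the two conditions mentioned in the reply (sorted, duplicate-free utterance-ids, and the speaker-id being a prefix of the utterance-id), here is a minimal Python sketch. It is not part of Kaldi: the function name check_utt2spk is hypothetical, and the byte-wise comparison assumes the C-locale ordering that Kaldi's data preparation relies on. Note that in the utt2spk lines quoted above, the speaker-id "002-U" lacks the leading "S" of the utterance-ids, so the prefix check flags them.

```python
def check_utt2spk(lines):
    """Return a list of problems found in utt2spk lines ("<utt-id> <spk-id>").

    Hypothetical helper for illustration only; it does not replace
    utils/validate_data_dir.sh or utils/fix_data_dir.sh.
    """
    problems = []
    # Keep only the first two whitespace-separated fields of non-empty lines.
    pairs = [tuple(line.split()[:2]) for line in lines if line.strip()]

    # Assumption: ordering is byte-wise (as with LC_ALL=C), so compare bytes.
    utt_keys = [utt.encode("utf-8") for utt, _ in pairs]
    if utt_keys != sorted(set(utt_keys)):
        problems.append("utterance-ids are not sorted or contain duplicates")

    # The speaker-id should be a prefix of the utterance-id.
    for utt, spk in pairs:
        if not utt.startswith(spk):
            problems.append(
                f"speaker-id {spk!r} is not a prefix of utterance-id {utt!r}"
            )
    return problems


# The utt2spk lines as posted in the quoted message.
posted = [
    "S002-U-000300-000470 002-U",
    "S002-U-000470-000630 002-U",
    "S002-U-000630-000870 002-U",
]

# The same lines with the speaker-id written as a prefix of the utterance-id.
fixed = [
    "S002-U-000300-000470 S002-U",
    "S002-U-000470-000630 S002-U",
    "S002-U-000630-000870 S002-U",
]

if __name__ == "__main__":
    for problem in check_utt2spk(posted):
        print(problem)
    print(check_utt2spk(fixed))
```

Run on the full data/utt2spk file rather than only its first lines, a check like this may help locate the entries that validate_data_dir.sh complains about before trying fix_data_dir.sh.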