From: Zibo M. <mzb...@gm...> - 2014-07-10 19:37:55
Hi,

I am preparing the data for DNN training using my own data set. I followed the instructions on http://kaldi.sourceforge.net/data_prep.html.

I created the file "text"; its first 3 lines are:

S002-U-000300-000470 OH
S002-U-000470-000630 I'D
S002-U-000630-000870 LIKE

the wav.scp file:

S002-U <path to the corresponding wav file>
S002-O <path to the corresponding wav file>
S003-U <path to the corresponding wav file>

and the utt2spk file:

S002-U-000300-000470 002-U
S002-U-000470-000630 002-U
S002-U-000630-000870 002-U

Then I used utt2spk_to_spk2utt.pl to create the spk2utt file. Everything went well until I tried to use make_mfcc.sh to create the feats.scp file, where I got an error message like:

utils/validate_data_dir.sh: file data/utt2spk is not in sorted order or has duplicates

It seems my utt2spk file cannot pass validation. Can anybody help me out here? Thank you so much.

Best,

Zibo
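For reference, the check that validate_data_dir.sh performs can be reproduced by hand. Kaldi expects its data files to be sorted in plain C-locale byte order, so it helps to export LC_ALL=C before creating or checking them. The file written below is a minimal stand-in for the utt2spk from this thread:

```shell
export LC_ALL=C   # byte-order sorting, which is what the Kaldi scripts expect

# a minimal utt2spk in the format described above
printf 'S002-U-000300-000470 002-U\nS002-U-000470-000630 002-U\nS002-U-000630-000870 002-U\n' > utt2spk

# sort -c -u exits non-zero (with a message) if the file is unsorted or has duplicates
if sort -c -u utt2spk; then
    echo "utt2spk is sorted and duplicate-free"
else
    echo "utt2spk would fail validation"
fi
```

Running the same `sort -c -u` under a non-C locale (e.g. en_US.UTF-8) can give a different verdict, which is a common source of this validation error.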
From: Daniel P. <dp...@gm...> - 2014-07-10 19:45:32
|
You could try running fix_data_dir.sh, which will try to fix the sorting problems automatically. If this fails, you may have to make sure that the speaker-id is a prefix of the utterance-id, which helps ensure that the speakers and utterances can be sorted simultaneously. In your case, if all your utterances start with S, it should not be a problem.

Dan
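For the sorting error alone, the core of what fix_data_dir.sh does can be sketched as a C-locale re-sort with duplicate removal (the file name follows the thread; the real script also filters the other data files so they stay consistent with each other):

```shell
export LC_ALL=C

# an utt2spk that is out of order and contains a duplicate line,
# the kind of file validate_data_dir.sh rejects
printf 'S002-U-000470-000630 002-U\nS002-U-000300-000470 002-U\nS002-U-000300-000470 002-U\n' > utt2spk

sort -u utt2spk -o utt2spk   # sort in C order and drop duplicate lines, in place

sort -c -u utt2spk && echo "utt2spk now passes the sort check"
```

Using `sort -o` on the same file is safe because sort reads all input before opening the output.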
From: Neil N. <nn...@in...> - 2014-07-11 01:19:48
|
As a very minor point, the wav.scp file, once sorted, becomes:

S002-O <path to the corresponding wav file>
S002-U <path to the corresponding wav file>
S003-U <path to the corresponding wav file>

Neil
From: Karel V. <ive...@fi...> - 2014-07-11 11:59:21
Hi,

you have to use:

utils/fix_data_dir.sh

K.
From: Zibo M. <mzb...@gm...> - 2014-07-11 15:24:54
Hi,

I got another problem. When I tried make_mfcc.sh to create the feats.scp files, it did not work. I checked the log file, which said something like:

compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf scp,p:exp/make_mfcc/train/wav_data.1.scp ark:-
ERROR (compute-mfcc-feats:Read():wave-reader.cc:144) WaveData: can read only PCM data, audio_format is not 1: 65534
WARNING (compute-mfcc-feats:Read():feat/wave-reader.h:148) Exception caught in WaveHolder object (reading).
WARNING (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) TableReader: failed to load object from 'test.wav'

Then I checked the attributes of my test.wav file, which were as follows:

Input File     : 'test.wav'
Channels       : 1
Sample Rate    : 48000
Precision      : 24-bit
Duration       : 00:03:30.09 = 10084224 samples ~ 15756.6 CDDA sectors
File Size      : 30.3M
Bit Rate       : 1.15M
Sample Encoding: 24-bit Signed Integer PCM

Can you tell me what I should modify in my audio files? Thank you so much!

Best,

Zibo
From: Zibo M. <mzb...@gm...> - 2014-07-16 14:19:06
Hi,

After I created the lang directory, I used steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono. But I got the error message as follows:

steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono
steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono
vads = data/train.1k/split4/1/vad.scp
vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp
vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp data/train.1k/split4/3/vad.scp
vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp data/train.1k/split4/3/vad.scp data/train.1k/split4/4/vad.scp
steps/train_mono.sh: Initializing monophone system.
steps/train_mono.sh: Compiling training graphs
steps/train_mono.sh: Aligning data equally (pass 0)
steps/train_mono.sh: Pass 1
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 2
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 3
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 4
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 5
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 6
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 7
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 8
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 9
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 10
steps/train_mono.sh: Aligning data
steps/train_mono.sh: Pass 11
steps/train_mono.sh: Pass 12
steps/train_mono.sh: Aligning data
*** Error in `gmm-acc-stats-ali': free(): corrupted unsorted chunks: 0x0000000001e10e60 ***
======= Backtrace: =========
/lib64/libc.so.6[0x367887d0b8]
gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127]
gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151]
gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156]
gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946]
gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410]
gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0]
gmm-acc-stats-ali(main+0x56c)[0x4d7aec]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45]
gmm-acc-stats-ali[0x4d74b9]

Can you please tell me what went wrong here?

Thank you so much!

Zibo
From: Jan T. <af...@ce...> - 2014-07-16 14:35:36
This looks like a problem with your machine or the toolchain that was used to compile Kaldi (especially the compiler and/or the glibc). If you have experience with debugging, you can run the command again, generate a core dump (using ulimit -c unlimited), and load it into gdb to figure out the details.

What distribution, gcc, and glibc are you using?

y.
From: Zibo M. <mzb...@gm...> - 2014-07-16 14:43:39
Hi Jan,

Thank you so much for your reply. Here is the information about my distribution, gcc, and glibc:

Fedora release 19 (Schrödinger’s Cat)
NAME=Fedora
VERSION="19 (Schrödinger’s Cat)"
ID=fedora
VERSION_ID=19
PRETTY_NAME="Fedora 19 (Schrödinger’s Cat)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:19"

gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)

ldd (GNU libc) 2.17

Thank you!

Zibo
From: Daniel P. <dp...@gm...> - 2014-07-16 19:18:38
I think it's possible that this is caused by a bug in Kaldi itself. The way I would debug this is to first figure out which of the log files corresponds to the error (probably one of exp/mono/log/align.12.*.log), and run the command line that you'll see at the top of the log file manually to verify that you can reproduce the error by running it again. (BTW, I'm a little confused here as normally the stderr of the job should go to the log file, and this error is produced on the console. ). If you can, then instead of running <program> <args> from the console, you'll run gdb --args <program> <args> and at the (gdb) prompt you'll type r Then hopefully it will run until you get an error. At that point you can type bt to get a backtrace, which you'll show us. Dan On Wed, Jul 16, 2014 at 7:43 AM, Zibo Meng <mzb...@gm...> wrote: > Hi Jan, > Thank you so much for your reply. > > Here is the information about my distribution, gcc and glibc: > > Fedora release 19 (Schrödinger’s Cat) > NAME=Fedora > VERSION="19 (Schrödinger’s Cat)" > ID=fedora > VERSION_ID=19 > PRETTY_NAME="Fedora 19 (Schrödinger’s Cat)" > ANSI_COLOR="0;34" > CPE_NAME="cpe:/o:fedoraproject:fedora:19" > Fedora release 19 (Schrödinger’s Cat) > Fedora release 19 (Schrödinger’s Cat) > > gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7) > > ldd (GNU libc) 2.17 > > Thank you! > > Zibo > > > > On Wed, Jul 16, 2014 at 10:35 AM, Jan Trmal <af...@ce...> wrote: > >> This looks like a problem with your machine or the toolchain that was >> used to compiled kaldi (especially the compiler and/or the glibc). >> If you have experience with debugging, you can run the command again, >> generate core dump (using ulimit –c unlimited) and load it into gdb to >> figure out the details. >> What distribution and gcc and glibc are you using? >> >> y. 
>> >> >> >> On Wed, Jul 16, 2014 at 10:18 AM, Zibo Meng <mzb...@gm...> wrote: >> >>> Hi, >>> >>> After I created the lang directory, I used steps/train_mono.sh --nj 4 >>> data/train.1k data/lang exp/mono. But I got the error message as follows: >>> >>> steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono >>> steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono >>> vads = data/train.1k/split4/1/vad.scp >>> vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp >>> vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp >>> data/train.1k/split4/3/vad.scp >>> vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp >>> data/train.1k/split4/3/vad.scp data/train.1k/split4/4/vad.scp >>> steps/train_mono.sh: Initializing monophone system. >>> steps/train_mono.sh: Compiling training graphs >>> steps/train_mono.sh: Aligning data equally (pass 0) >>> steps/train_mono.sh: Pass 1 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 2 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 3 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 4 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 5 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 6 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 7 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 8 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 9 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 10 >>> steps/train_mono.sh: Aligning data >>> steps/train_mono.sh: Pass 11 >>> steps/train_mono.sh: Pass 12 >>> steps/train_mono.sh: Aligning data >>> *** Error in `gmm-acc-stats-ali': free(): corrupted unsorted chunks: >>> 0x0000000001e10e60 *** >>> ======= Backtrace: ========= >>> /lib64/libc.so.6[0x367887d0b8] >>> gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127] >>> 
gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151] >>> >>> gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156] >>> >>> gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946] >>> >>> gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410] >>> >>> gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0] >>> gmm-acc-stats-ali(main+0x56c)[0x4d7aec] >>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45] >>> gmm-acc-stats-ali[0x4d74b9] >>> >>> Can you please tell me what went wrong here? >>> >>> Thank you so much! >>> >>> Zibo >>> >>> >>> >>> On Fri, Jul 11, 2014 at 11:24 AM, Zibo Meng <mzb...@gm...> wrote: >>> >>>> Hi, >>>> >>>> I got another problem. >>>> >>>> When I tried make_mfcc.sh to create the feats.scp files it did not work. >>>> >>>> I checked the log file where it said some thing like: >>>> >>>> compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf >>>> scp,p:exp/make_mfcc/train/wav_data.1.scp ark:- >>>> ERROR (compute-mfcc-feats:Read():wave-reader.cc:144) WaveData: can read >>>> only PCM data, audio_format is not 1: 65534 >>>> WARNING (compute-mfcc-feats:Read():feat/wave-reader.h:148) Exception >>>> caught in WaveHolder object (reading). >>>> WARNING (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) >>>> TableReader: failed to load object from 'test.wav' >>>> >>>> Then I checked the attributes of my test.wav file which were as follows: >>>> Input File : 'test.wav' >>>> Channels : 1 >>>> Sample Rate : 48000 >>>> Precision : 24-bit >>>> Duration : 00:03:30.09 = 10084224 samples ~ 15756.6 CDDA sectors >>>> File Size : 30.3M >>>> Bit Rate : 1.15M >>>> Sample Encoding: 24-bit Signed Integer PCM >>>> >>>> Can you tell me what should I modify to my audio files. Thank you so >>>> much! 
>>>> >>>> Best, >>>> >>>> Zibo >>>> >>>> >>>> >>>> On Thu, Jul 10, 2014 at 3:37 PM, Zibo Meng <mzb...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> I am preparing the data for dnn training using my own data set. I >>>>> followed the instruction on >>>>> http://kaldi.sourceforge.net/data_prep.html. >>>>> >>>>> I created the file "text" as the first 3 lines: >>>>> S002-U-000300-000470 OH >>>>> S002-U-000470-000630 I'D >>>>> S002-U-000630-000870 LIKE >>>>> >>>>> the wav.scp file: >>>>> S002-U <path to the corresponding wav file> >>>>> S002-O <path to the corresponding wav file> >>>>> S003-U <path to the corresponding wav file> >>>>> >>>>> and the utt2spk file: >>>>> S002-U-000300-000470 002-U >>>>> S002-U-000470-000630 002-U >>>>> S002-U-000630-000870 002-U >>>>> >>>>> Then I used utt2spk_to_spk2utt.pl to create the spk2utt file. >>>>> Everything went well until I tried to use the mak_mfcc.sh to create the >>>>> feats.scp file where I got the error message like: >>>>> >>>>> utils/validate_data_dir.sh: file data/utt2spk is not in sorted order >>>>> or has duplicates >>>>> >>>>> seems like my utt2spk file could not pass through the validation. >>>>> >>>>> Can any body help me out of here? Thank you so much. >>>>> >>>>> Best, >>>>> >>>>> Zibo >>>>> >>>> >>>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Want fast and easy access to all the code in your enterprise? Index and >>> search up to 200,000 lines of code with a free copy of Black Duck >>> Code Sight - the same software that powers the world's largest code >>> search on Ohloh, the Black Duck Open Hub! Try it now. >>> http://p.sf.net/sfu/bds >>> >>> _______________________________________________ >>> Kaldi-users mailing list >>> Kal...@li... >>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> >>> >> > > > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? 
Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > |
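The validate_data_dir.sh complaint quoted in this thread ("file data/utt2spk is not in sorted order or has duplicates") is often a locale problem: Kaldi expects its data files sorted in C-locale (byte) order, which is why the recipes export LC_ALL=C. A minimal sketch of checking a file the same way; the utt2spk lines are the ones from the thread, and the file path is a placeholder:

```shell
# Write the three utt2spk lines from the thread and verify C-locale order,
# the same ordering validate_data_dir.sh insists on.
printf 'S002-U-000300-000470 002-U\nS002-U-000470-000630 002-U\nS002-U-000630-000870 002-U\n' > utt2spk
LC_ALL=C sort -c utt2spk && echo "utt2spk is sorted"
```

If the check fails, utils/fix_data_dir.sh (as Dan suggested earlier in the thread) will try to repair the ordering automatically.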
|
From: Zibo M. <mzb...@gm...> - 2014-07-21 14:36:11
|
Hi Dan and Jan,

Thanks for your help!

I ran the bash script one more time and got the error at the 19th pass. Since I don't know how to debug a C++ program called from a shell script, I took Jan's advice and ran ulimit -c unlimited before running the code. I got a core file when the core dump occurred, but its size is 303 MB, so it cannot be attached to this email. Please tell me what else I should do.

BTW, I used the following script:
steps/train_mono.sh --nj 10 data/train.1k data/lang exp/mono
where I changed the number of jobs from 4 to 10, and now I am at the 39th pass without suffering a core dump as before.

One more question: if I want to use run_nnet2.sh to do the training and testing, should I run all the scripts in the run.sh file first?

Thank you very much.

Best,

Zibo

On Wed, Jul 16, 2014 at 3:18 PM, Daniel Povey <dp...@gm...> wrote: > I think it's possible that this is caused by a bug in Kaldi itself. > The way I would debug this is to first figure out which of the log files > corresponds to the error (probably one of exp/mono/log/align.12.*.log), and > run the command line that you'll see at the top of the log file manually to > verify that you can reproduce the error by running it again. > (BTW, I'm a little confused here as normally the stderr of the job should > go to the log file, and this error is produced on the console. ). > > > If you can, then instead of running > <program> <args> > from the console, you'll run > gdb --args <program> <args> > and at the (gdb) prompt you'll type > r > Then hopefully it will run until you get an error. At that point you can > type > bt > to get a backtrace, which you'll show us. > > Dan > > > > On Wed, Jul 16, 2014 at 7:43 AM, Zibo Meng <mzb...@gm...> wrote: > >> Hi Jan, >> Thank you so much for your reply. 
>> >> Here is the information about my distribution, gcc and glibc: >> >> Fedora release 19 (Schrödinger’s Cat) >> NAME=Fedora >> VERSION="19 (Schrödinger’s Cat)" >> ID=fedora >> VERSION_ID=19 >> PRETTY_NAME="Fedora 19 (Schrödinger’s Cat)" >> ANSI_COLOR="0;34" >> CPE_NAME="cpe:/o:fedoraproject:fedora:19" >> Fedora release 19 (Schrödinger’s Cat) >> Fedora release 19 (Schrödinger’s Cat) >> >> gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7) >> >> ldd (GNU libc) 2.17 >> >> Thank you! >> >> Zibo >> >> >> >> On Wed, Jul 16, 2014 at 10:35 AM, Jan Trmal <af...@ce...> wrote: >> >>> This looks like a problem with your machine or the toolchain that was >>> used to compiled kaldi (especially the compiler and/or the glibc). >>> If you have experience with debugging, you can run the command again, >>> generate core dump (using ulimit –c unlimited) and load it into gdb to >>> figure out the details. >>> What distribution and gcc and glibc are you using? >>> >>> y. >>> >>> >>> >>> On Wed, Jul 16, 2014 at 10:18 AM, Zibo Meng <mzb...@gm...> wrote: >>> >>>> Hi, >>>> >>>> After I created the lang directory, I used steps/train_mono.sh --nj 4 >>>> data/train.1k data/lang exp/mono. But I got the error message as follows: >>>> >>>> steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono >>>> steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono >>>> vads = data/train.1k/split4/1/vad.scp >>>> vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp >>>> vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp >>>> data/train.1k/split4/3/vad.scp >>>> vads = data/train.1k/split4/1/vad.scp data/train.1k/split4/2/vad.scp >>>> data/train.1k/split4/3/vad.scp data/train.1k/split4/4/vad.scp >>>> steps/train_mono.sh: Initializing monophone system. 
>>>> steps/train_mono.sh: Compiling training graphs >>>> steps/train_mono.sh: Aligning data equally (pass 0) >>>> steps/train_mono.sh: Pass 1 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 2 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 3 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 4 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 5 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 6 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 7 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 8 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 9 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 10 >>>> steps/train_mono.sh: Aligning data >>>> steps/train_mono.sh: Pass 11 >>>> steps/train_mono.sh: Pass 12 >>>> steps/train_mono.sh: Aligning data >>>> *** Error in `gmm-acc-stats-ali': free(): corrupted unsorted chunks: >>>> 0x0000000001e10e60 *** >>>> ======= Backtrace: ========= >>>> /lib64/libc.so.6[0x367887d0b8] >>>> gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127] >>>> gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151] >>>> >>>> gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156] >>>> >>>> gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946] >>>> >>>> gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410] >>>> >>>> gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0] >>>> gmm-acc-stats-ali(main+0x56c)[0x4d7aec] >>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45] >>>> gmm-acc-stats-ali[0x4d74b9] >>>> >>>> Can you please tell me what went wrong here? >>>> >>>> Thank you so much! 
>>>> >>>> Zibo >>>> >>>> >>>> >>>> On Fri, Jul 11, 2014 at 11:24 AM, Zibo Meng <mzb...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> I got another problem. >>>>> >>>>> When I tried make_mfcc.sh to create the feats.scp files it did not >>>>> work. >>>>> >>>>> I checked the log file where it said some thing like: >>>>> >>>>> compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf >>>>> scp,p:exp/make_mfcc/train/wav_data.1.scp ark:- >>>>> ERROR (compute-mfcc-feats:Read():wave-reader.cc:144) WaveData: can >>>>> read only PCM data, audio_format is not 1: 65534 >>>>> WARNING (compute-mfcc-feats:Read():feat/wave-reader.h:148) Exception >>>>> caught in WaveHolder object (reading). >>>>> WARNING (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) >>>>> TableReader: failed to load object from 'test.wav' >>>>> >>>>> Then I checked the attributes of my test.wav file which were as >>>>> follows: >>>>> Input File : 'test.wav' >>>>> Channels : 1 >>>>> Sample Rate : 48000 >>>>> Precision : 24-bit >>>>> Duration : 00:03:30.09 = 10084224 samples ~ 15756.6 CDDA sectors >>>>> File Size : 30.3M >>>>> Bit Rate : 1.15M >>>>> Sample Encoding: 24-bit Signed Integer PCM >>>>> >>>>> Can you tell me what should I modify to my audio files. Thank you so >>>>> much! >>>>> >>>>> Best, >>>>> >>>>> Zibo >>>>> >>>>> >>>>> >>>>> On Thu, Jul 10, 2014 at 3:37 PM, Zibo Meng <mzb...@gm...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am preparing the data for dnn training using my own data set. I >>>>>> followed the instruction on >>>>>> http://kaldi.sourceforge.net/data_prep.html. 
>>>>>> >>>>>> I created the file "text" as the first 3 lines: >>>>>> S002-U-000300-000470 OH >>>>>> S002-U-000470-000630 I'D >>>>>> S002-U-000630-000870 LIKE >>>>>> >>>>>> the wav.scp file: >>>>>> S002-U <path to the corresponding wav file> >>>>>> S002-O <path to the corresponding wav file> >>>>>> S003-U <path to the corresponding wav file> >>>>>> >>>>>> and the utt2spk file: >>>>>> S002-U-000300-000470 002-U >>>>>> S002-U-000470-000630 002-U >>>>>> S002-U-000630-000870 002-U >>>>>> >>>>>> Then I used utt2spk_to_spk2utt.pl to create the spk2utt file. >>>>>> Everything went well until I tried to use the mak_mfcc.sh to create the >>>>>> feats.scp file where I got the error message like: >>>>>> >>>>>> utils/validate_data_dir.sh: file data/utt2spk is not in sorted order >>>>>> or has duplicates >>>>>> >>>>>> seems like my utt2spk file could not pass through the validation. >>>>>> >>>>>> Can any body help me out of here? Thank you so much. >>>>>> >>>>>> Best, >>>>>> >>>>>> Zibo >>>>>> >>>>> >>>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Want fast and easy access to all the code in your enterprise? Index and >>>> search up to 200,000 lines of code with a free copy of Black Duck >>>> Code Sight - the same software that powers the world's largest code >>>> search on Ohloh, the Black Duck Open Hub! Try it now. >>>> http://p.sf.net/sfu/bds >>>> >>>> _______________________________________________ >>>> Kaldi-users mailing list >>>> Kal...@li... >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >>>> >>>> >>> >> >> >> ------------------------------------------------------------------------------ >> Want fast and easy access to all the code in your enterprise? Index and >> search up to 200,000 lines of code with a free copy of Black Duck >> Code Sight - the same software that powers the world's largest code >> search on Ohloh, the Black Duck Open Hub! Try it now. 
>> http://p.sf.net/sfu/bds >> _______________________________________________ >> Kaldi-users mailing list >> Kal...@li... >> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >> > |
|
From: Simon K. <sim...@gm...> - 2014-07-21 15:13:15
|
Hi,

wasn't gcc 4.8.2 the buggy version? If I remember right, it had caused crashes before, sometimes not reproducible and at different stages of the process.

Downgrading to 4.7 helped for me.

@Dan: Would it be worth adding a warning (or even an error) when running configure, if this version of gcc is used? It seems to be the standard compiler in some distros right now. That might avoid some of the problems on this list.

All the best,

Simon

On 07/21/2014 02:36 PM, Zibo Meng wrote: > Hi Dan and Jan, > > Thanks for you help! > > I ran the bash script one more time and I got the error at the 19th > pass. Since I don't know how to debug c++ program called from the shell > script. So I take Jan's advice to run the ulimit -c unlimited before I > ran the code. I got the core file when the core dump error occurred > whose size is 303MB that can not be attached to this email. Please tell > me what else I should do. > > BTW, I used the following script : > steps/train_mono.sh --nj 10 data/train.1k data/lang exp/mono > where I changed the number of job from 4 to 10 and now I am at the 39th > pass without suffering a core dump as before. > > One more question, if I want to use run_nnet2.sh to do the training and > testing, should I run all the scripts in the run.sh file first? > > Thank you very much. > > Best, > > Zibo > > > On Wed, Jul 16, 2014 at 3:18 PM, Daniel Povey <dp...@gm... > <mailto:dp...@gm...>> wrote: > > I think it's possible that this is caused by a bug in Kaldi itself. > The way I would debug this is to first figure out which of the log > files corresponds to the error (probably one of > exp/mono/log/align.12.*.log), and run the command line that you'll > see at the top of the log file manually to verify that you can > reproduce the error by running it again. > (BTW, I'm a little confused here as normally the stderr of the job > should go to the log file, and this error is produced on the console. ). 
> > > If you can, then instead of running > <program> <args> > from the console, you'll run > gdb --args <program> <args> > and at the (gdb) prompt you'll type > r > Then hopefully it will run until you get an error. At that point > you can type > bt > to get a backtrace, which you'll show us. > > Dan > > > > On Wed, Jul 16, 2014 at 7:43 AM, Zibo Meng <mzb...@gm... > <mailto:mzb...@gm...>> wrote: > > Hi Jan, > Thank you so much for your reply. > > Here is the information about my distribution, gcc and glibc: > > Fedora release 19 (Schrödinger’s Cat) > NAME=Fedora > VERSION="19 (Schrödinger’s Cat)" > ID=fedora > VERSION_ID=19 > PRETTY_NAME="Fedora 19 (Schrödinger’s Cat)" > ANSI_COLOR="0;34" > CPE_NAME="cpe:/o:fedoraproject:fedora:19" > Fedora release 19 (Schrödinger’s Cat) > Fedora release 19 (Schrödinger’s Cat) > > gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7) > > ldd (GNU libc) 2.17 > > Thank you! > > Zibo > > > > On Wed, Jul 16, 2014 at 10:35 AM, Jan Trmal <af...@ce... > <mailto:af...@ce...>> wrote: > > This looks like a problem with your machine or the toolchain > that was used to compiled kaldi (especially the compiler > and/or the glibc). > If you have experience with debugging, you can run the > command again, generate core dump (using ulimit –c > unlimited) and load it into gdb to figure out the details. > What distribution and gcc and glibc are you using? > > y. > > > > On Wed, Jul 16, 2014 at 10:18 AM, Zibo Meng > <mzb...@gm... <mailto:mzb...@gm...>> wrote: > > Hi, > > After I created the lang directory, I used > steps/train_mono.sh --nj 4 data/train.1k data/lang > exp/mono. 
But I got the error message as follows: > > steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono > steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono > vads = data/train.1k/split4/1/vad.scp > vads = data/train.1k/split4/1/vad.scp > data/train.1k/split4/2/vad.scp > vads = data/train.1k/split4/1/vad.scp > data/train.1k/split4/2/vad.scp > data/train.1k/split4/3/vad.scp > vads = data/train.1k/split4/1/vad.scp > data/train.1k/split4/2/vad.scp > data/train.1k/split4/3/vad.scp > data/train.1k/split4/4/vad.scp > steps/train_mono.sh: Initializing monophone system. > steps/train_mono.sh: Compiling training graphs > steps/train_mono.sh: Aligning data equally (pass 0) > steps/train_mono.sh: Pass 1 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 2 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 3 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 4 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 5 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 6 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 7 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 8 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 9 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 10 > steps/train_mono.sh: Aligning data > steps/train_mono.sh: Pass 11 > steps/train_mono.sh: Pass 12 > steps/train_mono.sh: Aligning data > *** Error in `gmm-acc-stats-ali': free(): corrupted > unsorted chunks: 0x0000000001e10e60 *** > ======= Backtrace: ========= > /lib64/libc.so.6[0x367887d0b8] > gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127] > gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151] > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156] > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946] > 
gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410] > gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0] > gmm-acc-stats-ali(main+0x56c)[0x4d7aec] > /lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45] > gmm-acc-stats-ali[0x4d74b9] > > Can you please tell me what went wrong here? > > Thank you so much! > > Zibo > > > > On Fri, Jul 11, 2014 at 11:24 AM, Zibo Meng > <mzb...@gm... <mailto:mzb...@gm...>> wrote: > > Hi, > > I got another problem. > > When I tried make_mfcc.sh to create the feats.scp > files it did not work. > > I checked the log file where it said some thing like: > > compute-mfcc-feats --verbose=2 > --config=conf/mfcc.conf > scp,p:exp/make_mfcc/train/wav_data.1.scp ark:- > ERROR (compute-mfcc-feats:Read():wave-reader.cc:144) > WaveData: can read only PCM data, audio_format is > not 1: 65534 > WARNING > (compute-mfcc-feats:Read():feat/wave-reader.h:148) > Exception caught in WaveHolder object (reading). > WARNING > (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) > TableReader: failed to load object from 'test.wav' > > Then I checked the attributes of my test.wav file > which were as follows: > Input File : 'test.wav' > Channels : 1 > Sample Rate : 48000 > Precision : 24-bit > Duration : 00:03:30.09 = 10084224 samples ~ > 15756.6 CDDA sectors > File Size : 30.3M > Bit Rate : 1.15M > Sample Encoding: 24-bit Signed Integer PCM > > Can you tell me what should I modify to my audio > files. Thank you so much! > > Best, > > Zibo > > > > On Thu, Jul 10, 2014 at 3:37 PM, Zibo Meng > <mzb...@gm... <mailto:mzb...@gm...>> wrote: > > Hi, > > I am preparing the data for dnn training using > my own data set. I followed the instruction on > http://kaldi.sourceforge.net/data_prep.html. 
> > I created the file "text" as the first 3 lines: > S002-U-000300-000470 OH > S002-U-000470-000630 I'D > S002-U-000630-000870 LIKE > > the wav.scp file: > S002-U <path to the corresponding wav file> > S002-O <path to the corresponding wav file> > S003-U <path to the corresponding wav file> > > and the utt2spk file: > S002-U-000300-000470 002-U > S002-U-000470-000630 002-U > S002-U-000630-000870 002-U > > Then I used utt2spk_to_spk2utt.pl > <http://utt2spk_to_spk2utt.pl> to create the > spk2utt file. Everything went well until I tried > to use the mak_mfcc.sh to create the feats.scp > file where I got the error message like: > > utils/validate_data_dir.sh: file data/utt2spk is > not in sorted order or has duplicates > > seems like my utt2spk file could not pass > through the validation. > > Can any body help me out of here? Thank you so much. > > Best, > > Zibo > > > > > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your > enterprise? Index and > search up to 200,000 lines of code with a free copy of > Black Duck > Code Sight - the same software that powers the world's > largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds > > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > <mailto:Kal...@li...> > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > > > > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? > Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds > _______________________________________________ > Kaldi-users mailing list > Kal...@li... 
> <mailto:Kal...@li...> > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > > > > > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds > > > > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > |
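Simon's proposed configure-time check could look something like the sketch below. The function name, messages, and hard-coded version strings are illustrative only; an actual patch would compare against the output of "$CXX -dumpversion" inside the configure script:

```shell
# Sketch of a configure-time guard against the gcc release suspected in
# this thread of miscompiling Kaldi. Version strings are passed in
# explicitly here for demonstration.
check_gcc_version() {
  case "$1" in
    4.8.2*) echo "WARNING: gcc $1 is suspected of miscompiling Kaldi; consider 4.7.x" ;;
    *)      echo "gcc $1: no known problems" ;;
  esac
}
check_gcc_version "4.8.2"
check_gcc_version "4.7.4"
```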
|
From: Daniel P. <dp...@gm...> - 2014-07-21 17:46:08
|
Simon-- re checking for that version of gcc, that is a good idea. Perhaps you could prepare a patch for the "configure" script and send it to me separately. But see if you can check your email history for confirmation that it was 4.8.2.

Zibo: you can do
gdb <program-name> <core>
and at the (gdb) prompt, type
bt
and show us the result.

Dan

On Mon, Jul 21, 2014 at 8:13 AM, Simon Klüpfel <sim...@gm...> wrote: > Hi, > > wasn't gcc 4.8.2. the buggy version? If I remember right, it had caused > crashes before, sometimes not reproducible and at different stages of > the process. > > Downgrading to 4.7 helped for me. > > @Dan: Would it be worth to add a warning (or even error) when running > configure, if this version of gcc is used? It seems to be the standard > compiler in some distros right now. That might avoid some of the > problems on this list. > > All the best, > > Simon > > > On 07/21/2014 02:36 PM, Zibo Meng wrote: > > Hi Dan and Jan, > > > > Thanks for you help! > > > > I ran the bash script one more time and I got the error at the 19th > > pass. Since I don't know how to debug c++ program called from the shell > > script. So I take Jan's advice to run the ulimit -c unlimited before I > > ran the code. I got the core file when the core dump error occurred > > whose size is 303MB that can not be attached to this email. Please tell > > me what else I should do. > > > > BTW, I used the following script : > > steps/train_mono.sh --nj 10 data/train.1k data/lang exp/mono > > where I changed the number of job from 4 to 10 and now I am at the 39th > > pass without suffering a core dump as before. > > > > One more question, if I want to use run_nnet2.sh to do the training and > > testing, should I run all the scripts in the run.sh file first? > > > > Thank you very much. > > > > Best, > > > > Zibo > > > > > > On Wed, Jul 16, 2014 at 3:18 PM, Daniel Povey <dp...@gm... 
> > <mailto:dp...@gm...>> wrote: > > > > I think it's possible that this is caused by a bug in Kaldi itself. > > The way I would debug this is to first figure out which of the log > > files corresponds to the error (probably one of > > exp/mono/log/align.12.*.log), and run the command line that you'll > > see at the top of the log file manually to verify that you can > > reproduce the error by running it again. > > (BTW, I'm a little confused here as normally the stderr of the job > > should go to the log file, and this error is produced on the > console. ). > > > > > > If you can, then instead of running > > <program> <args> > > from the console, you'll run > > gdb --args <program> <args> > > and at the (gdb) prompt you'll type > > r > > Then hopefully it will run until you get an error. At that point > > you can type > > bt > > to get a backtrace, which you'll show us. > > > > Dan > > > > > > > > On Wed, Jul 16, 2014 at 7:43 AM, Zibo Meng <mzb...@gm... > > <mailto:mzb...@gm...>> wrote: > > > > Hi Jan, > > Thank you so much for your reply. > > > > Here is the information about my distribution, gcc and glibc: > > > > Fedora release 19 (Schrödinger’s Cat) > > NAME=Fedora > > VERSION="19 (Schrödinger’s Cat)" > > ID=fedora > > VERSION_ID=19 > > PRETTY_NAME="Fedora 19 (Schrödinger’s Cat)" > > ANSI_COLOR="0;34" > > CPE_NAME="cpe:/o:fedoraproject:fedora:19" > > Fedora release 19 (Schrödinger’s Cat) > > Fedora release 19 (Schrödinger’s Cat) > > > > gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7) > > > > ldd (GNU libc) 2.17 > > > > Thank you! > > > > Zibo > > > > > > > > On Wed, Jul 16, 2014 at 10:35 AM, Jan Trmal <af...@ce... > > <mailto:af...@ce...>> wrote: > > > > This looks like a problem with your machine or the toolchain > > that was used to compiled kaldi (especially the compiler > > and/or the glibc). 
> > If you have experience with debugging, you can run the > > command again, generate core dump (using ulimit –c > > unlimited) and load it into gdb to figure out the details. > > What distribution and gcc and glibc are you using? > > > > y. > > > > > > > > On Wed, Jul 16, 2014 at 10:18 AM, Zibo Meng > > <mzb...@gm... <mailto:mzb...@gm...>> wrote: > > > > Hi, > > > > After I created the lang directory, I used > > steps/train_mono.sh --nj 4 data/train.1k data/lang > > exp/mono. But I got the error message as follows: > > > > steps/train_mono.sh --nj 4 data/train.1k data/lang > exp/mono > > steps/train_mono.sh --nj 4 data/train.1k data/lang > exp/mono > > vads = data/train.1k/split4/1/vad.scp > > vads = data/train.1k/split4/1/vad.scp > > data/train.1k/split4/2/vad.scp > > vads = data/train.1k/split4/1/vad.scp > > data/train.1k/split4/2/vad.scp > > data/train.1k/split4/3/vad.scp > > vads = data/train.1k/split4/1/vad.scp > > data/train.1k/split4/2/vad.scp > > data/train.1k/split4/3/vad.scp > > data/train.1k/split4/4/vad.scp > > steps/train_mono.sh: Initializing monophone system. 
> > steps/train_mono.sh: Compiling training graphs > > steps/train_mono.sh: Aligning data equally (pass 0) > > steps/train_mono.sh: Pass 1 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 2 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 3 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 4 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 5 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 6 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 7 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 8 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 9 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 10 > > steps/train_mono.sh: Aligning data > > steps/train_mono.sh: Pass 11 > > steps/train_mono.sh: Pass 12 > > steps/train_mono.sh: Aligning data > > *** Error in `gmm-acc-stats-ali': free(): corrupted > > unsorted chunks: 0x0000000001e10e60 *** > > ======= Backtrace: ========= > > /lib64/libc.so.6[0x367887d0b8] > > > gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127] > > gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151] > > > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156] > > > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946] > > > gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410] > > > gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0] > > gmm-acc-stats-ali(main+0x56c)[0x4d7aec] > > /lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45] > > gmm-acc-stats-ali[0x4d74b9] > > > > Can you please tell me what went wrong here? > > > > Thank you so much! 
> > > > Zibo > > > > > > > > On Fri, Jul 11, 2014 at 11:24 AM, Zibo Meng > > <mzb...@gm... <mailto:mzb...@gm...>> wrote: > > > > Hi, > > > > I got another problem. > > > > When I tried make_mfcc.sh to create the feats.scp > > files it did not work. > > > > I checked the log file where it said some thing like: > > > > compute-mfcc-feats --verbose=2 > > --config=conf/mfcc.conf > > scp,p:exp/make_mfcc/train/wav_data.1.scp ark:- > > ERROR (compute-mfcc-feats:Read():wave-reader.cc:144) > > WaveData: can read only PCM data, audio_format is > > not 1: 65534 > > WARNING > > (compute-mfcc-feats:Read():feat/wave-reader.h:148) > > Exception caught in WaveHolder object (reading). > > WARNING > > > (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) > > TableReader: failed to load object from 'test.wav' > > > > Then I checked the attributes of my test.wav file > > which were as follows: > > Input File : 'test.wav' > > Channels : 1 > > Sample Rate : 48000 > > Precision : 24-bit > > Duration : 00:03:30.09 = 10084224 samples ~ > > 15756.6 CDDA sectors > > File Size : 30.3M > > Bit Rate : 1.15M > > Sample Encoding: 24-bit Signed Integer PCM > > > > Can you tell me what should I modify to my audio > > files. Thank you so much! > > > > Best, > > > > Zibo > > > > > > > > On Thu, Jul 10, 2014 at 3:37 PM, Zibo Meng > > <mzb...@gm... <mailto:mzb...@gm...>> > wrote: > > > > Hi, > > > > I am preparing the data for dnn training using > > my own data set. I followed the instruction on > > http://kaldi.sourceforge.net/data_prep.html. 
> > > > I created the file "text" as the first 3 lines: > > S002-U-000300-000470 OH > > S002-U-000470-000630 I'D > > S002-U-000630-000870 LIKE > > > > the wav.scp file: > > S002-U <path to the corresponding wav file> > > S002-O <path to the corresponding wav file> > > S003-U <path to the corresponding wav file> > > > > and the utt2spk file: > > S002-U-000300-000470 002-U > > S002-U-000470-000630 002-U > > S002-U-000630-000870 002-U > > > > Then I used utt2spk_to_spk2utt.pl > > <http://utt2spk_to_spk2utt.pl> to create the > > spk2utt file. Everything went well until I tried > > to use the mak_mfcc.sh to create the feats.scp > > file where I got the error message like: > > > > utils/validate_data_dir.sh: file data/utt2spk is > > not in sorted order or has duplicates > > > > seems like my utt2spk file could not pass > > through the validation. > > > > Can any body help me out of here? Thank you so > much. > > > > Best, > > > > Zibo > > > > > > > > > > > ------------------------------------------------------------------------------ > > Want fast and easy access to all the code in your > > enterprise? Index and > > search up to 200,000 lines of code with a free copy of > > Black Duck > > Code Sight - the same software that powers the world's > > largest code > > search on Ohloh, the Black Duck Open Hub! Try it now. > > http://p.sf.net/sfu/bds > > > > _______________________________________________ > > Kaldi-users mailing list > > Kal...@li... > > <mailto:Kal...@li...> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > > > > > > > > > > ------------------------------------------------------------------------------ > > Want fast and easy access to all the code in your enterprise? > > Index and > > search up to 200,000 lines of code with a free copy of Black Duck > > Code Sight - the same software that powers the world's largest > code > > search on Ohloh, the Black Duck Open Hub! Try it now. 
> > http://p.sf.net/sfu/bds > > _______________________________________________ > > Kaldi-users mailing list > > Kal...@li... > > <mailto:Kal...@li...> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > Want fast and easy access to all the code in your enterprise? Index and > > search up to 200,000 lines of code with a free copy of Black Duck > > Code Sight - the same software that powers the world's largest code > > search on Ohloh, the Black Duck Open Hub! Try it now. > > http://p.sf.net/sfu/bds > > > > > > > > _______________________________________________ > > Kaldi-users mailing list > > Kal...@li... > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > > > > > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > |
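Dan's two-step recipe (load the core file into gdb, then type bt) can also be run non-interactively: "gdb -batch -ex bt" prints the backtrace and exits. In the sketch below the program and core-file names are placeholders; substitute your actual binary and the core file produced after ulimit -c unlimited:

```shell
# Placeholder names; point these at the real binary and core file.
prog=gmm-acc-stats-ali
core=core.12345
# The command to run:
echo "gdb -batch -ex bt $prog $core"
# Only attempt it when gdb and the core file actually exist here:
if command -v gdb >/dev/null && [ -f "$core" ]; then
  gdb -batch -ex bt "$prog" "$core"
fi
```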
|
From: Jan T. <af...@ce...> - 2014-07-11 15:40:25
|
Hi, type 65534 is WAVE_FORMAT_EXTENSIBLE, a Microsoft-specific format tag used to add extensions and previously unsupported features to the wave format. You can try using sox to convert/sanitize the wav. y. On Fri, Jul 11, 2014 at 11:24 AM, Zibo Meng <mzb...@gm...> wrote: > Hi, > > I got another problem. > > When I tried make_mfcc.sh to create the feats.scp files, it did not work. > > I checked the log file, where it said something like: > > compute-mfcc-feats --verbose=2 --config=conf/mfcc.conf > scp,p:exp/make_mfcc/train/wav_data.1.scp ark:- > ERROR (compute-mfcc-feats:Read():wave-reader.cc:144) WaveData: can read > only PCM data, audio_format is not 1: 65534 > WARNING (compute-mfcc-feats:Read():feat/wave-reader.h:148) Exception > caught in WaveHolder object (reading). > WARNING (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) > TableReader: failed to load object from 'test.wav' > > Then I checked the attributes of my test.wav file, which were as follows: > Input File : 'test.wav' > Channels : 1 > Sample Rate : 48000 > Precision : 24-bit > Duration : 00:03:30.09 = 10084224 samples ~ 15756.6 CDDA sectors > File Size : 30.3M > Bit Rate : 1.15M > Sample Encoding: 24-bit Signed Integer PCM > > Can you tell me what I should modify in my audio files? Thank you so much! > > Best, > > Zibo |
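The check that fails above can be re-created in miniature: Kaldi reads the 16-bit format tag from the wav header's fmt chunk and insists on 1 (plain PCM). The sketch below forges a tiny header carrying tag 0xFFFE (65534) and reads it back with od; it assumes the canonical layout (fmt chunk directly at offset 12) and a little-endian machine — real files can place the fmt chunk elsewhere, and this is not Kaldi's actual reader.

```shell
# Read the 16-bit format tag at offset 20 of a canonically laid-out wav:
# "RIFF" + size (4) + "WAVE" + "fmt " + chunk size (4), then the tag.
wav_format_tag() {
  od -An -tu2 -j 20 -N 2 "$1" | tr -d ' '
}

# Forge a minimal 22-byte header with tag 0xFFFE (65534), the
# WAVE_FORMAT_EXTENSIBLE value from the error message; the dots stand
# in for the RIFF size field, which we never read here.
printf 'RIFF....WAVEfmt \020\000\000\000\376\377' > test_hdr.wav
wav_format_tag test_hdr.wav   # prints 65534 on little-endian machines
```

A file whose tag prints anything other than 1 is what triggers "audio_format is not 1" in compute-mfcc-feats.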
|
From: Jan T. <af...@ce...> - 2014-07-11 15:48:24
|
BTW, you might also need to downsample to 44.1kHz and 16 bits per sample. I think you can do this without any fear about the accuracy/performance of the recognizer. y. On Fri, Jul 11, 2014 at 11:40 AM, Jan Trmal <af...@ce...> wrote: > Hi, type 65534 is some microsoft specific type used to add some specific extensions and previously unsupported features to the wave format. > > you can try to use sox to convert/sanitize the wav. > > y. |
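The conversion being suggested can be sketched with sox as below, assuming sox is installed. The directory names and glob are illustrative, and the 16 kHz rate is an assumption (Yenda mentions 44.1 kHz; many Kaldi recipes use 16 kHz) — match the sample rate your feature config expects.

```shell
# Convert 24-bit / WAVE_FORMAT_EXTENSIBLE wavs to plain 16-bit signed
# PCM; the trailing "rate 16000" effect resamples, drop it to keep the
# original sample rate.
mkdir -p fixed
for f in wav/*.wav; do
  [ -e "$f" ] || continue   # glob matched nothing; skip
  sox "$f" -t wav -b 16 -e signed-integer "fixed/$(basename "$f")" rate 16000
done
```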
|
From: Daniel P. <dp...@gm...> - 2014-07-11 16:22:46
|
BTW, 24 bits per sample is not supported by the reading code, only 8, 16 and 32. The wav format definition is very open-ended, so it's hard to read; this has been a source of recurring problems. But the best solution is to use sox to convert it, like Yenda says. You can do this as part of a pipe; the wav.scp entry can be something like: utterance-id sox [sox args] input.wav - | Dan On Fri, Jul 11, 2014 at 11:48 AM, Jan Trmal <af...@ce...> wrote: > BTW, it might be the case that you might need to downsample to 44.1kHz and 16bits per sample. I think you can do this without any fear about accuracy/performance of the recognizer. > y. |
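Dan's pipe trick, written out concretely: the trailing "|" on each line tells Kaldi to run the command and read the wav data from its stdout, so nothing needs rewriting on disk. The utterance ids, paths, and sox arguments below are made up for illustration, not taken from the thread.

```shell
# Hypothetical wav.scp entries that convert to 16-bit PCM on the fly:
cat > wav.scp <<'EOF'
S002-U sox /data/audio/S002-U.wav -t wav -b 16 -e signed-integer - rate 16000 |
S002-O sox /data/audio/S002-O.wav -t wav -b 16 -e signed-integer - rate 16000 |
EOF
```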
|
From: Simon K. <sim...@gm...> - 2014-07-21 18:17:19
|
Hi Dan,

I can verify from my old emails that I experienced problems in matrix-lib-test. I was then using gcc version 4.8.2 (Debian 4.8.2-1). Downgrading fixed the problems.

In a later email, you wrote:

On 04/22/2014 06:37 PM, Daniel Povey wrote:
> I think this is a bug in gcc version 4.8.1 that causes random crashes in "nth_element".
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58800
> This can't easily be worked around since nth_element is used in many places. But gcc has since been fixed.
> Dan

So I am not completely sure how fine-grained the version check should be; it might be simplest to just check for 4.8(.x) and print a warning. I'm not sure where it is best to add the check in the configure script, or exactly what to check for. Also, I am halfway into vacation. Perhaps someone else could take over and write a patch?

All the best,

Simon

On 07/21/2014 05:46 PM, Daniel Povey wrote:
> Simon -- re checking for that version of gcc, that is a good idea. Perhaps you could prepare a patch for the "configure" script and send it to me separately. But see if you can check your email history for confirmation that it was 4.8.2.
>
> Zibo: you can do
> gdb <program-name> <core>
> and at the (gdb) prompt, type
> bt
> and show us the result.
>
> Dan
>
> On Mon, Jul 21, 2014 at 8:13 AM, Simon Klüpfel <sim...@gm...> wrote:
>
> Hi, wasn't gcc 4.8.2 the buggy version? If I remember right, it had caused crashes before, sometimes not reproducible and at different stages of the process. Downgrading to 4.7 helped for me.
>
> @Dan: Would it be worth adding a warning (or even an error) when running configure if this version of gcc is used? It seems to be the standard compiler in some distros right now, so a check might avoid some of the problems on this list.
>
> All the best, Simon
>
> On 07/21/2014 02:36 PM, Zibo Meng wrote:
> > Hi Dan and Jan, thanks for your help! I ran the bash script one more time and got the error at the 19th pass. Since I don't know how to debug a C++ program called from a shell script, I took Jan's advice and ran ulimit -c unlimited before running the code. I got the core file when the core dump occurred; it is 303MB, which cannot be attached to this email. Please tell me what else I should do.
> > BTW, I used the following command: steps/train_mono.sh --nj 10 data/train.1k data/lang exp/mono, where I changed the number of jobs from 4 to 10, and now I am at the 39th pass without suffering a core dump as before.
> > One more question: if I want to use run_nnet2.sh to do the training and testing, should I run all the scripts in the run.sh file first? Thank you very much. Best, Zibo
> >
> > On Wed, Jul 16, 2014 at 3:18 PM, Daniel Povey <dp...@gm...> wrote:
> > I think it's possible that this is caused by a bug in Kaldi itself. The way I would debug this is to first figure out which of the log files corresponds to the error (probably one of exp/mono/log/align.12.*.log), and run the command line that you'll see at the top of the log file manually, to verify that you can reproduce the error by running it again. (BTW, I'm a little confused here, as normally the stderr of the job should go to the log file, and this error was produced on the console.)
> > If you can, then instead of running
> > <program> <args>
> > from the console, you'll run
> > gdb --args <program> <args>
> > and at the (gdb) prompt you'll type
> > r
> > Then hopefully it will run until you get an error. At that point you can type
> > bt
> > to get a backtrace, which you'll show us.
> > Dan
> >
> > On Wed, Jul 16, 2014 at 7:43 AM, Zibo Meng <mzb...@gm...> wrote:
> > Hi Jan, thank you so much for your reply. Here is the information about my distribution, gcc and glibc:
> > Fedora release 19 (Schrödinger’s Cat)
> > gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)
> > ldd (GNU libc) 2.17
> > Thank you! Zibo
> >
> > On Wed, Jul 16, 2014 at 10:35 AM, Jan Trmal <af...@ce...> wrote:
> > This looks like a problem with your machine or the toolchain that was used to compile kaldi (especially the compiler and/or the glibc). If you have experience with debugging, you can run the command again, generate a core dump (using ulimit -c unlimited) and load it into gdb to figure out the details. What distribution and gcc and glibc are you using?
> > y.
> >
> > On Wed, Jul 16, 2014 at 10:18 AM, Zibo Meng <mzb...@gm...> wrote:
> > Hi, after I created the lang directory, I used steps/train_mono.sh --nj 4 data/train.1k data/lang exp/mono, but I got the error below. Training ran normally through "steps/train_mono.sh: Pass 12" and "Aligning data", then:
> > *** Error in `gmm-acc-stats-ali': free(): corrupted unsorted chunks: 0x0000000001e10e60 ***
> > ======= Backtrace: =========
> > /lib64/libc.so.6[0x367887d0b8]
> > gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127]
> > gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151]
> > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156]
> > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946]
> > gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410]
> > gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0]
> > gmm-acc-stats-ali(main+0x56c)[0x4d7aec]
> > /lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45]
> > gmm-acc-stats-ali[0x4d74b9]
> > Can you please tell me what went wrong here? Thank you so much! Zibo |
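The configure-time warning being discussed could be sketched as below. The function name and message are made up for illustration, and this is not the patch that actually went into Kaldi's configure script; it just shows the shape of a "warn on gcc 4.8.x" check.

```shell
# Warn if the detected gcc is a 4.8.x release affected by the
# nth_element bug discussed above (GCC PR 58800).
warn_if_buggy_gcc() {
  case "$1" in
    4.8|4.8.*) echo "WARNING: gcc $1 may crash in nth_element (GCC PR 58800);" \
                    "consider gcc 4.7 or >= 4.9" ;;
  esac
}

# Fall back to "unknown" when gcc is not on the PATH.
warn_if_buggy_gcc "$(gcc -dumpversion 2>/dev/null || echo unknown)"
```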
|
From: Daniel P. <dp...@gm...> - 2014-07-21 18:30:09
|
OK, did it myself. Dan On Mon, Jul 21, 2014 at 11:17 AM, Simon Klüpfel <sim...@gm...> wrote: > Hi Dan, > > I can verify from my old emails that I experienced problems in > matrix-lib-test . I was then using gcc version 4.8.2 (Debian 4.8.2-1). > Downgrading fixed the problems. > > In a later email, you wrote: > > On 04/22/2014 06:37 PM, Daniel Povey wrote: > > I think this is a bug in gcc version 4.8.1 that causes random crashes in > > "nth_element". > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58800 > > This can't easily be worked around since nth_element is used in many > > places. But gcc has since been fixed. > > Dan > > So I am not completely sure, how deep one should check for the version, > i.e. if it might be the simplest to just check for 4.8(.X) and throw a > warning. > > I'm not sure, where it is best to add the check to the configure script, > and what to check for. Also I am halfway in vacation. > > Perhaps someone else could take over to write a patch? > > All the best, > > Simon > > > > On 07/21/2014 05:46 PM, Daniel Povey wrote: > > Simon-- re checking for that version of gcc, that is a good idea. > > Perhaps you could prepare a patch for the "configure" script and send it > > to me separately. But see if you can check your email history for > > confirmation that it was 3.8.2. > > > > Zibo: > > > > you can do > > gdb <program-name> <core> > > and at the (gdb) prompt, type > > bt > > and show us the result. > > > > Dan > > > > > > > > On Mon, Jul 21, 2014 at 8:13 AM, Simon Klüpfel <sim...@gm... > > <mailto:sim...@gm...>> wrote: > > > > Hi, > > > > wasn't gcc 4.8.2. the buggy version? If I remember right, it had > caused > > crashes before, sometimes not reproducible and at different stages of > > the process. > > > > Downgrading to 4.7 helped for me. > > > > @Dan: Would it be worth to add a warning (or even error) when running > > configure, if this version of gcc is used? It seems to be the > standard > > compiler in some distros right now. 
That might avoid some of the > > problems on this list. > > > > All the best, > > > > Simon > > > > > > On 07/21/2014 02:36 PM, Zibo Meng wrote: > > > Hi Dan and Jan, > > > > > > Thanks for you help! > > > > > > I ran the bash script one more time and I got the error at the > 19th > > > pass. Since I don't know how to debug c++ program called from the > > shell > > > script. So I take Jan's advice to run the ulimit -c unlimited > > before I > > > ran the code. I got the core file when the core dump error > occurred > > > whose size is 303MB that can not be attached to this email. > > Please tell > > > me what else I should do. > > > > > > BTW, I used the following script : > > > steps/train_mono.sh --nj 10 data/train.1k data/lang exp/mono > > > where I changed the number of job from 4 to 10 and now I am at > > the 39th > > > pass without suffering a core dump as before. > > > > > > One more question, if I want to use run_nnet2.sh to do the > > training and > > > testing, should I run all the scripts in the run.sh file first? > > > > > > Thank you very much. > > > > > > Best, > > > > > > Zibo > > > > > > > > > On Wed, Jul 16, 2014 at 3:18 PM, Daniel Povey <dp...@gm... > > <mailto:dp...@gm...> > > > <mailto:dp...@gm... <mailto:dp...@gm...>>> wrote: > > > > > > I think it's possible that this is caused by a bug in Kaldi > > itself. > > > The way I would debug this is to first figure out which of > > the log > > > files corresponds to the error (probably one of > > > exp/mono/log/align.12.*.log), and run the command line that > > you'll > > > see at the top of the log file manually to verify that you can > > > reproduce the error by running it again. > > > (BTW, I'm a little confused here as normally the stderr of > > the job > > > should go to the log file, and this error is produced on the > > console. ). 
> > > > > > > > > If you can, then instead of running > > > <program> <args> > > > from the console, you'll run > > > gdb --args <program> <args> > > > and at the (gdb) prompt you'll type > > > r > > > Then hopefully it will run until you get an error. At that > point > > > you can type > > > bt > > > to get a backtrace, which you'll show us. > > > > > > Dan > > > > > > > > > > > > On Wed, Jul 16, 2014 at 7:43 AM, Zibo Meng > > <mzb...@gm... <mailto:mzb...@gm...> > > > <mailto:mzb...@gm... <mailto:mzb...@gm...>>> > wrote: > > > > > > Hi Jan, > > > Thank you so much for your reply. > > > > > > Here is the information about my distribution, gcc and > glibc: > > > > > > Fedora release 19 (Schrödinger’s Cat) > > > NAME=Fedora > > > VERSION="19 (Schrödinger’s Cat)" > > > ID=fedora > > > VERSION_ID=19 > > > PRETTY_NAME="Fedora 19 (Schrödinger’s Cat)" > > > ANSI_COLOR="0;34" > > > CPE_NAME="cpe:/o:fedoraproject:fedora:19" > > > Fedora release 19 (Schrödinger’s Cat) > > > Fedora release 19 (Schrödinger’s Cat) > > > > > > gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7) > > > > > > ldd (GNU libc) 2.17 > > > > > > Thank you! > > > > > > Zibo > > > > > > > > > > > > On Wed, Jul 16, 2014 at 10:35 AM, Jan Trmal > > <af...@ce... <mailto:af...@ce...> > > > <mailto:af...@ce... <mailto:af...@ce...>>> > wrote: > > > > > > This looks like a problem with your machine or the > > toolchain > > > that was used to compiled kaldi (especially the > compiler > > > and/or the glibc). > > > If you have experience with debugging, you can run the > > > command again, generate core dump (using ulimit –c > > > unlimited) and load it into gdb to figure out the > > details. > > > What distribution and gcc and glibc are you using? > > > > > > y. > > > > > > > > > > > > On Wed, Jul 16, 2014 at 10:18 AM, Zibo Meng > > > <mzb...@gm... <mailto:mzb...@gm...> > > <mailto:mzb...@gm... 
<mailto:mzb...@gm...>>> wrote: > > > > > > Hi, > > > > > > After I created the lang directory, I used > > > steps/train_mono.sh --nj 4 data/train.1k data/lang > > > exp/mono. But I got the error message as follows: > > > > > > steps/train_mono.sh --nj 4 data/train.1k > > data/lang exp/mono > > > steps/train_mono.sh --nj 4 data/train.1k > > data/lang exp/mono > > > vads = data/train.1k/split4/1/vad.scp > > > vads = data/train.1k/split4/1/vad.scp > > > data/train.1k/split4/2/vad.scp > > > vads = data/train.1k/split4/1/vad.scp > > > data/train.1k/split4/2/vad.scp > > > data/train.1k/split4/3/vad.scp > > > vads = data/train.1k/split4/1/vad.scp > > > data/train.1k/split4/2/vad.scp > > > data/train.1k/split4/3/vad.scp > > > data/train.1k/split4/4/vad.scp > > > steps/train_mono.sh: Initializing monophone > system. > > > steps/train_mono.sh: Compiling training graphs > > > steps/train_mono.sh: Aligning data equally (pass > 0) > > > steps/train_mono.sh: Pass 1 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 2 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 3 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 4 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 5 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 6 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 7 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 8 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 9 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 10 > > > steps/train_mono.sh: Aligning data > > > steps/train_mono.sh: Pass 11 > > > steps/train_mono.sh: Pass 12 > > > steps/train_mono.sh: Aligning data > > > *** Error in `gmm-acc-stats-ali': free(): > corrupted > > > unsorted chunks: 0x0000000001e10e60 *** > > > ======= Backtrace: ========= > > > /lib64/libc.so.6[0x367887d0b8] > > > > > 
gmm-acc-stats-ali(_ZN5kaldi6VectorIfE7DestroyEv+0x27)[0x59f127] > > > > > gmm-acc-stats-ali(_ZN5kaldi6VectorIfED1Ev+0x19)[0x4da151] > > > > > > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm14LogLikelihoodsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x1e6)[0x4fc156] > > > > > > gmm-acc-stats-ali(_ZNK5kaldi7DiagGmm19ComponentPosteriorsERKNS_10VectorBaseIfEEPNS_6VectorIfEE+0x10a)[0x4fc946] > > > > > > gmm-acc-stats-ali(_ZN5kaldi12AccumDiagGmm18AccumulateFromDiagERKNS_7DiagGmmERKNS_10VectorBaseIfEEf+0x118)[0x507410] > > > > > > gmm-acc-stats-ali(_ZN5kaldi14AccumAmDiagGmm16AccumulateForGmmERKNS_9AmDiagGmmERKNS_10VectorBaseIfEEif+0x9e)[0x4f2ec0] > > > gmm-acc-stats-ali(main+0x56c)[0x4d7aec] > > > > > /lib64/libc.so.6(__libc_start_main+0xf5)[0x3678821b45] > > > gmm-acc-stats-ali[0x4d74b9] > > > > > > Can you please tell me what went wrong here? > > > > > > Thank you so much! > > > > > > Zibo > > > > > > > > > > > > On Fri, Jul 11, 2014 at 11:24 AM, Zibo Meng > > > <mzb...@gm... <mailto:mzb...@gm...> > > <mailto:mzb...@gm... <mailto:mzb...@gm...>>> wrote: > > > > > > Hi, > > > > > > I got another problem. > > > > > > When I tried make_mfcc.sh to create the > feats.scp > > > files it did not work. > > > > > > I checked the log file where it said some > > thing like: > > > > > > compute-mfcc-feats --verbose=2 > > > --config=conf/mfcc.conf > > > scp,p:exp/make_mfcc/train/wav_data.1.scp ark:- > > > ERROR > > (compute-mfcc-feats:Read():wave-reader.cc:144) > > > WaveData: can read only PCM data, > audio_format is > > > not 1: 65534 > > > WARNING > > > > > (compute-mfcc-feats:Read():feat/wave-reader.h:148) > > > Exception caught in WaveHolder object > (reading). 
> > > WARNING (compute-mfcc-feats:LoadCurrent():util/kaldi-table-inl.h:232) TableReader: failed to load object from 'test.wav'
> > >
> > > Then I checked the attributes of my test.wav file, which were as follows:
> > >
> > > Input File     : 'test.wav'
> > > Channels       : 1
> > > Sample Rate    : 48000
> > > Precision      : 24-bit
> > > Duration       : 00:03:30.09 = 10084224 samples ~ 15756.6 CDDA sectors
> > > File Size      : 30.3M
> > > Bit Rate       : 1.15M
> > > Sample Encoding: 24-bit Signed Integer PCM
> > >
> > > Can you tell me what I should modify in my audio files? Thank you so much!
> > >
> > > Best,
> > >
> > > Zibo
> > >
> > > On Thu, Jul 10, 2014 at 3:37 PM, Zibo Meng <mzb...@gm...> wrote:
> > >
> > > Hi,
> > >
> > > I am preparing the data for DNN training using my own data set. I followed the instructions at http://kaldi.sourceforge.net/data_prep.html.
> > >
> > > I created the file "text"; its first 3 lines are:
> > > S002-U-000300-000470 OH
> > > S002-U-000470-000630 I'D
> > > S002-U-000630-000870 LIKE
> > >
> > > the wav.scp file:
> > > S002-U <path to the corresponding wav file>
> > > S002-O <path to the corresponding wav file>
> > > S003-U <path to the corresponding wav file>
> > >
> > > and the utt2spk file:
> > > S002-U-000300-000470 002-U
> > > S002-U-000470-000630 002-U
> > > S002-U-000630-000870 002-U
> > >
> > > Then I used utt2spk_to_spk2utt.pl to create the spk2utt file. Everything went well until I tried to use make_mfcc.sh to create the feats.scp file, where I got an error message like:
> > >
> > > utils/validate_data_dir.sh: file data/utt2spk is not in sorted order or has duplicates
> > >
> > > It seems my utt2spk file could not pass the validation.
> > >
> > > Can anybody help me out here? Thank you so much.
> > >
> > > Best,
> > >
> > > Zibo
> > >
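[Editor's note] On the validate_data_dir.sh failure quoted above: Kaldi compares strings in the C locale, and Dan's advice is that the speaker-id should be a literal prefix of the utterance-id. In the quoted utt2spk, the speaker "002-U" is not a prefix of "S002-U-000300-000470", which is exactly the pattern that breaks simultaneous sorting of utterances and speakers. The sketch below is illustrative only (the directory name `demo_data` and the renamed speaker `S002-U` are assumptions), showing the sort check and the in-place fix:

```shell
# Sketch only; paths and IDs are illustrative, adapt to your setup.
# Kaldi compares strings byte-wise, so always set LC_ALL=C before sorting.
export LC_ALL=C
mkdir -p demo_data

# utt2spk modeled on the thread, but with the speaker-id made a literal
# prefix of the utterance-id (S002-U rather than 002-U), per Dan's advice.
# The lines are deliberately out of order to trigger the check below.
cat > demo_data/utt2spk <<'EOF'
S002-U-000470-000630 S002-U
S002-U-000300-000470 S002-U
S002-U-000630-000870 S002-U
EOF

# The same property validate_data_dir.sh complained about:
# sorted order and no duplicate utterance-ids.
if ! sort -c -u demo_data/utt2spk 2>/dev/null; then
  echo "utt2spk is not sorted (or has duplicates); fixing in place"
  sort -u demo_data/utt2spk -o demo_data/utt2spk
fi
sort -c -u demo_data/utt2spk && echo "utt2spk OK"
```

In a real data directory, `utils/fix_data_dir.sh` (Dan's suggestion above) applies this kind of repair, plus cross-file filtering, automatically; re-run `utils/utt2spk_to_spk2utt.pl` afterwards so spk2utt stays consistent with the fixed utt2spk.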
>
> ------------------------------------------------------------------------------
> Want fast and easy access to all the code in your enterprise? Index and
> search up to 200,000 lines of code with a free copy of Black Duck
> Code Sight - the same software that powers the world's largest code
> search on Ohloh, the Black Duck Open Hub! Try it now.
> http://p.sf.net/sfu/bds
> _______________________________________________
> Kaldi-users mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
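
[Editor's note] On the compute-mfcc-feats failure quoted above: audio_format 65534 is WAVE_FORMAT_EXTENSIBLE, the header commonly written for 24-bit WAV files, while Kaldi's wave reader accepts only plain PCM (format tag 1). A hedged sketch of one way to re-encode with sox follows; the file names, the synthesized stand-in input, and the 16 kHz target rate are all assumptions (match the rate to what your mfcc.conf expects):

```shell
# Sketch only: re-encode to 16-bit signed PCM so Kaldi's wave reader
# (which accepts only audio_format == 1) can parse the file.
command -v sox >/dev/null 2>&1 || { echo "sox not installed; skipping"; exit 0; }

# Stand-in for the 48 kHz, 24-bit file from the thread:
sox -n -r 48000 -b 24 -e signed-integer demo.wav synth 0.1 sine 440

# Convert to 16-bit signed PCM and resample to 16 kHz (an assumption;
# use whatever rate your mfcc.conf is configured for):
sox demo.wav -e signed-integer -b 16 demo-16k.wav rate 16000

sox --i demo-16k.wav
```

Alternatively, conversion can be done on the fly by making each wav.scp entry a command pipe ending in "|", e.g. `S002-U sox /path/to/S002-U.wav -t wav -e signed-integer -b 16 - rate 16000 |` (the lone "-" writes to stdout), which Kaldi's extended-filename mechanism reads directly without creating intermediate files.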