Hi, I use pocketsphinx_continuous to decode long audio files. I run the
following command to decode a 16-minute audio file with a 16 kHz sampling rate:
pocketsphinx_continuous -hmm /mnt/asr/ar-model/am/ar-ipsos \
    -dict /mnt/asr/ar-model/lm/ar-ipsos/arabic.dic \
    -lm /mnt/asr/ar-ipsos/arabic.ug.lm.DMP \
    -adcin yes -samprate 16000 -hypseg aligned -infile out.tmp.ready.wav
Then, it aborted with the following error:
...
INFO: ms_gauden.c(292): 6105 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /mnt/asr/ar-
model/am/ar-ipsos/variances
INFO: ms_gauden.c(292): 6105 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16x39
INFO: ms_gauden.c(354): 92 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights: /mnt/asr/ar-model/am
/ar-ipsos/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 6105 senones: 1 features x 16
codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(122): The value of topn: 4
INFO: dict.c(306): Allocating 111223 * 32 bytes (3475 KiB) for word entries
INFO: dict.c(321): Reading main dictionary: /mnt/asr/ar-model/lm/ar-
ipsos/arabic.dic
INFO: dict.c(212): Allocated 983 KiB for strings, 2022 KiB for phones
INFO: dict.c(324): 107124 words read
INFO: dict.c(330): Reading filler dictionary: /mnt/asr/ar-model/am/ar-
ipsos/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(333): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 35^3 * 2 bytes (83 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 29680 bytes (28 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 29680 bytes (28 KiB) for single-phone word
triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=65530, 2=564123, 3=867799
INFO: ngram_model_dmp.c(242): 65530 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(291): 564123 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(317): 867799 = LM.trigrams read
INFO: ngram_model_dmp.c(342): 8396 = LM.prob2 entries read
INFO: ngram_model_dmp.c(362): 8781 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(382): 3170 = LM.prob3 entries read
INFO: ngram_model_dmp.c(410): 1102 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(466): 65530 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 543 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 14 single-
phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 14
single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 235750
INFO: ngram_search_fwdtree.c(338): after: 454 root, 235622 non-root channels,
12 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(367): pocketsphinx_continuous COMPILED ON: Sep 20 2011, AT:
11:20:36
INFO: ngram_search.c(466): Resized backpointer table to 10000 entries
INFO: ngram_search.c(474): Resized score stack to 200000 entries
INFO: ngram_search.c(466): Resized backpointer table to 20000 entries
INFO: ngram_search.c(474): Resized score stack to 400000 entries
INFO: ngram_search.c(466): Resized backpointer table to 40000 entries
INFO: ngram_search.c(474): Resized score stack to 800000 entries
INFO: ngram_search.c(466): Resized backpointer table to 80000 entries
INFO: ngram_search_fwdtree.c(951): cand_sf increased to 64 entries
INFO: ngram_search.c(474): Resized score stack to 1600000 entries
INFO: ngram_search.c(466): Resized backpointer table to 160000 entries
INFO: ngram_search.c(474): Resized score stack to 3200000 entries
INFO: ngram_search.c(466): Resized backpointer table to 320000 entries
INFO: ngram_search.c(474): Resized score stack to 6400000 entries
pocketsphinx_continuous: feat.c:362: feat_array_alloc: Assertion `nfr > 0'
failed.
Aborted
How can I solve this problem?
Please share the file.
The model files are too large to be uploaded.
Hello
I only need the audio file; I do not need the models.
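In the meantime it may be worth sanity-checking the file itself. The failed assertion (nfr > 0 in feat_array_alloc) usually means that no feature frames could be extracted from the input at all, for example because the file is empty, truncated, or not in the format the decoder expects. If you have sox installed, a quick check of the actual format looks like this, using your file name from the command above:

# the decoder expects 16 kHz, 16-bit, mono audio when -samprate is 16000
soxi out.tmp.ready.wav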
The problem was caused by the audio file. It now runs and terminates, but I have
another problem: I was using the -hypseg parameter with pocketsphinx_batch to
obtain the start frames of the words, but even if I pass this argument to
pocketsphinx_continuous, it does not produce any output. The -hyp parameter does
not work either. The decoder prints its output to standard output without start
frames, which is useless for me. How can I get the start frames? Thanks for your
kind support.
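For reference, the kind of pocketsphinx_batch run I mean is roughly the following; the model arguments are copied from my first command, files.ctl and the output file names are placeholders, and -adchdr 44 is only my assumption for skipping the WAV header, since -adcin makes the files be read as raw samples:

pocketsphinx_batch \
    -hmm /mnt/asr/ar-model/am/ar-ipsos \
    -dict /mnt/asr/ar-model/lm/ar-ipsos/arabic.dic \
    -lm /mnt/asr/ar-ipsos/arabic.ug.lm.DMP \
    -adcin yes -adchdr 44 \
    -cepdir wav -cepext .wav \
    -ctl files.ctl -hyp out.hyp -hypseg out.hypseg

The start and end frames in out.hypseg can then be converted to times by dividing by the frame rate, which is 100 frames per second by default.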
This feature is not supported.
Since I need the timings, I guess I have to use pocketsphinx_batch. What is the
longest audio file pocketsphinx_batch can decode? I will split the audio files
if their length exceeds this upper bound.
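Something like the following is what I have in mind for the splitting, assuming sox is available; the 300-second chunk length, the wav directory, and the file names are just placeholders:

# cut the long recording into 5-minute pieces: wav/part001.wav, wav/part002.wav, ...
mkdir -p wav
sox out.tmp.ready.wav wav/part.wav trim 0 300 : newfile : restart

# control file for pocketsphinx_batch: one utterance name per line,
# without the .wav extension (that is supplied via -cepext)
ls wav | sed 's/\.wav$//' > files.ctl

A single pocketsphinx_batch run with -ctl files.ctl and -hypseg as above would then cover all the pieces.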
Sorry for taking up your time; I should have searched more. :)