And my code:

    #include "pocketsphinx.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <string.h>
    #include "config.h"

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            printf("usage: cmd example1.wav example2.wav ...\n");
            return 0;
        }
        while (1) {
            if (access(argv[1], 00) == 0) {
                printf("Wav cfile is: %s\n", argv[1]);
                FILE *file = fopen(FILE_SPHINX_OUTPUT, "a+");
                if (file == NULL) { printf("file open error!\n"); return; }
                ps_decoder_t *ps;
                cmd_ln_t *config;
                FILE *fh;
                char const *hyp, *uttid;
                //float32 buf[512];
                mfcc_t *buf;
                int rv;
                int32 score;
                int n_frames = 40;
                buf = (mfcc_t *)malloc(sizeof(mfcc_t) * 512);
                if (strcmp(argv[2], "num") == 0) {
                    config = cmd_ln_init(NULL, ps_args(), TRUE,
                                         "-lm", FILE_DIC_LM_NUM,
                                         "-dict", FILE_DIC_DIC_NUM,
                                         NULL);
                } else {
                    config = cmd_ln_init(NULL, ps_args(), TRUE,
                                         "-lm", FILE_DIC_LM,
                                         "-dict", FILE_DIC_DIC,
                                         NULL);
                }
                if (config == NULL)
                    return 1;
                ps = ps_init(config);
                if (ps == NULL)
                    return 1;
                fh = fopen(argv[1], "rb");
                if (fh == NULL) {
                    perror("Failed to open wav file!\n");
                    return 1;
                }
                //rv = ps_decode_raw(ps, fh, "sleep", -1);
                fseek(fh, 0, SEEK_SET);
                rv = ps_start_utt(ps, "goforward");
                if (rv < 0)
                    return 1;
                printf("1\n");
                while (!feof(fh)) {
                    size_t nsamp;
                    nsamp = fread(buf, sizeof(mfcc_t), n_frames * 13, fh);
                    printf("+\n");
                    rv = ps_process_cep(ps, &buf, n_frames, FALSE, FALSE);
                }
                printf("2\n");
                rv = ps_end_utt(ps);
                if (rv < 0)
                    return 1;
                hyp = ps_get_hyp(ps, &score, &uttid);
                if (hyp == NULL)
                    return 1;
                printf("Recognized: %s\n", hyp);
                fclose(file);
                return 0;
            }
        }
        return 0;
    }
The usual input for pocketsphinx is an audio file, and the audio is converted to MFCC features internally. Now I want pocketsphinx to take MFCC features as input directly, and I have found the following function in the pocketsphinx file list:
    POCKETSPHINX_EXPORT int ps_process_cep(ps_decoder_t *ps,
                                           mfcc_t **data,
                                           int n_frames,
                                           int no_search,
                                           int full_utt);
My problem is how to match the MFCC data in a file (produced by DSR feature extraction) with mfcc_t.
In other words, how can I read the MFCC data from a file into mfcc_t?
I have tried the following:
    #include "pocketsphinx.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <string.h>
    #include "config.h"

    int main(int argc, char *argv[])
    {
        if (argc < 2)
        {
            printf("usage: cmd example1.wav example2.wav ...\n");
            return 0;
        }
        while (1)
        {
            if (access(argv[1], 00) == 0)
            {
                printf("Wav cfile is: %s\n", argv[1]);
                FILE *file = fopen(FILE_SPHINX_OUTPUT, "a+");
                if (file == NULL) { printf("file open error!\n"); return; }
                ps_decoder_t *ps;
                cmd_ln_t *config;
                FILE *fh;
                char const *hyp, *uttid;
                mfcc_t *buf;
                int rv;
                int32 score;
                buf = (mfcc_t *)malloc(sizeof(mfcc_t) * 512);
                if (strcmp(argv[2], "num") == 0) {
                    config = cmd_ln_init(NULL, ps_args(), TRUE,
                                         "-lm", FILE_DIC_LM_NUM,
                                         "-dict", FILE_DIC_DIC_NUM,
                                         NULL);
                }
                else {
                    config = cmd_ln_init(NULL, ps_args(), TRUE,
                                         "-lm", FILE_DIC_LM,
                                         "-dict", FILE_DIC_DIC,
                                         NULL);
                }
                if (config == NULL)
                    return 1;
                ps = ps_init(config);
                if (ps == NULL)
                    return 1;
                fh = fopen(argv[1], "rb");
                if (fh == NULL)
                {
                    perror("Failed to open wav file!\n");
                    return 1;
                }
                //rv = ps_decode_raw(ps, fh, "sleep", -1);
                fseek(fh, 0, SEEK_SET);
                rv = ps_start_utt(ps, "goforward");
                if (rv < 0)
                    return 1;
                printf("1\n");
                while (!feof(fh)) {
                    size_t nsamp;
                    //nsamp = fread(buf, sizeof(int), 512, fh);
                    nsamp = fread(buf, sizeof(mfcc_t), 512, fh);
                    printf("+\n");
                    //rv = ps_process_cep(ps, (mfcc_t **)(&buf), nsamp, FALSE, FALSE);
                    rv = ps_process_cep(ps, &buf, nsamp, FALSE, FALSE);
                }
                printf("2\n");
                rv = ps_end_utt(ps);
                if (rv < 0)
                    return 1;
                hyp = ps_get_hyp(ps, &score, &uttid);
                if (hyp == NULL)
                    return 1;
                printf("Recognized: %s\n", hyp);
            }
        }
        return 0;
    }
But it prints:
1
+
segmentation fault
n_frames here is the number of frames in the MFCC buffer. Each frame consists of ceplen (13 by default) floats. So your code must be:
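A minimal sketch of a reading loop built on this rule (a sketch of the idea, not the code from the reply). It assumes a floating-point build, so mfcc_t is redeclared as float to let the snippet compile without sphinxbase, and a headerless file of native-endian floats, 13 per frame. read_frames is a hypothetical helper: it reads whole frames into a flat buffer, sets one pointer per frame, and returns the number of complete frames, which is the value to pass as n_frames.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption: floating-point sphinxbase build, where mfcc_t is a 32-bit float.
 * Redeclared here only so this sketch compiles on its own. */
typedef float mfcc_t;

/*
 * Hypothetical helper: read up to max_frames frames of ceplen (> 0) values
 * from fh into `flat`, which must hold max_frames * ceplen values, and fill
 * `frames`, which must hold max_frames pointers, so that frames[i] points at
 * the start of frame i. Returns the number of complete frames read.
 */
static int read_frames(FILE *fh, mfcc_t *flat, mfcc_t **frames,
                       int max_frames, int ceplen)
{
    size_t n_values = fread(flat, sizeof(mfcc_t),
                            (size_t)max_frames * (size_t)ceplen, fh);
    /* A short read normally means end of file; an incomplete last frame is dropped. */
    int n_frames = (int)(n_values / (size_t)ceplen);
    int i;

    for (i = 0; i < n_frames; i++)
        frames[i] = flat + (size_t)i * (size_t)ceplen;
    return n_frames;
}
```

In the decoding loop it would replace the feof() test, for example `while ((n = read_frames(fh, flat, frames, 40, 13)) > 0) rv = ps_process_cep(ps, frames, n, FALSE, FALSE);`, where flat holds 40 * 13 values and frames holds 40 pointers. Looping on the number of frames actually read also avoids a final call with stale data, which `while (!feof(fh))` makes after the last successful read.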
Thanks for helping me with the problem. I have tried your method, but it still fails:
1
+
Segmentation fault
And my code:
Here is the complete output:
smw@ubuntu:~/work/beagle/backends$ ./sphinx_handler ./my_wav/dsr_process_output.ftr_o num
Wav cfile is: ./my_wav/dsr_process_output.ftr_o
INFO: cmd_ln.c(512): Parsing command line:
\
-lm /home/smw/work/beagle/backends/dic/6185.lm \
-dict /home/smw/work/beagle/backends/dic/6185.dic
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/smw/work/beagle/backends/dic/6185.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /home/smw/work/beagle/backends/dic/6185.lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef
file
INFO: bin_mdef.c(330): Reading binary model definition:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(508): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150
CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
256x13 256x13 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
256x13 256x13 256x13
INFO: ms_gauden.c(356): 0 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(294): Allocating 4148 * 20 bytes (81 KiB) for word entries
INFO: dict.c(306): Reading main dictionary:
/home/smw/work/beagle/backends/dic/6185.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 41 words read
INFO: dict.c(314): Reading filler dictionary:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 50^3 * 2 bytes (244 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word
triphones
INFO: ngram_model_arpa.c(476): ngrams 1=32, 2=60, 3=30
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(515): 32 = #unigrams created
INFO: ngram_model_arpa.c(194): Reading bigrams
INFO: ngram_model_arpa.c(531): 60 = #bigrams created
INFO: ngram_model_arpa.c(532): 4 = #prob2 entries
INFO: ngram_model_arpa.c(539): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(291): Reading trigrams
INFO: ngram_model_arpa.c(552): 30 = #trigrams created
INFO: ngram_model_arpa.c(553): 2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 34 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 13 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 13
single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 190
INFO: ngram_search_fwdtree.c(333): after: 34 root, 62 non-root channels, 12
single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
1
+
Segmentation fault
I don't know why this happens or how to solve the problem.
Thanks again.
You need to allocate the buffer appropriately; this part is obviously wrong:
The proper allocation would be:
You can find a working example here:
http://dl.dropbox.com/u/26073448/test-process-cep.tar.gz
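As an illustration of the layout, and not the code from the reply: ps_process_cep reads data[i][j] as coefficient j of frame i, so the buffer needs n_frames row pointers, each pointing at ceplen values. sphinxbase provides ckd_calloc_2d() and ckd_free_2d() in ckd_alloc.h for this kind of allocation. The self-contained sketch below, with the hypothetical name alloc_frames and mfcc_t assumed to be float, shows the same idea with plain calloc(), placing the pointers and the data in one block so a single free() releases it.

```c
#include <stdlib.h>
#include <stddef.h>

/* Assumption: floating-point build, where mfcc_t is a 32-bit float. */
typedef float mfcc_t;

/*
 * Hypothetical helper: allocate an n_frames x ceplen matrix as one block.
 * The block starts with n_frames row pointers followed by zeroed data, so
 * rows[i][j] is coefficient j of frame i and free(rows) releases everything.
 * Returns NULL on allocation failure (size overflow is not checked).
 */
static mfcc_t **alloc_frames(size_t n_frames, size_t ceplen)
{
    size_t ptr_bytes = n_frames * sizeof(mfcc_t *);
    mfcc_t **rows;
    mfcc_t *data;
    size_t i;

    rows = calloc(1, ptr_bytes + n_frames * ceplen * sizeof(mfcc_t));
    if (rows == NULL)
        return NULL;
    /* Data starts right after the pointer table (pointer size keeps floats aligned). */
    data = (mfcc_t *)((char *)rows + ptr_bytes);
    for (i = 0; i < n_frames; i++)
        rows[i] = data + i * ceplen;
    return rows;
}
```

Because the rows are contiguous, `fread(buf[0], sizeof(mfcc_t), 40 * 13, fh)` fills all 40 frames at once, and it is buf itself, not &buf, that is passed as the data argument.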
Please note that you are reading MFC files, not MFCC files. MFC files contain only cepstra, not delta features. That makes them different from HTK MFCC files, which usually contain complete features. Please also note that the feature extraction must match the model's feature extraction parameters.
Thanks a lot.
I have found where the problem is. The input data come from the front end of my project, and its output is in MFCC form.
But I don't know what MFC files are. I haven't found any information about MFC files. Is that Microsoft Foundation Classes or something else?
MFC files (Mel Frequency Cepstrum) store mel-frequency cepstrum values in CMUSphinx. They can be created from wav files with the sphinx_fe tool from sphinxbase. They are a raw sequence of floats with a 4-byte header that encodes the number of floats in the file. MFC files usually have the extension .mfc.
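Based on that description (a 4-byte header holding the number of floats, followed by the floats), a minimal reader might look like the sketch below. Assumptions: the header is a 32-bit integer, the values are 32-bit IEEE floats, and the byte order is detected by checking the header against the file size, swapping when it only matches after swapping. read_mfc and swap32 are hypothetical names. The result is a flat array that can be viewed as n / 13 frames.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/*
 * Hypothetical reader for the layout described above (assumes 4-byte floats).
 * On success returns a malloc'd array of *n_out floats; returns NULL if the
 * header does not match the file size in either byte order.
 */
static float *read_mfc(FILE *fh, size_t *n_out)
{
    uint32_t hdr, w;
    long size;
    size_t n, i;
    int swap = 0;
    float *vals;

    /* The file size tells us how many floats follow the 4-byte header. */
    if (fseek(fh, 0, SEEK_END) != 0 || (size = ftell(fh)) < 4 || (size - 4) % 4 != 0)
        return NULL;
    n = (size_t)(size - 4) / 4;
    rewind(fh);
    if (fread(&hdr, sizeof hdr, 1, fh) != 1)
        return NULL;
    if (hdr != n) {
        if (swap32(hdr) != n)      /* header matches in neither byte order */
            return NULL;
        swap = 1;                  /* file was written with the other byte order */
    }
    vals = malloc(n > 0 ? n * sizeof(float) : 1);
    if (vals == NULL)
        return NULL;
    if (fread(vals, sizeof(float), n, fh) != n) {
        free(vals);
        return NULL;
    }
    for (i = 0; swap && i < n; i++) {
        memcpy(&w, &vals[i], sizeof w);   /* byte-swap each value through an integer */
        w = swap32(w);
        memcpy(&vals[i], &w, sizeof w);
    }
    *n_out = n;
    return vals;
}
```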
Thanks again.
But I need the definition of the .mfc format. Could you tell me where I can find it?
Maybe I have to write code to convert the DSR output to that format.
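If the DSR front-end output has to be converted, the writing side of the same layout is short. A sketch, assuming the DSR features have already been parsed into a flat array of 32-bit floats with 13 cepstra per frame (parsing the DSR .ftr_o output is not covered in this thread, so that step is left out) and writing in the machine's native byte order. write_mfc is a hypothetical name. The earlier caveats still apply: such a file holds only cepstra, and they must be computed with parameters that match the acoustic model.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical writer: store n_frames * ceplen floats as a 4-byte count
 * of floats followed by the floats, in native byte order.
 * Returns 0 on success, -1 on error.
 */
static int write_mfc(FILE *out, const float *feats, size_t n_frames, size_t ceplen)
{
    size_t n = n_frames * ceplen;
    uint32_t hdr;

    if (n > UINT32_MAX)
        return -1;                 /* the count must fit in the 4-byte header */
    hdr = (uint32_t)n;
    if (fwrite(&hdr, sizeof hdr, 1, out) != 1)
        return -1;
    if (fwrite(feats, sizeof(float), n, out) != n)
        return -1;
    return 0;
}
```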
Sorry, I'm not sure what you are asking about. If you are interested in the file format, it is described in the previous reply in this thread.