CMU Sphinx / Forums / Help: Accuracy for telephone call data

I have around 500 telephone calls that I need to get transcribed. I checked the bandwidth and found it to be 8 kHz, so I am using the 8 kHz models. I am facing some problems.

1) First, with cmusphinx-en-us-8khz-5.2, which is a continuous model:
Right out of the box this model gets some parts of the transcript right, but most of it is completely wrong. So I thought of adapting the model. I started by adding one call to the adaptation data. I split the call into chunks of 10 s and created the transcription to go along with it. I followed the tutorial for model adaptation and did the following:

sphinx_fe -argfile en-us/feat.params \
        -samprate 16000 -c test.fileids \
        -di . -do . -ei wav -eo mfc -mswav yes

./bw \
 -hmmdir en-us \
 -moddeffn en-us/mdef.txt \
 -ts2cbfn .cont. \
 -feat 1s_c_d_dd \
 -lda en-us/feature_transform \
 -cmn current \
 -agc none \
 -dictfn cmudict-en-us.dict \
 -ctlfn test.fileids \
 -lsnfn test.transcription \
 -accumdir .

cp -a en-us en-us-adapt

./map_adapt \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .cont. \
    -meanfn en-us/means \
    -varfn en-us/variances \
    -mixwfn en-us/mixture_weights \
    -tmatfn en-us/transition_matrices \
    -accumdir . \
    -mapmeanfn en-us-adapt/means \
    -mapvarfn en-us-adapt/variances \
    -mapmixwfn en-us-adapt/mixture_weights \
    -maptmatfn en-us-adapt/transition_matrices

./mk_s2sendump \
    -pocketsphinx yes \
    -moddeffn en-us-adapt/mdef \
    -mixwfn en-us-adapt/mixture_weights \
    -sendumpfn en-us-adapt/sendump

Then I pointed -hmm to the new adapted directory. Now the generated transcript is completely wrong. I tested on the same file I adapted the model on. It transcribes something about war and so on, whereas the audio had no mention of it. Can you tell me if I am doing something wrong or if my understanding is wrong? Since these are patient calls it would be difficult for me to share them here, but as a sample I have the following type of audio in my training data:

and so your pharmacy where you get it filled out that they can make duplicate labels if necessary so that you can have a prescription label on everything and then your sharps container should be (chunk43)
https://drive.google.com/file/d/1zLx8Tl5EN3aYtLwFSIW01HI3Rm3S8whr/view?usp=sharing (sample train file in google drive).

Please let me know what I am doing wrong. As far as I know I did not get any error during the model adaptation.

2) The second issue is when I try to use the PTM model cmusphinx-en-us-ptm-5.2. I get the following error:

INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from Train/en-us-adapt/feat.params
Current configuration:
[NAME]                  [DEFLT]         [VALUE]
-agc                    none            none
-agcthresh              2.0             2.000000e+00
-allphone
-allphone_ci            no              no
-alpha                  0.97            9.700000e-01
-ascale                 20.0            2.000000e+01
-aw                     1               1
-backtrace              no              no
-beam                   1e-48           1.000000e-48
-bestpath               yes             yes
-bestpathlw             9.5             9.500000e+00
-ceplen                 13              13
-cmn                    live            current
-cmninit                40,3,-1         40,3,-1
-compallsen             no              no
-debug                                  0
-dict                                   Train/cmudict-en-us.dict
-dictcase               no              no
-dither                 no              no
-doublebw               no              no
-ds                     1               1
-fdict
-feat                   1s_c_d_dd       1s_c_d_dd
-featparams
-fillprob               1e-8            1.000000e-08
-frate                  100             100
-fsg
-fsgusealtpron          yes             yes
-fsgusefiller           yes             yes
-fwdflat                yes             yes
-fwdflatbeam            1e-64           1.000000e-64
-fwdflatefwid           4               4
-fwdflatlw              8.5             8.500000e+00
-fwdflatsfwin           25              25
-fwdflatwbeam           7e-29           7.000000e-29
-fwdtree                yes             yes
-hmm                                    Train/en-us-adapt
-input_endian           little          little
-jsgf
-keyphrase
-kws
-kws_delay              10              10
-kws_plp                1e-1            1.000000e-01
-kws_threshold          1               1.000000e+00
-latsize                5000            5000
-lda
-ldadim                 0               0
-lifter                 0               22
-lm                                     pocketsphinx/model/en-us/en-us.lm.bin
-lmctl
-lmname
-logbase                1.0001          1.000100e+00
-logfn
-logspec                no              no
-lowerf                 133.33334       1.300000e+02
-lpbeam                 1e-40           1.000000e-40
-lponlybeam             7e-29           7.000000e-29
-lw                     6.5             6.500000e+00
-maxhmmpf               30000           30000
-maxwpf                 -1              -1
-mdef
-mean
-mfclogdir
-min_endfr              0               0
-mixw
-mixwfloor              0.0000001       1.000000e-07
-mllr
-mmap                   yes             yes
-ncep                   13              13
-nfft                   512             512
-nfilt                  40              20
-nwpen                  1.0             1.000000e+00
-pbeam                  1e-48           1.000000e-48
-pip                    1.0             1.000000e+00
-pl_beam                1e-10           1.000000e-10
-pl_pbeam               1e-10           1.000000e-10
-pl_pip                 1.0             1.000000e+00
-pl_weight              3.0             3.000000e+00
-pl_window              5               5
-rawlogdir
-remove_dc              no              no
-remove_noise           yes             yes
-remove_silence         yes             yes
-round_filters          yes             yes
-samprate               16000           1.600000e+04
-seed                   -1              -1
-sendump
-senlogdir
-senmgau
-silprob                0.005           5.000000e-03
-smoothspec             no              no
-svspec                                 0-12/13-25/26-38
-tmat
-tmatfloor              0.0001          1.000000e-04
-topn                   4               4
-topn_beam              0               0
-toprule
-transform              legacy          dct
-unit_area              yes             yes
-upperf                 6855.4976       3.700000e+03
-uw                     1.0             1.000000e+00
-vad_postspeech         50              50
-vad_prespeech          20              20
-vad_startspeech        10              10
-vad_threshold          2.0             2.000000e+00
-var
-varfloor               0.0001          1.000000e-04
-varnorm                no              no
-verbose                no              no
-warp_params
-warp_type              inverse_linear  inverse_linear
-wbeam                  7e-29           7.000000e-29
-wip                    0.65            6.500000e-01
-wlen                   0.025625        2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(156): Reading linear feature transformation from Train/en-us-adapt/feature_transform
INFO: acmod.c(166): Using subvector specification 0-12/13-25/26-38
ERROR: "feat.c", line 311: Total dimensionality of subvector specification 39 > feature dimensionality 36
Traceback (most recent call last):
  File "work.py", line 19, in <module>
    decoder = Decoder(config)
  File "/usr/lib64/python2.7/site-packages/pocketsphinx/pocketsphinx.py", line 226, in __init__
    this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1

Any help is highly appreciated.

> I have around 500 telephone calls that I need to get transcribed.

For 500 calls it is easier to use a commercial service.

> I checked the bandwidth and found it to be 8 kHz, so I am using the 8 kHz models. I am facing some problems.

Actually it is about 12 kHz, higher than 8. I also doubt your calls were recorded on the phone.
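
For anyone checking their own recordings: the sample rate stored in the WAV header and the bandwidth the speech actually occupies are separate things, and the header is the easier one to verify first. A minimal sketch using only Python's standard `wave` module follows; the helper names `wav_info` and `nyquist_hz` are made up for illustration and are not part of any Sphinx tool.

```python
import sys
import wave


def wav_info(path):
    """Read (sample_rate_hz, channels, sample_width_bytes, duration_s) from a WAV header."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / float(rate)
        return rate, w.getnchannels(), w.getsampwidth(), duration


def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    # Usage: python wav_check.py chunk1.wav chunk2.wav ...
    for path in sys.argv[1:]:
        rate, channels, width, duration = wav_info(path)
        print("%s: %d Hz, %d channel(s), %d-bit, %.1f s, content up to %.0f Hz"
              % (path, rate, channels, width * 8, duration, nyquist_hz(rate)))
```

The header alone says nothing about how the call was captured. A spectrogram is still needed to see where the energy actually sits.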

> Right out of the box this model gets some parts of the transcript right, but most of it is completely wrong.

Our tutorial recommends computing the word error rate instead.
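
Word error rate is the word-level edit distance between the reference transcript and the decoder output, divided by the number of reference words. A minimal sketch, assuming whitespace-separated text that has already been normalized (same case, no punctuation); `word_error_rate` is an illustrative name, not the tutorial's own tool.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / float(len(ref))


if __name__ == "__main__":
    # One substitution and one deletion against five reference words.
    print(word_error_rate("your sharps container should be",
                          "your sharp container be"))  # prints 0.4
```

Measuring this before and after adaptation on files that were not part of the adaptation data gives a number to compare, rather than an impression of "mostly wrong".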

> Can you tell me if I am doing something wrong or if my understanding is wrong?

First of all you need to get audio of better quality. The one you have provided is never going to work, simply because the audio is bad.

Then you can try more advanced toolkits like Kaldi, but they will probably not work out of the box. To get reasonable results you will have to train the model.

> The second issue is when I try to use the PTM model cmusphinx-en-us-ptm-5.2. I get the following error:

You forgot to clean up the feature_transform file from the previous model when you unpacked the new one. In any case, the PTM model is less accurate.
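
The arithmetic behind the error can be reproduced by hand: the subvector specification `0-12/13-25/26-38` covers 13 + 13 + 13 = 39 dimensions, while the log reports a feature dimensionality of 36 after reading the leftover feature_transform. A small sketch of that check, assuming the svspec syntax seen in the log (streams separated by `/`, inclusive `lo-hi` ranges, optionally comma-separated entries); `svspec_dim` and `svspec_fits` are hypothetical helper names, not PocketSphinx functions.

```python
def svspec_dim(svspec):
    """Total number of feature dimensions referenced by a subvector specification."""
    total = 0
    for stream in svspec.split("/"):
        for part in stream.split(","):
            part = part.strip()
            if "-" in part:
                lo, hi = part.split("-")
                total += int(hi) - int(lo) + 1  # ranges are inclusive
            else:
                int(part)  # a single index; raises ValueError if malformed
                total += 1
    return total


def svspec_fits(svspec, feature_dim):
    """True if the specification does not ask for more dimensions than exist."""
    return svspec_dim(svspec) <= feature_dim


if __name__ == "__main__":
    spec = "0-12/13-25/26-38"  # value shown for -svspec in the log
    print(svspec_dim(spec))       # 39
    print(svspec_fits(spec, 36))  # False: the mismatch the decoder reports
    print(svspec_fits(spec, 39))  # True
```

Unpacking the PTM model into a fresh, empty directory avoids carrying over files such as feature_transform from the earlier model.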
