Attempting to use telephone model, get error Upper frequency 6800.0 is higher than samprate/2 (4000.0)

Help
Eric Bunch
2016-03-08
2016-03-08
  • Eric Bunch

    Eric Bunch - 2016-03-08

    Hello,

    I'm trying to use pocketsphinx to transcribe some recorded phone calls and am running into some trouble, so I'm hoping for some help. I have an audio file out.wav that I believe meets the format criteria. The output of sox --i out.wav is

    Input File     : 'out.wav'
    Channels       : 1
    Sample Rate    : 8000
    Precision      : 16-bit
    Duration       : 00:00:48.04 = 384320 samples ~ 3603 CDDA sectors
    File Size      : 769k
    Bit Rate       : 128k
    Sample Encoding: 16-bit Signed Integer PCM
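The same format check can be done from Python before decoding. The following is a minimal sketch using only the standard-library `wave` module; the helper names `describe_wav` and `is_8khz_mono_16bit` are hypothetical and not part of pocketsphinx.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, n_frames) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getframerate(),
            wav.getsampwidth() * 8,  # sample width is reported in bytes
            wav.getnframes(),
        )


def is_8khz_mono_16bit(path):
    """True if the file is mono, 8000 Hz, 16-bit, as in the sox output above."""
    channels, rate, bits, _ = describe_wav(path)
    return channels == 1 and rate == 8000 and bits == 16
```

For the file described above, `describe_wav` would be expected to report one channel, 8000 Hz, and 16 bits, assuming the file is a standard PCM WAV that the `wave` module can parse.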

    I am using pocketsphinx-python, with the following script

    `
    from os import path

    from pocketsphinx.pocketsphinx import Decoder

    modeldir = "/Users/eric.bunch/psphinx/pocketsphinx/model"

    # Configure the decoder: acoustic model, language model, dictionary, sample rate.
    config = Decoder.default_config()
    config.set_string('-hmm', path.join(modeldir, 'en-us/en-us'))
    config.set_string('-lm', path.join(modeldir, 'en-us/en-us-phone.lm.bin'))
    config.set_string('-dict', path.join(modeldir, 'en-us/cmudict-en-us.dict'))
    config.set_float("-samprate", 8000.0)

    decoder = Decoder(config)
    decoder.start_utt()

    # Feed the file to the decoder in 4096-byte chunks (this includes the WAV header bytes).
    stream = open(path.join("/Users/eric.bunch/Downloads", 'out.wav'), 'rb')
    while True:
        buf = stream.read(4096)
        if buf:
            decoder.process_raw(buf, False, False)
        else:
            break
    decoder.end_utt()
    hypothesis = decoder.hyp()

    print(hypothesis.hypstr)`

    My current configuration is as follows:

    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 40,3,-1
    -compallsen no no
    -debug 0
    -dict /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/cmudict-en-us.dict
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/noisedict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/feat.params
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us-phone.lm.bin
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/mdef
    -mean /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/means
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 8.000000e+03
    -seed -1 -1
    -sendump /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -tmat /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/transition_matrices
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var /Users/eric.bunch/psphinx/pocketsphinx/model/en-us/en-us/variances
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02.

    The error that I am receiving is

    ERROR: "fe_interface.c", line 274: Upper frequency 6800.0 is higher than samprate/2 (4000.0).

    Any help is much appreciated.

    Best,
    Eric

     

    Last edit: Eric Bunch 2016-03-08
    • Nickolay V. Shmyrev

      The en-us model provided with pocketsphinx is a 16 kHz model. An 8 kHz model is available in the downloads; you need to use that instead.
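The arithmetic behind the error message can be sketched as follows. The configuration dump above shows `-upperf` at 6800 Hz (presumably taken from the 16 kHz model's feat.params) while `-samprate` is 8000, so the filterbank's upper edge lies above half the sample rate. The function names below are hypothetical; this only mirrors the comparison reported in the error and is not pocketsphinx's own code.

```python
def nyquist_hz(samprate_hz):
    """Highest frequency representable at the given sample rate."""
    return samprate_hz / 2.0


def upperf_fits(upperf_hz, samprate_hz):
    """True if the filterbank upper edge does not exceed samprate/2."""
    return upperf_hz <= nyquist_hz(samprate_hz)


# Values taken from the post: -upperf 6800, -samprate 8000 (and 16000 for comparison).
fits_at_8khz = upperf_fits(6800.0, 8000.0)    # False: 6800 > 4000
fits_at_16khz = upperf_fits(6800.0, 16000.0)  # True: 6800 <= 8000
```

This is why an acoustic model trained for 8 kHz audio is needed rather than only changing `-samprate`.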

       
  • Eric Bunch

    Eric Bunch - 2016-03-08

    Thank you, that's working. I do have a follow-up question because I think I may be misunderstanding something. My script is now

    from os import path

    from pocketsphinx.pocketsphinx import Decoder

    modeldir = "/Users/eric.bunch/psphinx/pocketsphinx/model"

    # Same setup as before, but with the downloaded 8 kHz acoustic model.
    config = Decoder.default_config()
    config.set_string('-hmm', '/Users/eric.bunch/Downloads/en-us-8khz/')
    config.set_string('-lm', path.join(modeldir, 'en-us/en-us.lm.bin'))
    config.set_string('-dict', path.join(modeldir, 'en-us/cmudict-en-us.dict'))
    config.set_float("-samprate", 8000.0)

    decoder = Decoder(config)
    decoder.start_utt()

    # Feed the file to the decoder in 4096-byte chunks (this includes the WAV header bytes).
    stream = open(path.join("/Users/eric.bunch/Downloads", 'out.wav'), 'rb')
    while True:
        buf = stream.read(4096)
        if buf:
            decoder.process_raw(buf, False, False)
        else:
            break
    decoder.end_utt()
    hypothesis = decoder.hyp()

    print(hypothesis.hypstr)

    But it gives no output when I use the language model en-us-phone.lm.bin, which was unexpected for me. Is en-us-phone.lm.bin not the language model trained on phone conversations, or am I mistaken?

     
    • Nickolay V. Shmyrev

      Is en-us-phone.lm.bin not the language model trained on phone conversations, or am I mistaken?

      It is a model for the phonetic recognizer.
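In other words, "phone" here refers to phonemes, not telephones. A phonetic language model is normally passed through the `-allphone` option (listed in the configuration dump above) rather than `-lm`, and no word dictionary is set. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the `-lw`, `-beam`, and `-pbeam` values are only common starting points, and whether the phone model pairs well with the 8 kHz acoustic model has not been verified here.

```python
def phonetic_options(hmm_dir, phone_lm):
    """Collect decoder options for phoneme recognition (no -lm, no -dict)."""
    return {
        "-hmm": hmm_dir,        # acoustic model directory
        "-allphone": phone_lm,  # phonetic LM, e.g. en-us-phone.lm.bin
        "-lw": 2.0,             # assumed starting value
        "-beam": 1e-20,         # assumed starting value
        "-pbeam": 1e-20,        # assumed starting value
        "-samprate": 8000.0,    # matches the 8 kHz recordings in this thread
    }


def make_phonetic_decoder(options):
    """Build a pocketsphinx Decoder from the options above."""
    # Imported lazily so the options helper works without pocketsphinx installed.
    from pocketsphinx.pocketsphinx import Decoder

    config = Decoder.default_config()
    for flag, value in options.items():
        if isinstance(value, str):
            config.set_string(flag, value)
        else:
            config.set_float(flag, value)
    return Decoder(config)
```

With such a decoder, the result is a phoneme sequence rather than words, which can typically be read from the segments returned by `decoder.seg()`.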

       
  • Eric Bunch

    Eric Bunch - 2016-03-08

    Ah, ok. Thank you very much!

     
