pocketsphinx mdef is no binary

  • nevo

    nevo - 2012-05-21

    Hi,

    My attempts to get pocketsphinx working with a German language model and
    acoustic model are currently blocked. It seems my mdef is not a binary file
    and is therefore not read properly. I get error messages claiming that the
    phones don't exist in the AM, but I can clearly read them in the file, so I
    guess the problem is the binary format. I did the training on a virtual
    Linux system with no access to audio input, so proper testing there is
    hardly possible.

    My actual question is this:
    Why is my mdef not binary? Did I mess up some config during training? Is it
    a result of the virtual Linux? Is it in any way connected to my MacBook
    displaying all AM files as executables instead of documents (which is how
    the files from an example project are displayed)?

    thanks

     
  • Nickolay V. Shmyrev

    Why is my mdef not binary?

    Training produces a text file. The message about the binary file is purely
    informational; it's not even a warning. It's fine to have a text mdef file.
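
A minimal sketch of how one might tell the two mdef variants apart, for readers who want to check their own file. It relies on the "BMDF" marker that the pocketsphinx log further down in this thread reports for a binary mdef; the byte-swapped "FDMB" variant and the function name `mdef_kind` are assumptions made for illustration.

```python
# Markers expected at the start of a binary mdef. "BMDF" is the marker named
# in the pocketsphinx log in this thread; "FDMB" (byte-swapped) is an assumption.
BINARY_MAGICS = (b"BMDF", b"FDMB")


def mdef_kind(path):
    """Return 'binary' if the file starts with a binary-mdef marker, else 'text'."""
    with open(path, "rb") as f:
        head = f.read(4)
    return "binary" if head in BINARY_MAGICS else "text"
```

Either result is acceptable to the decoder according to the reply above; the check only tells you which reader will be used.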

    Did I mess up some config during training?

    Most likely you messed up the input data.

    Is it a result of the virtual Linux? Is it in any way connected to my
    MacBook displaying all AM files as executables instead of documents (which
    is how the files from an example project are displayed)?

    No, it's not connected.

    If the message says that specific phones are missing from your acoustic
    model, then that is exactly the case. You need to look in your dictionary to
    find out which phones are missing. The most common cause is an
    uppercase/lowercase mismatch, and that is most likely the issue you have.

    You always have the option of sharing your training folder to get more
    detailed help. For more information, please see the acoustic model training
    tutorial:

    http://cmusphinx.sourceforge.net/wiki/tutorialam
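
The dictionary check described above can be sketched roughly as follows. This assumes a text mdef in which context-independent phones are the rows whose left, right and position fields are all "-" (as in the `qq - - - n/a ...` line quoted later in this thread), and a dictionary with one `WORD phone phone ...` entry per line. The function names are hypothetical, and the comparison is deliberately case-sensitive so that an uppercase/lowercase mismatch shows up.

```python
def ci_phones_from_mdef(lines):
    """Collect context-independent phones from text-mdef lines."""
    phones = set()
    for line in lines:
        fields = line.split()
        # CI rows look like: "qq - - - n/a 28 140 141 142 143 144 N"
        if len(fields) >= 4 and fields[1:4] == ["-", "-", "-"]:
            phones.add(fields[0])
    return phones


def missing_phones(dict_lines, model_phones):
    """Map each dictionary phone absent from the model to a case variant, if any."""
    by_lower = {p.lower(): p for p in model_phones}
    missing = {}
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a word without a pronunciation
        for phone in fields[1:]:
            if phone not in model_phones:
                # None means no spelling of this phone exists in the model at all
                missing.setdefault(phone, by_lower.get(phone.lower()))
    return missing
```

A result such as `{"QQ": "qq"}` would point to a pure case mismatch, while a value of `None` means the phone is absent from the model under any capitalization.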

     
  • nevo

    nevo - 2012-05-21

    Thanks for the quick answer. But I'm pretty sure the phones are there, which
    is why I thought there's something wrong with the file itself...

    This is the kind of message I get:

    ERROR: "dict.c", line 193: Line 1: Phone 'qq' is mising in the acoustic
    model; word 'ADRESSE' ignored

    This is an extract from the mdef:

    qq - - - n/a 28 140 141 142 143 144 N

    a qq b i n/a 2 232 238 240 244 249 N
    a qq d i n/a 2 232 238 240 244 250 N
    a qq g i n/a 2 232 238 240 244 249 N
    a qq k b n/a 2 232 238 240 244 250 N
    a qq k i n/a 2 232 238 240 244 250 N
    a qq l i n/a 2 232 238 242 246 248 N
    a qq m i n/a 2 232 238 242 246 247 N
    a qq n i n/a 2 232 238 242 246 247 N
    a qq nn i n/a 2 232 238 242 246 247 N
    a qq p i n/a 2 232 238 240 244 249 N
    a qq r i n/a 2 232 238 240 245 249 N
    a qq t i n/a 2 232 238 240 244 250 N
    a qq x b n/a 2 232 238 240 245 249 N
    a qq x i n/a 2 232 238 240 245 249 N
    a qq z i n/a 2 232 238 240 244 249 N

    I used VoxForge's German data to create the AM, following the tutorial you
    posted and several others until I found a way that seemed to work for me. I
    even got an acceptable WER when I did the decoding.
    After the training I stripped the dict and LM down to a small, usable size
    for pocketsphinx.
    I'm trying to use the model with the OpenEars library for iOS
    (http://www.politepix.com/openears). It's not crashing or anything;
    pocketsphinx is just not able to recognize anything, as I get the error
    message above for every word in my dict.

    Here are the AM, LM and dict, if that helps:
    http://depositfiles.com/files/iu7n6ifkj

     
  • Nickolay V. Shmyrev

    It seems you didn't configure pocketsphinx to load the new mdef properly. I
    think it is trying to use the English model instead. Please provide a full
    pocketsphinx log.

    Pocketsphinx works fine with your model here, though there are other small
    issues. For example, the feat.params inside your model doesn't contain the
    mandatory -feat option. It seems you hand-edited it or corrupted it some
    other way.
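
A quick way to spot a missing -feat option is to parse feat.params into flag/value pairs. The sketch below assumes the file is a whitespace-separated sequence of `-flag value` entries, which matches how pocketsphinx echoes it in the log later in this thread; the helper names are hypothetical.

```python
def is_flag(token):
    """A flag starts with '-' followed by a letter, so '-1' counts as a value."""
    return len(token) > 1 and token[0] == "-" and token[1].isalpha()


def parse_feat_params(text):
    """Parse feat.params text into a {flag: value} dict (value is None if absent)."""
    tokens = text.split()
    params = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if is_flag(token):
            has_value = i + 1 < len(tokens) and not is_flag(tokens[i + 1])
            params[token] = tokens[i + 1] if has_value else None
            i += 2 if has_value else 1
        else:
            i += 1  # skip stray tokens such as line-continuation backslashes
    return params


def has_feat_option(text):
    """True if the parsed parameters contain a -feat entry with a value."""
    return parse_feat_params(text).get("-feat") is not None
```

For a file like the one discussed here, `has_feat_option` would return `False` until a `-feat` line is restored.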

     
  • nevo

    nevo - 2012-05-21

    I fear you're right, somehow it isn't eating the files I'm feeding it...
    Now I see that there are differences between the parsed feat.params and my
    feat.params. Sorry for not noticing that myself, and thanks for the hint.

    For example your feat.params inside the model doesn't contain mandatory
    -feat option. It seems you hand-edited it or corrupted it some other way.

    Yes, I had to. During training I got errors saying this was an unknown
    parameter, so I had to remove it from feat.params. That's what I meant
    when I said I read several tutorials and had to find a way that worked for
    me. Maybe I should start from scratch...

     
  • nevo

    nevo - 2012-05-23

    OK, I made a real beginner's mistake in Xcode which made it ignore the HMM
    files. The mdef file is now being read, but as you already noticed, I was
    missing the -feat parameter, which I had to take out for the training.

    ERROR: "s2_semi_mgau.c", line 1361: Number of streams does not match: 4 != 1

    After adding "-feat s2_4x" to feat.params, it gets to this:

    ERROR: "s2_semi_mgau.c", line 1361: Number of streams does not match: 3 != 4

    So I got from 1 to 4 streams, but now it wants 3?!

    Full log (mixed with the OpenEars logging):

    2012-05-23 10:43:52.924 DSF_2012_Research OPENEARSLOGGING: The audio session
    has never been initialized so we will do that now.
    2012-05-23 10:43:52.933 DSF_2012_Research OPENEARSLOGGING: Checking and
    resetting all audio session settings.
    2012-05-23 10:43:52.946 DSF_2012_Research OPENEARSLOGGING: audioCategory is
    incorrect, we will change it.
    2012-05-23 10:43:52.955 DSF_2012_Research OPENEARSLOGGING: audioCategory is
    now on the correct setting of kAudioSessionCategory_PlayAndRecord.
    2012-05-23 10:43:52.962 DSF_2012_Research OPENEARSLOGGING: bluetoothInput is
    incorrect, we will change it.
    2012-05-23 10:43:52.966 DSF_2012_Research OPENEARSLOGGING: bluetooth input is
    now on the correct setting of 1.
    2012-05-23 10:43:52.971 DSF_2012_Research OPENEARSLOGGING:
    categoryDefaultToSpeaker is incorrect, we will change it.
    2012-05-23 10:43:52.975 DSF_2012_Research OPENEARSLOGGING:
    CategoryDefaultToSpeaker is now on the correct setting of 1.
    2012-05-23 10:43:52.979 DSF_2012_Research OPENEARSLOGGING: preferredBufferSize
    is incorrect, we will change it.
    2012-05-23 10:43:52.983 DSF_2012_Research OPENEARSLOGGING: PreferredBufferSize
    is now on the correct setting of 0.128000.
    2012-05-23 10:43:52.987 DSF_2012_Research OPENEARSLOGGING:
    preferredSampleRateCheck is incorrect, we will change it.
    2012-05-23 10:43:52.991 DSF_2012_Research OPENEARSLOGGING: preferred hardware
    sample rate is now on the correct setting of 16000.000000.
    2012-05-23 10:43:53.124 DSF_2012_Research OPENEARSLOGGING: AudioSessionManager
    startAudioSession has reached the end of the initialization.
    2012-05-23 10:43:53.148 DSF_2012_Research OPENEARSLOGGING: Exiting
    startAudioSession.
    2012-05-23 10:43:53.164 DSF_2012_Research OPENEARSLOGGING: Recognition loop
    has started
    INFO: cmd_ln.c(697): Parsing command line:
    \
    -lm /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app/OpenEars1.languagemodel \
    -dict /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app/voxforge_de_sphinx.dic \
    -hmm /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app \
    -lw 6.500000 \
    -maxhmmpf 3000

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app/voxforge_de_sphinx.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app/OpenEars1.languagemodel
    -lmctl
    -lmname default default
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 3000
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(697): Parsing command line:
    \
    -alpha 0.97 \
    -dither yes \
    -doublebw no \
    -nfilt 31 \
    -ncep 13 \
    -lowerf 200.00 \
    -upperf 3500.00 \
    -nfft 512 \
    -wlen 0.0256 \
    -transform legacy \
    -samprate 8000.0 \
    -agc none \
    -cmn current \
    -varnorm no \
    -feat s2_4x

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no yes
    -doublebw no no
    -feat 1s_c_d_dd s2_4x
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 2.000000e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 31
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 8.000000e+03
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 3.500000e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.560000e-02

    INFO: acmod.c(250): Parsed model-specific feature parameters from
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/feat.params
    INFO: fe_interface.c(292): You are using the internal mechanism to generate
    the seed.
    INFO: feat.c(713): Initializing feature stream to type: 's2_4x', ceplen=13,
    CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: mdef.c(517): Reading model definition: /var/mobile/Applications/FD2F1617
    -0D4B-4A38-A38F-D0C48C54AC9B/DSF_2012_Research.app/mdef
    INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef
    file
    INFO: bin_mdef.c(336): Reading binary model definition:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/mdef
    2012-05-23 10:43:53.371 DSF_2012_Research OPENEARSLOGGING: Audio route has
    changed for the following reason:
    2012-05-23 10:43:53.389 DSF_2012_Research OPENEARSLOGGING: There has been a
    change of category
    2012-05-23 10:43:53.398 DSF_2012_Research OPENEARSLOGGING: The previous audio
    route was Speaker
    2012-05-23 10:43:53.408 DSF_2012_Research OPENEARSLOGGING: This is not a case
    in which OpenEars performs a route change voluntarily. At the close of this
    function, the audio route is SpeakerAndMicrophone
    INFO: bin_mdef.c(513): 70 CI-phone, 65021 CD-phone, 3 emitstate/phone, 210 CI-
    sen, 5210 Sen, 11271 Sen-Seq
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/transition_matrices
    INFO: acmod.c(125): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    ERROR: "s2_semi_mgau.c", line 1361: Number of streams does not match: 3 != 4
    INFO: acmod.c(127): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(863): Number of codebooks doesn't match number of ciphones,
    doesn't look like PTM: 1 != 70
    INFO: acmod.c(129): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /var/mobile/Applications/FD2F1617-0D4B-4A38-A38F-
    D0C48C54AC9B/DSF_2012_Research.app/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    ERROR: "ms_mgau.c", line 106: Number of streams does not match: 3 != 4
    2012-05-23 10:43:53.600 DSF_2012_Research OPENEARSLOGGING: Starting
    openAudioDevice on the device.
    2012-05-23 10:43:53.605 DSF_2012_Research OPENEARSLOGGING: Audio unit wrapper
    successfully created.
    2012-05-23 10:43:53.622 DSF_2012_Research OPENEARSLOGGING: Set audio route to
    SpeakerAndMicrophone
    2012-05-23 10:43:53.626 DSF_2012_Research OPENEARSLOGGING: Checking and
    resetting all audio session settings.
    2012-05-23 10:43:53.633 DSF_2012_Research OPENEARSLOGGING: audioCategory is
    correct, we will leave it as it is.
    2012-05-23 10:43:53.637 DSF_2012_Research OPENEARSLOGGING: bluetoothInput is
    correct, we will leave it as it is.
    2012-05-23 10:43:53.641 DSF_2012_Research OPENEARSLOGGING:
    categoryDefaultToSpeaker is correct, we will leave it as it is.
    2012-05-23 10:43:53.645 DSF_2012_Research OPENEARSLOGGING: preferredBufferSize
    is correct, we will leave it as it is.
    2012-05-23 10:43:53.649 DSF_2012_Research OPENEARSLOGGING:
    preferredSampleRateCheck is correct, we will leave it as it is.
    2012-05-23 10:43:53.653 DSF_2012_Research OPENEARSLOGGING: Setting the
    variables for the device and starting it.
    2012-05-23 10:43:53.657 DSF_2012_Research OPENEARSLOGGING: Looping through
    ringbuffer sections and pre-allocating them.
    2012-05-23 10:43:54.400 DSF_2012_Research OPENEARSLOGGING: Started audio
    output unit.
    2012-05-23 10:43:54.404 DSF_2012_Research OPENEARSLOGGING: Calibration has
    started
    2012-05-23 10:43:54.406 DSF_2012_Research calibrating
    2012-05-23 10:43:56.636 DSF_2012_Research OPENEARSLOGGING: Calibration has
    completed
    2012-05-23 10:43:56.644 DSF_2012_Research OPENEARSLOGGING: Project has these
    words in its dictionary:
    ADRESSE
    AN
    BEVOR
    COMPUTER
    DATEN
    EIN
    ELEPHANTEN
    FREIHEIT
    GEHEIM
    GENAU
    HALT
    HOLZ
    IRGENDWAS
    JETZT
    KLARTEXT
    LINKS
    MITTE
    NULL
    PASST
    QUADRAT
    RECHTS
    SENDEN
    TEILEN
    UNTEN
    VERIFIZIEREN
    WIRELESS
    ZEIT
    2012-05-23 10:43:56.648 DSF_2012_Research OPENEARSLOGGING: Listening.
    2012-05-23 10:43:56.653 DSF_2012_Research listening...
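
In a log this long, the few lines that matter for the stream mismatch are easy to miss. The sketch below pulls out three things using message formats copied from the log above: the feature type the front end was initialized with, the feature count reported by ms_gauden.c for the means and variances files, and the numbers in each "Number of streams does not match" error. The function name is hypothetical, and reading the ms_gauden.c count as "what the model files contain" is an interpretation, not something the log states outright.

```python
import re

# Patterns follow the message wording seen in the log above.
STREAM_ERR = re.compile(r"Number of streams does not match: (\d+) != (\d+)")
GAUDEN = re.compile(r"ms_gauden\.c\(\d+\): (\d+) codebook, (\d+) feature")
FEAT_INIT = re.compile(r"Initializing feature stream to type: '([^']+)'")


def summarize_log(text):
    """Extract feature type, gauden feature counts and stream mismatches."""
    feat = FEAT_INIT.search(text)
    return {
        "feat_type": feat.group(1) if feat else None,
        "gauden_streams": sorted({int(m.group(2)) for m in GAUDEN.finditer(text)}),
        "mismatches": sorted({(int(a), int(b)) for a, b in STREAM_ERR.findall(text)}),
    }
```

Applied to the log above, this would report feature type `s2_4x`, a gauden feature count of 3, and the mismatch pair (3, 4), which suggests that the 3 comes from the model files and the 4 from the configured feature type.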

     
  • Nickolay V. Shmyrev

    I was missing the -feat parameter, which I had to take out for the training.

    You shouldn't do that.

    ERROR: "s2_semi_mgau.c", line 1361: Number of streams does not match: 3 != 4

    Most likely you used an old version of pocketsphinx. Please try the latest
    version.

    The proper feature type must be the same as the one used in training:
    1s_c_d_dd, as in the sphinx_train.cfg file.
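
To compare the decoder's feature type with the one used in training, one could read the value out of sphinx_train.cfg. The sketch below assumes the configuration sets it with Perl-style lines such as `$CFG_FEATURE = "1s_c_d_dd";` and that several such lines may exist in different branches of the file, so every uncommented value is collected. The function names are hypothetical.

```python
import re

# Matches uncommented assignments such as: $CFG_FEATURE = "1s_c_d_dd";
CFG_FEATURE_RE = re.compile(r'^\s*\$CFG_FEATURE\s*=\s*["\']([^"\']+)["\']\s*;')


def trained_feature_types(cfg_text):
    """Return the distinct $CFG_FEATURE values found, in order of appearance."""
    found = []
    for line in cfg_text.splitlines():
        match = CFG_FEATURE_RE.match(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def feat_matches_training(cfg_text, decoder_feat):
    """True if the decoder's -feat value is one of the configured training types."""
    return decoder_feat in trained_feature_types(cfg_text)
```

If the configuration contains more than one candidate value, which branch was active during training still has to be checked by hand.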

     
