
Different recognition results on ARPA and DMP models

Help
2015-04-20
  • Vsevolod Pelipas

    Hello,

    I'm using Pocketsphinx with the latest Russian models downloaded from this site (zero_ru_cont_8k_v3), and I see that the source (ARPA) model and the DMP model built from it return different results.

    For example, on the bundled decoder-test.wav sample, the ARPA model returns the expected "илья ильф евгений петров золотой телёнок".

    When I converted the model to DMP:
    sphinx_lm_convert -i ru.lm -o ru.lm.dmp
    and used the DMP model:
    pocketsphinx_continuous \
    -samprate 8000 \
    -lm ru.lm.dmp \
    -dict ru.dic \
    -hmm zero_ru.cd_cont_4000 \
    -logfn /dev/null \
    -remove_noise no \
    -infile decoder-test.wav
    the result was a bit surprising to me: "илья киев евгения петров золотой телёнок".

    Is this by design, or did I do something wrong in the conversion?

     
    • Nickolay V. Shmyrev

      The DMP format supports only 64k words in the vocabulary, and the Russian model does not fit within that limit. You need to use the ARPA format or wait until we merge the sphinxbase-trie branch (a couple of weeks), which adds a new binary format allowing an unlimited vocabulary.
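
One way to check whether a given model runs into this limit is to read the unigram count from the ARPA header, which declares it on a line of the form "ngram 1=N". The sketch below is only an illustration. The function names are hypothetical, and the threshold of 65535 is an assumption about what "64k" means exactly.

```shell
# Print the unigram count declared in an ARPA file header ("ngram 1=N").
arpa_unigram_count() {
    sed -n 's/^ngram 1=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed when the count is within the assumed DMP limit of 65535 words.
# Fails if the header line is missing or the count is larger.
fits_dmp() {
    count=$(arpa_unigram_count "$1")
    [ -n "$count" ] && [ "$count" -le 65535 ]
}

# Usage (ru.lm is the ARPA file from the post above):
#   fits_dmp ru.lm && echo "fits DMP" || echo "keep ARPA"
```

If the check fails, the same pocketsphinx_continuous command can be run with -lm pointing at the ARPA file instead of the DMP file.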

       
  • Vsevolod Pelipas

    Nickolay, thanks for your answer!

     
