Different recognition results on ARPA and DMP models

Forum: Help

Creator: Vsevolod Pelipas

Created: 2015-04-20

Updated: 2015-04-20

Vsevolod Pelipas - 2015-04-20

Hello,

I'm using Pocketsphinx with the latest Russian models downloaded from this site (zero_ru_cont_8k_v3), and I see that the source ARPA model and the DMP model built from it return different results.

E.g. on the bundled decoder-test.wav sample, the ARPA model returns the expected "илья ильф евгений петров золотой телёнок".

When I converted the model to DMP:
sphinx_lm_convert -i ru.lm -o ru.lm.dmp
and used the DMP model:
pocketsphinx_continuous \
-samprate 8000 \
-lm ru.lm.dmp \
-dict ru.dic \
-hmm zero_ru.cd_cont_4000 \
-logfn /dev/null \
-remove_noise no \
-infile decoder-test.wav
the result was a bit surprising: "илья киев евгения петров золотой телёнок".

Is this by design, or did I do something wrong in the conversion?

- Nickolay V. Shmyrev - 2015-04-20
  
  The DMP format supports only 64k words in the vocabulary, and the Russian model does not fit within that limit. You need to use the ARPA format, or wait until we merge the sphinxbase-trie branch (a couple of weeks), which adds a new binary format that allows an unlimited vocabulary.
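
The vocabulary limit mentioned in the reply can be checked before converting, because an ARPA file declares its unigram count in the `\data\` header as a line of the form `ngram 1=N`. The following is a minimal sketch rather than part of the Sphinx tools: the function names are hypothetical, and the threshold of 65536 is an assumption that reads "64k" as a 16-bit word ID range.

```python
import re

# Assumption: "64k words" is read as 64 * 1024; the exact DMP limit may differ slightly.
DMP_MAX_WORDS = 64 * 1024


def arpa_unigram_count(lines):
    """Return the unigram count declared in an ARPA header, or None if absent.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            match = re.match(r"ngram\s+1\s*=\s*(\d+)$", line)
            if match:
                return int(match.group(1))
            # Any other backslash section (e.g. \1-grams:) ends the header.
            if line.startswith("\\"):
                break
    return None


def fits_in_dmp(lines, limit=DMP_MAX_WORDS):
    """Return True if the declared vocabulary size is within the assumed limit."""
    count = arpa_unigram_count(lines)
    if count is None:
        raise ValueError("no 'ngram 1=' line found in the ARPA header")
    return count <= limit
```

To use it on the model from this thread, pass an open handle for `ru.lm` to `fits_in_dmp` (the file encoding is not stated here, so it may need to be specified explicitly). A result of `False` would suggest keeping the ARPA model rather than converting it to DMP.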

Vsevolod Pelipas - 2015-04-20

Nickolay, thanks for your answer!