Sphinx3 Dictionary file generation

KBob
2010-03-26
2012-09-22
  • KBob

    KBob - 2010-03-26

    I am using Sphinx3 and have gotten cmuclmtk to generate the language model, but
    have not found a way to generate the .dict file. Since I am using a small
    phrase list with compound words that may or may not be real words, I need a
    way to generate this file in C++ code. I know the online version of lmtool
    works, but I need it on my local system.

     
  • Nickolay V. Shmyrev

    For an English model you can use the -lts_mismatch no option to use the
    internal g2p code. For other languages or phonesets, you need to implement
    the g2p code yourself. You can use various g2p implementations to do that,
    such as sequiturg2p, the g2p from flite, or an fst-based g2p.
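For a short command list, a dictionary can often be assembled by lookup before any g2p is needed. The following is a minimal sketch, not part of Sphinx3: it assumes a base dictionary in the plain Sphinx text layout (one entry per line, the word followed by its phones), assumes compound words are joined with an underscore, and assumes the caller already matches the case used in the base dictionary. The names loadBaseDict, splitCompound and writeDict are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Pronunciation table: word -> space-separated phone string.
using PronMap = std::map<std::string, std::string>;

// Load a base dictionary, one entry per line: "WORD PHONE PHONE ...".
// Blank lines and entries without phones are skipped.
PronMap loadBaseDict(std::istream& in) {
    PronMap prons;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string word, phone, phones;
        if (!(fields >> word)) continue;
        while (fields >> phone) {
            if (!phones.empty()) phones += ' ';
            phones += phone;
        }
        if (!phones.empty()) prons[word] = phones;
    }
    return prons;
}

// Split a compound entry on '_' (an assumed naming convention).
std::vector<std::string> splitCompound(const std::string& entry) {
    std::vector<std::string> parts;
    std::istringstream in(entry);
    std::string part;
    while (std::getline(in, part, '_')) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}

// Write one dictionary line per resolvable entry. An entry is looked up
// whole first; otherwise the phones of its parts are concatenated.
// Returns the entries that could not be resolved and still need g2p.
std::vector<std::string> writeDict(const std::vector<std::string>& entries,
                                   const PronMap& base, std::ostream& out) {
    std::vector<std::string> missing;
    for (const auto& entry : entries) {
        std::string phones;
        bool resolved = true;
        auto whole = base.find(entry);
        if (whole != base.end()) {
            phones = whole->second;
        } else {
            const auto parts = splitCompound(entry);
            resolved = !parts.empty();
            for (const auto& part : parts) {
                auto it = base.find(part);
                if (it == base.end()) { resolved = false; break; }
                if (!phones.empty()) phones += ' ';
                phones += it->second;
            }
        }
        if (resolved) out << entry << ' ' << phones << '\n';
        else missing.push_back(entry);
    }
    return missing;
}
```

Entries returned in the missing list are the ones that would still need a g2p step, such as the options mentioned above.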

     
  • KBob

    KBob - 2010-03-26

    On which program is this option used? So far I am using English only.

     
  • Nickolay V. Shmyrev

    On any of them; it's a decoder configuration option. And of course it must be
    -lts_mismatch yes. You can try it with sphinx3_decode, for example, or with
    sphinx3_continuous. Also, I really suggest you try pocketsphinx instead of
    sphinx3. You can find details about that on the website.

     
  • KBob

    KBob - 2010-03-26

    Doesn't pocketsphinx also need the dic file?
    I'm trying to get the LM and dic to be as small as possible to increase
    accuracy.
    What it comes down to is that I need a C++ call that, given a list of
    words/compound words, gives me an LM and a dic file.
    This list will be under 100 words in size. (Basically a list of currently
    available commands, which can be changed on the fly)

    (from the nightly build of pocketsphinx)
    -hmm ../../../model/hmm/wsj0
    -lm ../../../model/lm/turtle/turtle.lm.DMP
    -dict ../../../model/lm/turtle/turtle.dic
    -ctl ../../../model/lm/turtle/turtle.ctl
    -cepdir ../../../model/lm/turtle
    -cepext .16k
    -adcin TRUE
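The first half of that request, turning the current command list into input for the language model tools, can be sketched in a few lines. This is an illustration only: it assumes the training text is expected as one sentence per line wrapped in <s> and </s> markers, which is how the cmuclmtk input is usually described, and writeCorpus is a hypothetical helper name.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write each command as one sentence line: "<s> command </s>".
// Empty commands are skipped. The sentence-marker layout is an assumption
// about what the language model tools expect as training text.
void writeCorpus(const std::vector<std::string>& commands, std::ostream& out) {
    for (const auto& command : commands) {
        if (command.empty()) continue;
        out << "<s> " << command << " </s>\n";
    }
}
```

Passing a std::ofstream instead of a string stream would produce the text file that is then handed to the language model tools whenever the command list changes.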

     
  • Nickolay V. Shmyrev

    The reason for choosing pocketsphinx is not the requirement to have a
    dictionary (all decoders need a dictionary; you can't avoid that). The
    reason is that pocketsphinx is supported software with frequent bugfix
    releases, a documented API and well-tested performance. With sphinx3 there
    are no guarantees.

     
  • KBob

    KBob - 2010-03-27

    I still need to generate the dictionary in either case, any suggestions?

    In testing pocketsphinx I had too many words recognized for just making some
    nonsense sounds. (model was 206 words with most compound) Sphinx3 properly
    ignored the sounds.

     
  • Nickolay V. Shmyrev

    I still need to generate the dictionary in either case, any suggestions?

    No, see lts_mismatch above

    In testing pocketsphinx I had too many words recognized for just making some
    nonsense sounds. (model was 206 words with most compound) Sphinx3 properly
    ignored the sounds.

    That can be fixed if you provide more information about the problem.

     
  • KBob

    KBob - 2010-03-28

    If I were using one of the two programs then I could, but I am integrating
    Sphinx3 into an ocx. I will try to trace what that option does.

     
