
How do you generate ".dic" d...

Help
luckman
2012-06-20
2012-09-22
  • luckman

    luckman - 2012-06-20

    Hi, I'm trying to write an iOS 5 app that uses speech recognition through
    OpenEars. For that, I need a ".dmp" (or ".lm") language model file and a
    ".dic" dictionary file; essentially, the same output as the online CMU
    lmtool. I can't use the online tool because my number of unique tokens
    exceeds its 5000 limit, so my only choice is an offline tool.

    I'm doing the language modeling on a Windows 7 machine.

    So far, I've managed to generate a ".dmp" file with the CMUSphinx language
    modeling toolkit (CMUCLMTK). Specifically, I have:
    (1) Created a text file of reference sentences, each marked with sentence
    tags (<s> ... </s>).
    (2) Generated a vocabulary file (".vocab") from that text file using
    "text2wfreq.exe".
    (3) Generated an ".idngram" file from the ".vocab" file using
    "text2idngram.exe".
    (4) Generated an ".arpa" file from the ".idngram" file using "idngram2lm.exe".
    (5) Generated a ".dmp" file from the ".arpa" file using
    "sphinx_lm_convert.exe".
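    For step (1), preparing the reference text could be sketched in Python as
    below. This is a minimal sketch under stated assumptions: one sentence per
    line wrapped in <s> ... </s>, lowercased, keeping only ASCII letters and
    apostrophes. The function names are hypothetical. It also counts unique
    tokens, which is the number that ran into lmtool's 5000 limit.

```python
import re


def to_reference_text(sentences):
    """Turn raw sentences into reference lines wrapped in <s> ... </s>.

    Assumptions: lowercase output, ASCII letters and apostrophes only;
    any other character (punctuation, digits, accented letters) is dropped.
    """
    lines = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if words:  # skip sentences that contain no usable words
            lines.append("<s> " + " ".join(words) + " </s>")
    return lines


def vocabulary_size(lines):
    """Count unique word tokens, ignoring the <s> and </s> tags."""
    tokens = set()
    for line in lines:
        tokens.update(t for t in line.split() if t not in ("<s>", "</s>"))
    return len(tokens)


if __name__ == "__main__":
    raw = ["Open the door, please.", "Please close the door!"]
    reference = to_reference_text(raw)
    print("\n".join(reference))
    print("unique tokens:", vocabulary_size(reference))
```

    With the sample input, this prints the two tagged lines followed by
    "unique tokens: 5". Non-English text would need a different token pattern.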

    The snag is that I don’t know how to create a ".dic" Dictionary file.
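    For reference, a Sphinx-style .dic file is plain text with one entry per
    line: the word, then its phones separated by whitespace, with alternate
    pronunciations written as WORD(2), WORD(3), and so on. The sketch below
    only formats and parses such lines. It does not generate pronunciations,
    and the helper names are hypothetical.

```python
def format_dic_entry(word, phones):
    """Build one dictionary line: the word, then its phones."""
    return word + " " + " ".join(phones)


def parse_dic_line(line):
    """Split a dictionary line into (word, phones); any whitespace separates."""
    word, *phones = line.split()
    return word, phones


def write_dic(path, entries):
    """Write (word, phones) pairs to `path`, one entry per line, sorted."""
    with open(path, "w", encoding="utf-8") as f:
        for word, phones in sorted(entries):
            f.write(format_dic_entry(word, phones) + "\n")


if __name__ == "__main__":
    entries = [
        ("HELLO", ["HH", "AH", "L", "OW"]),
        ("HELLO(2)", ["HH", "EH", "L", "OW"]),  # alternate pronunciation
    ]
    for word, phones in entries:
        print(format_dic_entry(word, phones))
```

    Calling write_dic("myDictionary.dic", entries) would save the same lines
    to a file.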

    I've read the tutorial page at:
    http://cmusphinx.sourceforge.net/wiki/tutorialdict.

    I've downloaded the package at:
    http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/logios.

    Within the CMUSphinx "logios" package, there's an .exe called "pronounce.exe".
    Is this what is used to generate a ".dic" file? If so, what syntax must I use?

    From the Windows command line, I've tried "pronounce.exe -i sentences.txt -o
    myDictionary.dic". However, it fails with: "WARN> lexddata/ resources not
    found; only dictionary lookup possible." and "WARN> cannot open dictionary
    file ./lib/dict/Current_Directory".

    My guess is that I need to include the .dmp or other output somehow.
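    One possible cause, offered as a guess rather than a diagnosis: dictionary
    tools generally expect a list of words, one per line, rather than tagged
    sentences or a compiled .dmp model. The .vocab file from step (2) already
    holds that list, so a sketch like this could strip it down. It assumes
    comment lines in the .vocab begin with "##" and that <s> and </s> need no
    pronunciation entries; the script name vocab2words.py is hypothetical.

```python
import sys

SENTENCE_MARKERS = {"<s>", "</s>"}


def words_from_vocab(vocab_lines):
    """Extract dictionary headwords from CMUCLMTK .vocab lines.

    Assumptions: comment lines start with "##", and the sentence markers
    <s> and </s> do not need pronunciation entries.
    """
    words = []
    for line in vocab_lines:
        token = line.strip()
        if not token or token.startswith("##") or token in SENTENCE_MARKERS:
            continue
        words.append(token)
    return words


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python vocab2words.py sentences.vocab > words.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for word in words_from_vocab(f):
            print(word)
```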

    I'm sorry that this must seem such a simple question (and my command-line
    skills are embarrassing) but I've been trying to create a Dictionary file for
    2 days now without success. I'd really appreciate any help, however basic.

    Many thanks in advance!

  • Nickolay V. Shmyrev

    > I'm sorry that this must seem such a simple question (and my command-line
    > skills are embarrassing) but I've been trying to create a Dictionary file
    > for 2 days now without success. I'd really appreciate any help, however
    > basic.

    Check out the g2p branch; it contains a new implementation of the g2p
    tool. It requires the OpenFst and OpenGrm libraries, though.

    Alternatively, you can use Phonetisaurus; the link is in the tutorial.
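    A common way to limit g2p work (a suggested workflow, not something stated
    in this thread) is to look every word up in an existing pronouncing
    dictionary such as CMUdict first and send only the out-of-vocabulary words
    to the g2p tool or Phonetisaurus. The sketch assumes CMUdict-style input:
    ";;;" comment lines, alternates marked WORD(2), and stress digits on
    vowels, which it strips. Function names are hypothetical.

```python
import re


def load_pronunciations(lines):
    """Parse CMUdict-style lines into {word: [phone lists]}.

    Skips ";;;" comment lines, folds alternates such as WORD(2) into WORD,
    strips stress digits (AH0 -> AH), and drops variants that become
    identical once stress is removed.
    """
    prons = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", head).lower()
        phones = [re.sub(r"\d", "", p) for p in phones]
        variants = prons.setdefault(word, [])
        if phones not in variants:
            variants.append(phones)
    return prons


def split_known_unknown(words, prons):
    """Return (.dic lines for words found in prons, words that need g2p)."""
    known, unknown = [], []
    for word in words:
        variants = prons.get(word.lower())
        if not variants:
            unknown.append(word)
            continue
        for i, phones in enumerate(variants):
            head = word if i == 0 else f"{word}({i + 1})"
            known.append(head + " " + " ".join(phones))
    return known, unknown


if __name__ == "__main__":
    sample = [
        ";;; sample entries in CMUdict style",
        "HELLO  HH AH0 L OW1",
        "HELLO(2)  HH EH0 L OW1",
    ]
    prons = load_pronunciations(sample)
    known, unknown = split_known_unknown(["hello", "openears"], prons)
    print("\n".join(known))
    print("for g2p:", unknown)
```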

  • toneemy

    toneemy - 2012-06-22

    First, as I understand your question, you want a way to create a .dic
    file. I don't know the logios package you mention, but here is a link to
    the logios tool I use to create .dic files for English words:
    http://www.speech.cs.cmu.edu/tools/lextool.html
    If your word list is large, divide it into several files and then combine
    the results into one file.
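    The split-and-combine step could be scripted roughly as below. This is a
    sketch: CHUNK_SIZE is a placeholder, since the tool's actual upload limit
    isn't given here. The merge normalizes whitespace and keeps only the first
    occurrence of each repeated entry.

```python
def chunk(words, size):
    """Split a word list into consecutive pieces of at most `size` words."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [words[i:i + size] for i in range(0, len(words), size)]


def merge_dic_outputs(outputs):
    """Combine several .dic texts into one list of lines.

    Whitespace inside each line is normalized to single spaces, blank lines
    are skipped, and repeated entries are kept only once (first occurrence).
    """
    seen = set()
    merged = []
    for text in outputs:
        for line in text.splitlines():
            line = " ".join(line.split())
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


if __name__ == "__main__":
    CHUNK_SIZE = 500  # placeholder, not the tool's documented limit
    words = [f"word{i}" for i in range(1200)]
    print([len(part) for part in chunk(words, CHUNK_SIZE)])
```

    With 1200 sample words and a chunk size of 500, this prints
    [500, 500, 200]; each chunk would be submitted separately and the
    returned .dic texts passed to merge_dic_outputs.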
    If you want to create a .dic file for another language, such as Italian or
    Arabic, you can follow one of two approaches:
    (1) Write every word in English letters (a transliteration), use the link
    above to generate the .dic file, and then, in your code, convert each
    recognized word from its dictionary spelling back into the original
    letters of your language.
    (2) Write the words in your language, and write each word's phonetic
    spelling in your language beside it.
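    For approach (1), the conversion back to the original script could be a
    simple lookup table applied to the recognizer's output. The table contents
    and the function name are hypothetical, and an iOS app would do this in
    Objective-C; Python is used here only to show the idea.

```python
def to_native_script(hypothesis, romanized_to_native):
    """Map each romanized word of a recognizer hypothesis to native script.

    `romanized_to_native` is a hand-maintained table (hypothetical here);
    words missing from it are passed through unchanged.
    """
    return " ".join(
        romanized_to_native.get(word.lower(), word) for word in hypothesis.split()
    )


if __name__ == "__main__":
    table = {"marhaba": "مرحبا", "shukran": "شكرا"}  # example entries only
    print(to_native_script("MARHABA SHUKRAN", table))
```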
    Finally, if you need help creating a .dic file for English or Arabic, you
    can reach me any time.

