Hi, I’m trying to write an iOS 5 app that employs speech recognition using
OpenEars. To that end, I wish to have a ".dmp" (or a ".lm" Language Model file)
and a ".dic" Dictionary file. Essentially, I'd like the same output as the
online CMU lmtool. I can't use the online tool because the number of unique
tokens I have exceeds the 5000 limit. My only choice is to use an offline
tool.
I'm doing the Language modeling on a Windows 7 machine.
So far, I've managed to generate a ".dmp" file using the CMU Sphinx toolkit
tools. Specifically, I have:
(1) Created a text file with a number of sentences denoted by <s> and </s>
tags to use as reference text.
(2) Generated a Vocabulary file (.vocab) using “text2wfreq.exe” based on that
text file.
(3) Generated an “.idngram” file from the “.vocab” file using
“text2idngram.exe”.
(4) Generated an “.arpa” file from the “.idngram” file using “idngram2lm.exe”.
(5) Generated a “.dmp” file from the “.arpa” file using
“sphinx_lm_convert.exe”.
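For anyone following the same steps, here is a minimal Python sketch of step (1), plus a word list that a dictionary tool could take as input. It assumes the `<s>` ... `</s>` sentence markers and upper-case words; the upper-casing is an assumption, and `prepare_corpus` and `unique_words` are names made up for this illustration, not part of any CMU tool.

```python
import re


def prepare_corpus(sentences):
    """Normalize sentences and wrap each one in <s> ... </s> markers."""
    lines = []
    for sentence in sentences:
        # Keep letters, digits and apostrophes; everything else acts as a separator.
        words = re.findall(r"[A-Za-z0-9']+", sentence.upper())
        if words:
            lines.append("<s> " + " ".join(words) + " </s>")
    return lines


def unique_words(corpus_lines):
    """Return the sorted set of words, ignoring the sentence markers."""
    vocab = set()
    for line in corpus_lines:
        vocab.update(w for w in line.split() if w not in ("<s>", "</s>"))
    return sorted(vocab)


corpus = prepare_corpus(["Open the door.", "Close the door, please!"])
words = unique_words(corpus)
print("\n".join(corpus))
print(len(words), "unique words")
```

Writing `words` out one per line gives a file that can be handed to a pronunciation tool, and `len(words)` shows how far the vocabulary is over the online tool's 5000-token limit.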
The snag is that I don’t know how to create a ".dic" Dictionary file.
I've read the tutorial page at:
http://cmusphinx.sourceforge.net/wiki/tutorialdict.
I've downloaded the package at:
http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/logios.
Within the CMUSphinx "logios" package, there's an .exe called "pronounce.exe".
Is this what is used to generate a ".dic" file? If so, what syntax must I use?
From the Windows command line, I've tried "pronounce.exe -i sentences.txt -o
myDictionary.dic". However, I just end up getting a fatal error: "WARN>
lexddata/ resources not found; only dictionary lookup possible." and "WARN>
cannot open dictionary file ./lib/dict/Current_Directory".
My guess is that I need to include the .dmp or other output somehow.
I'm sorry that this must seem such a simple question (and my command-line
skills are embarrassing) but I've been trying to create a Dictionary file for
2 days now without success. I'd really appreciate any help, however basic.
Many thanks in advance!
Check out the g2p branch; it contains a new implementation of the g2p tool. It
requires the OpenFst and OpenGrm libraries, though.
Or you can use Phonetisaurus. The link is in the tutorial.
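Before setting up a g2p tool, it may help to see which words actually need one. The following is only a sketch: it assumes a CMUdict-style reference dictionary already loaded into a Python dict (the tiny `reference` table here is hypothetical), and it writes entries as `WORD<tab>PHONES` with alternates marked `WORD(2)`, which is the usual Sphinx dictionary convention. `build_dictionary` is an illustrative name, not a function from any of the tools mentioned.

```python
def build_dictionary(words, reference):
    """Split words into known entries (as .dic lines) and unknown words."""
    dic_lines, missing = [], []
    for word in words:
        pronunciations = reference.get(word.upper())
        if not pronunciations:
            # No entry found: this word would need a g2p tool.
            missing.append(word)
            continue
        for i, phones in enumerate(pronunciations, start=1):
            # Alternate pronunciations are written WORD(2), WORD(3), ...
            key = word.upper() if i == 1 else f"{word.upper()}({i})"
            dic_lines.append(f"{key}\t{phones}")
    return dic_lines, missing


# Hypothetical in-memory reference; a real one would be loaded from a file.
reference = {
    "DOOR": ["D AO R"],
    "THE": ["DH AH", "DH IY"],
}
dic_lines, missing = build_dictionary(["the", "door", "openears"], reference)
print("\n".join(dic_lines))
print("needs g2p:", missing)
```

Only the words that end up in `missing` have to go through g2p or Phonetisaurus, which keeps the machine-generated pronunciations to a minimum.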
First, as I understand your question, you want a way to create a .dic file.
I do not know the logios package you speak about, but here is a link to the
Logios tool I use to create .dic files for English words:
http://www.speech.cs.cmu.edu/tools/lextool.html
If the file is large, divide it into several files, then combine the results
into one file.
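The split-and-combine step could be sketched in Python roughly like this. The chunk size is a placeholder, since the web tool's actual limit is not stated here, and `split_into_chunks` and `merge_dictionaries` are names invented for the example.

```python
def split_into_chunks(words, chunk_size):
    """Cut the word list into consecutive pieces of at most chunk_size words."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def merge_dictionaries(parts):
    """Merge the text of several .dic files, dropping blank and duplicate lines."""
    seen = set()
    merged = []
    for part in parts:
        for line in part.splitlines():
            line = line.strip()
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return sorted(merged)


chunks = split_into_chunks(["A", "B", "C", "D", "E"], 2)
merged = merge_dictionaries([
    "THE\tDH AH\nDOOR\tD AO R\n",
    "DOOR\tD AO R\nOPEN\tOW P AH N\n",
])
print(chunks)
print("\n".join(merged))
```

Each chunk would be saved to its own file and submitted separately; the downloaded results are then read back as strings and passed to `merge_dictionaries`.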
But if you want to create a .dic file for another language, such as Italian,
Arabic, etc., you should follow one of two ways.
The first way is to write every word spelled out in English letters, then use
the link above to generate the .dic file, and in your code convert each word
from its spelling in the dictionary back into the original letters of your
language.
The second way is to write the words in your language, and write each word's
spelling in your language beside it.
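The "convert back in code" part of the first way might look like the sketch below. The table and its two entries are hypothetical, as is the `to_native` helper; in practice the table would hold one entry per word in your dictionary. Note that an English tool will only give English-sounding approximations of the pronunciations, so this is a workaround rather than a proper solution.

```python
# Hypothetical table: romanized dictionary key -> word in its original script.
ROMAN_TO_NATIVE = {
    "MARHABA": "مرحبا",
    "SHUKRAN": "شكرا",
}


def to_native(hypothesis, table=ROMAN_TO_NATIVE):
    """Replace each recognized romanized word with its original spelling."""
    # Words missing from the table are passed through unchanged.
    return " ".join(table.get(word, word) for word in hypothesis.split())


print(to_native("MARHABA SHUKRAN"))
```

The recognizer only ever sees the romanized keys; the app applies `to_native` to the recognized text before showing it to the user.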
Finally, if you need help creating a .dic file for English or Arabic, you
will find me available any time.