All I want to do, to get a handle on the configuration file (config.xml), is change the Dictionary from TIDIGITS to WSJ. It builds fine, but when I run it I get errors saying that the resource could not be found. What gives? I want to learn how to manage the Dictionary, Acoustic Model, and Grammar, but if I can't make a simple change like that ... sheesh. Otherwise, things are executing like a charm with my own test files.
Anonymous
-
2006-02-16
First of all, a caveat: it has been 9 months since I left my job where I worked with Sphinx-4, and I no longer have access to my working files and documents. Therefore, any advice is subject to being incomplete and/or wrong!
As I recall, in order to specify a different acoustic model in the config file, you must do so in two places: in the acoustic model component as well as in the dictionary component. See a posting by Paul Lamere in the Sphinx4 Open Discussion forum, thread title "8k sampling rate", which is dated 2005-03-09.
In a private e-mail to me, you said, "that was a phone call that I recorded using voice modem ... I up-sampled the file [from 8 kHz] to 16 kHz in order to match with the requirements of the WSJ or tidigits. But it just returns bad result."
Please see Sphinx4 Open Discussion threads "Ipaq transcriber wav format" (2005-02-22) and/or "8 bit i KHz woes (urgent)" (2005-10-23). The bottom line is that you can't upsample an 8 kHz signal to 16 kHz and expect it to act like a genuine 16 kHz signal: upsampling cannot restore the frequency content above 4 kHz that was never captured. If your file was originally sampled at 8 kHz, then you'll need to use a valid 8 kHz acoustic model.
In addition, there may be other questions regarding the format of this audio data.
-- is it 16-bit linear PCM, or 8-bit mu-law or A-law? Sphinx-4 currently handles only 16-bit linear, not mu-law/A-law data (see Sphinx-4 Open Discussion forum).
-- if it's 16-bit linear, be sure that the big/little-endianness is correct.
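The format questions above can be checked before the file ever reaches the recognizer. The following is only a minimal sketch using the standard javax.sound.sampled API; the class name WavFormatCheck and the method findProblems are made up for illustration and are not part of Sphinx-4. It reports the encoding, sample size, sample rate, and byte order of a WAV file, so they can be compared against what the acoustic model and front end expect.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper (not part of Sphinx-4): inspects a WAV file's format.
public class WavFormatCheck {

    /** Lists mismatches between the audio format and 16-bit linear PCM at the expected rate. */
    public static List<String> findProblems(AudioFormat fmt, float expectedRateHz) {
        List<String> problems = new ArrayList<>();
        // mu-law and A-law data show up as Encoding.ULAW / Encoding.ALAW here.
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())) {
            problems.add("encoding is " + fmt.getEncoding() + ", expected signed linear PCM");
        }
        if (fmt.getSampleSizeInBits() != 16) {
            problems.add("sample size is " + fmt.getSampleSizeInBits() + " bits, expected 16");
        }
        if (fmt.getSampleRate() != expectedRateHz) {
            problems.add("sample rate is " + fmt.getSampleRate() + " Hz, expected " + expectedRateHz);
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java WavFormatCheck <file.wav> <expectedRateHz>");
            return;
        }
        float expectedRate = Float.parseFloat(args[1]);
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat fmt = in.getFormat();
            System.out.println("Format: " + fmt);
            // Byte order is printed rather than judged; compare it with the front end's setting.
            System.out.println("Big-endian: " + fmt.isBigEndian());
            for (String problem : findProblems(fmt, expectedRate)) {
                System.out.println("Mismatch: " + problem);
            }
        }
    }
}
```

For a telephone recording, the expected rate would be 8000, to go with an 8 kHz acoustic model.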
Note that the 8 kHz WSJ acoustic model in the Sphinx-4 distribution was produced by artificially downsampling a 16 kHz dataset to 8 kHz and then training with that data. That's only an approximation to actual telephony data. It should work reasonably well, but it's not a true telephone-data model.
cheers,
jerry
Hmm. From your description, we probably can't give you any concrete advice. Would you send us the log of your errors?
It seems you want to change the dictionary, the acoustic model, or the grammar. Each of these changes, depending on the degree of detail, requires a different level of skill. Could you also give us a slightly more detailed account of what you want to do?
Arthur
Arthur ... Just to let you know .. I'm an idiot .. that's what I get for posting while suffering from a fever .. it's the Manifest file ... I forgot to change it ....
NEVER EVER TRY TO CODE WHILE ILL ... Lesson learned .. thank you for your patience.
Still getting the problem even after changing the manifest file .. but I'm going to bed ... this should be an easy problem I am sure....
thanx for your patience...
If you'll bear with me since I'm not sure how this forum will display the tagging information. The original pristine demo of config.xml for the Transcriber Dictionary is:
====================================================
The Dictionary entry I'm trying to use is:
<!-- ******** -->
<!-- The Dictionary configuration -->
<!-- ******** -->
============================================================================================================================================================
Everything builds fine when I do an 'ant clean' and then an 'ant' ...
But when I run 'java -jar bin/Transcriber.jar', I get this:
============================================================================================================================================================
$ java -jar bin/Transcriber.jar
Problem configuring Transcriber: Property Exception component:'dictionary' property:'dictionaryPath' - Can't locate resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
Property Exception component:'dictionary' property:'dictionaryPath' - Can't locate resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model ..... at demo.sphinx.transcriber.Transcriber.main(Transcriber.java:52)
=============================================
What gives?? Shouldn't I be able to take the basic Dictionary information used in the HelloWorld Demo and use it in the Transcriber Demo?? I did successfully add my wife's name to the cmudict.0.6d file, so I am making progress in understanding other parts of this thing. Thanks for your prompt feedback.
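The "Can't locate resource" message names a class, so one thing worth ruling out is that the model jar is not on the runtime classpath (for example, because it is missing from the manifest's Class-Path). Below is a minimal sketch, assuming the text after "resource:/" is a class name that must be loadable for the resource to be found. ClasspathProbe and isLoadable are hypothetical names for illustration, not part of Sphinx-4.

```java
// Hypothetical diagnostic (not part of Sphinx-4): checks whether a class can be
// found on the classpath this program was started with.
public class ClasspathProbe {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isLoadable(String className) {
        try {
            // 'false' = locate the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default is the class name taken from the error message in this thread.
        String name = args.length > 0
                ? args[0]
                : "edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(name + (isLoadable(name) ? " is" : " is NOT") + " loadable");
    }
}
```

The probe only tells you something if it is started with the same classpath entries as the demo. If the class is reported as not loadable, the jar list is the first place to look.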
Well, I finally got what I wanted ... after I went to bed and got some rest ...
Created a new directory and new demo project (copied the transcriber demo directory to a new name actually).
Created a new entry in demo.xml for ant to build
This forced me to change a few things in the .java and .Manifest files to get my new project working just like the original transcriber demo
Did a line-by-line comparison of the HelloWorld and Transcriber config.xml files and converted Transcriber to use the WSJ models where appropriate. It soon became apparent that the Dictionary is NOT independent of the Acoustic Model, as the Architectural Diagram suggests. I had to change both. Perhaps the diagram should be changed to show the Acoustic Model and Dictionary in a single rectangle with a dashed line separating the two?
Once I changed both the Acoustic Model and the Dictionary to WSJ, and modified the Grammar to accept a few words, the modified transcriber demo worked like a champ. Many thanks Arthur.
Hi Darren, Arthur,
I am trying to modify the Transcriber to read in my test file using the WSJ model. Following this thread I was able to make the necessary changes, but the results were so poor as to be useless: all of the recognized output was wrong. Is there any specific change I should make so that the app can recognize a file recorded through a telephony device? Your help is much appreciated.
Regards
Anonymous
-
2006-02-14
If I remember correctly, the Transcriber demo uses a 16 kHz acoustic model -- that is, it uses speech sampled at 16 kHz and front-end signal processing with parameters appropriate for this signal rate and bandwidth. When you say "record[ed] through a telephony device", it makes me wonder if this file was sampled at 16 kHz or some other rate (since telephony normally uses 8 kHz sampling).
If your new data isn't full-bandwidth, 16 kHz sampled speech, then this mismatch will indeed cause poor recognition.
cheers,
jerry
Let's just say you are the one who solved the problem. In this case, it seems that my patience helped. :-)
Don't hesitate to ask us more questions. We will be there. -Arthur
Thanks for your response,
Yes, the wav file was recorded at 8 kHz. I also changed the dict model to WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar (is this the right one to use? I'm not sure) and made some other changes, such as:
-- modify this component (according to thread http://sourceforge.net/forum/forum.php?thread_id=1409893&forum_id=382337)
<component name="melFilterBank"
           type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    <property name="numberFilters" value="31"/>
    <property name="minimumFrequency" value="200"/>
    <property name="maximumFrequency" value="3500"/>
</component>
and I still get errors. Could you please give me some direction on how to solve this?
Regards
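The filter bank values in the component above (31, 200, 3500) match the numbers embedded in the model name WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz. As a rough sketch only, and assuming the name really encodes sample rate, number of mel filters, and minimum/maximum frequency in that order (an inference from the two model names in this thread, not a documented rule), the settings can be derived from the name and checked against the audio's Nyquist limit. ModelNameCheck and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Sphinx-4). Assumed naming pattern:
//   ..._<rate>k[Hz]_<filters>mel_<minFreq>Hz_<maxFreq>Hz
public class ModelNameCheck {

    private static final Pattern NAME =
            Pattern.compile("_(\\d+)k(?:Hz)?_(\\d+)mel_(\\d+)Hz_(\\d+)Hz");

    /** Returns {sampleRateHz, numberFilters, minimumFrequency, maximumFrequency}. */
    public static int[] parse(String modelName) {
        Matcher m = NAME.matcher(modelName);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized model name: " + modelName);
        }
        return new int[] {
            Integer.parseInt(m.group(1)) * 1000,  // "8k" or "8kHz" -> 8000
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4))
        };
    }

    /** True if the model's rate equals the audio rate and its top frequency is within Nyquist. */
    public static boolean fitsAudio(String modelName, int audioRateHz) {
        int[] p = parse(modelName);
        return p[0] == audioRateHz && p[3] <= audioRateHz / 2;
    }

    public static void main(String[] args) {
        String model = "WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz";
        int[] p = parse(model);
        System.out.println("numberFilters=" + p[1]
                + " minimumFrequency=" + p[2]
                + " maximumFrequency=" + p[3]);
        System.out.println("fits 8 kHz audio: " + fitsAudio(model, 8000));
    }
}
```

Under the same assumption, the 16 kHz model name (16k_40mel_130Hz_6800Hz) implies a top frequency of 6800 Hz, which is above the 4000 Hz Nyquist limit of 8 kHz audio. That is consistent with the advice earlier in the thread that upsampled telephone audio will not behave like genuine 16 kHz data.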