Hi Guys,
I'm currently using Sphinx4 as part of my final year project at Uni. Overall, I've found it a lot of fun to work with (once I got my head around a few things, that is). I've been getting pretty good results from using hub4. However, part of what I want to do is deal gracefully with out-of-vocabulary words. For this, I will be providing UI methods that let the user enter a correction, and I then want to incorporate these corrections in future recognition tasks. To do this I was thinking (well, my supervisor gave me the idea) of adding the entries to the dictionary file.
First of all, I want to confirm: would this have the desired result? If I add a word to the dictionary (with the correct phoneme mapping), will it be recognised if I run the same file again (at least in theory), or have I totally misunderstood the purpose of the dictionary? If I have misunderstood this, how could I go about adding that functionality?
Secondly, I figure it's much better practice to keep user-added words separate from the main body of the hub4 CMU dictionary, so I was really pleased when I saw the addenda feature in the FastDictionary class. However, I can't for the life of me figure out how to set it up! I get various errors such as "unregistered component" whatever I try. Is there something I am missing? I would be really grateful if someone could shed some light on it. Thanks for your time!
Chris
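[Editor's sketch] The idea of keeping user corrections in a separate file could look roughly like the following. This is only a minimal illustration, not Sphinx4 code: `AddendumWriter`, `formatEntry` and `append` are hypothetical names, and it assumes the addendum uses the same one-entry-per-line layout as the main dictionary (the word, whitespace, then space-separated phonemes). The letter case of the word is left untouched and should match whatever the main dictionary uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Sphinx4): stores user corrections in a separate addendum file. */
final class AddendumWriter {

    /** Formats one entry as "word<TAB>PH1 PH2 ..." (assumed dictionary layout). */
    static String formatEntry(String word, List<String> phonemes) {
        if (word == null || word.trim().isEmpty() || phonemes == null || phonemes.isEmpty()) {
            throw new IllegalArgumentException("word and phonemes must be non-empty");
        }
        return word.trim() + "\t" + String.join(" ", phonemes);
    }

    /** Appends one entry to the addendum file, creating the file if it does not exist yet. */
    static void append(Path addendum, String word, List<String> phonemes) throws IOException {
        String line = formatEntry(word, phonemes) + System.lineSeparator();
        Files.write(addendum, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real addendum location.
        Path addendum = Files.createTempFile("user-addendum", ".dict");
        append(addendum, "sphinx", Arrays.asList("S", "F", "IH", "NG", "K", "S"));
        System.out.print(new String(Files.readAllBytes(addendum), StandardCharsets.UTF_8));
    }
}
```

Appending rather than rewriting keeps the main hub4 dictionary untouched, so a bad user entry can be removed by editing only the addendum file.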
Apologies for not replying sooner, I was away for a few days. Thanks, that fixed my problem!
Hi Chris,
> add the entries into the dictionary file. First of all, I want to
> confirm, would this have the desired result?
It depends on the type of grammar you're using. If it is a SimpleWordListGrammar, adding the items to the dictionary should be sufficient. If you're using a rule grammar, you need to make sure the added entries appear in one of the grammar rules; otherwise they will simply be ignored. If you're using a statistical grammar (i.e. an n-gram model), your new items will get a kind of 'default' appearance probability, which is quite low.
When changing the dictionary online, you need to ensure in any case that the search space is either assembled online or rebuilt after you've added the new words.
> addenda feature in the FastDictionary class. However, I can't for
> the life of me figure out how to set it up!! I get various different
> errors such as "unregistered component"
Could you post the relevant snippet of your configuration file?
Best regards,
Holger
Hi Holger,
Thanks for the quick response! I'm using an n-gram model (LargeTrigramModel) for my grammar, so would there be any way to increase this default appearance probability? I'd be happy to get my hands dirty in the S4 internals if that were required.
"When changing the dictionary online, you need to ensure in any case that the search space is either assembled online or rebuilt after you've added the new words."
So, I assume I could do this with allocate()?
Here is the extract from my config. The bit that doesn't work is obviously the "addenda" property. I'm probably just going about this in the wrong way.
The resulting Exception is: "Property Exception component:'dictionary' property:'addenda' - Attempt to set unregistered property"
Thanks again!
Chris
Hi Chris,
> would there be any way to increase this default appearance probability
Sure, but I think it's anything but trivial. I would recommend trying the default probabilities first.
> 'addenda' - Attempt to set unregistered property"
No idea yet. The configuration seems to be OK. Are you working with beta1 or the latest svn-version?
-Holger
>> 'addenda' - Attempt to set unregistered property"
>No idea yet. The configuration seems to be OK. Are you working with beta1 or the latest svn-version?
Well, I was using a nightly snapshot from about 4 months ago, so I updated to the latest, but I still get the same problem. I will try adding the addenda setting to a property sheet, and then adding that to the dictionary at run-time as a temporary workaround.
OK, I think I have identified my problem. Could it be that I'm using an older version of hub4? When I first started using Sphinx about 6 months ago, I was using beta1. I then switched to a nightly build about 4 months ago, but (for whatever reason... can't remember why!) I linked the hub4 jar from the beta1 project folder. When I run with that jar, I get the aforementioned "Attempt to set unregistered property" exception, although it works fine without the addenda property. However, if I link to the hub4 jar in my much more recent nightly build project folder, I get the following...
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class edu.cmu.sphinx.util.props.PropertySheet, but interface was expected
at edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model.newProperties(Model.java:158)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.newProperties(LexTreeLinguist.java:241)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.newProperties(WordPruningBreadthFirstSearchManager.java:173)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:42)
at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:31)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:270)
at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:79)
at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:420)
at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:144)
Is it possible the latest version of S4 has some incompatibility with the Model.class file in the hub4 jar?
I'm getting a bit confused myself now! :D
Hi Chris,
> Is it possible the latest version of S4 has some incompatibility
> with the Model.class file in the hub4 jar?
Yep, we've rewritten the complete configuration backend since beta1. Mixing old and new code won't work in any case. It's confusing and a new release should end such problems.
-Holger
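[Editor's sketch] When old and new jars get mixed like this, it can help to print where each class is actually loaded from. The following is a small diagnostic using only standard JDK calls; `ClassOrigin` and `describe` are hypothetical names, and the two Sphinx class names are taken from the stack trace above. Run it with the same classpath as the failing application.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic (not part of Sphinx4): reports which jar or directory a class comes from. */
final class ClassOrigin {

    /** Returns the code source location of the named class, or a marker string if unavailable. */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "edu.cmu.sphinx.util.props.PropertySheet",
            "edu.cmu.sphinx.model.acoustic.HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.Model"
        };
        for (String name : names) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

If the two classes resolve to jars from different Sphinx4 builds (for example a beta1 hub4 jar next to a nightly sphinx4 jar), that would be consistent with the IncompatibleClassChangeError reported earlier in the thread.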
Yes, you must extract the model from the jar and rebuild it:
http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
Search the forum; this topic has already been discussed.