Hi,
I was trying to create a simple tutorial on using Sphinx4 and discovered a strange thing...
I have a simple setup based on the an4 database. A configuration that uses a trivial unigram model gives me 78% accuracy. For comparison, I created another setup that uses a trivial JSGF grammar (a word loop, essentially; see the sketch below). With that I get 56%, accompanied by numerous "Falling back to non-recursive partition" messages. That is really strange, given the ~100-word vocabulary.
I'm just wondering what the cause might be and how to deal with it. The two configurations are almost identical, and I would expect the unigram and JSGF setups to give comparable performance. Maybe the problem is with the FlatLinguist?
I'm attaching the files you might want to look into.
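By "word loop" I mean a JSGF grammar along these lines (just a sketch; the words here are placeholders, not the actual an4 vocabulary):

    #JSGF V1.0;

    grammar wordloop;

    // Accept one or more vocabulary words in any order.
    public <utterance> = <word>+;

    <word> = HELLO | WORLD | TESTING;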
We do not recommend custom sphinx4 configs anymore; in particular, you shouldn't use them in any tutorials.
Out of curiosity, what is the reason behind this?
The XML config was an attempt to put some of the code logic into XML. Our users are not qualified enough to edit decoder internals through configuration files and usually don't understand or account for the important dependencies between configuration file objects.
Our current work on a high-level API lets users access s4 features through the API; it is not perfect, but it is focused on specific use cases. I hope s4 will be more straightforward to use in the future because of that.
What do you mean by 'custom'? Each particular ASR task requires a config to be written...
Does it mean that there exists a standard config for my task?
No, there is a default.config.xml now, but it will probably be replaced by code in the future too.
I'm not sure what your task is, but we'd like to see support for most practical tasks in the API, not in the configuration.
I tried to switch to the new API instead of using configs, but I cannot figure out how to recognize multiple input files in a row (using StreamSpeechRecognizer) with the same acoustic and language models.
As I understand it, a new input file can only be set in the startRecognition method, which allocates the recognizer. Does that mean I have to allocate and deallocate the recognizer for every single file?
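To make this concrete, here is roughly the loop I have in mind (just a sketch; the model paths are placeholders, and I'm assuming the recognizer can be constructed once, with startRecognition/stopRecognition called per file):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.SpeechResult;
    import edu.cmu.sphinx.api.StreamSpeechRecognizer;

    public class BatchDecode {
        public static void main(String[] args) throws Exception {
            Configuration configuration = new Configuration();
            // Placeholder paths -- point these at your own models.
            configuration.setAcousticModelPath("file:models/an4/acoustic");
            configuration.setDictionaryPath("file:models/an4/an4.dic");
            configuration.setLanguageModelPath("file:models/an4/an4.lm");

            // Construct the recognizer once...
            StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

            for (String path : args) {
                InputStream stream = new FileInputStream(new File(path));
                // ...but start/stop recognition (which allocates/deallocates) per file.
                recognizer.startRecognition(stream);
                SpeechResult result;
                while ((result = recognizer.getResult()) != null) {
                    System.out.println(path + ": " + result.getHypothesis());
                }
                recognizer.stopRecognition();
                stream.close();
            }
        }
    }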
OK, setting aside any particular usage pattern: what does the message 'Falling back to non-recursive partition' mean? What might the reason be? Can it be caused by a slow laptop running the recognizer?
That has to be changed
There should be no such message in recent sources; it was removed quite some time ago and was only there for debugging. The reason for the message is that active-list pruning is implemented with a recursive quicksort-like algorithm, and if there are too many similar scores in the beam window the recursion runs out of stack. In that case the algorithm switches to linear sorting, which is what this debug message notifies about (see the sketch below).
Overall it's not a good situation when you see this message. It means something is wrong either with your model or with your input features.
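To illustrate the idea (a simplified sketch only, not the actual Sphinx4 Partitioner code; the depth budget here is an arbitrary placeholder): the pruner only needs the best N tokens, so it partitions quickselect-style, and when many near-equal scores make the recursion degenerate it gives up and sorts the whole list instead.

    import java.util.Arrays;

    /**
     * Simplified sketch of beam pruning: quickselect-style partitioning
     * with a fallback to a full sort when the recursion gets too deep.
     */
    public class PartitionSketch {

        /** Rearranges scores so the best (largest) beamSize values occupy the front. */
        static void keepBest(double[] scores, int beamSize) {
            if (!quickSelect(scores, 0, scores.length - 1, beamSize, 64)) {
                // Too many near-equal scores made the recursion degenerate,
                // so fall back to a plain sort of the whole array.
                System.err.println("Falling back to non-recursive partition");
                Arrays.sort(scores);   // ascending
                reverse(scores);       // best scores first
            }
        }

        /** Quickselect: returns true on success, false if the depth budget is exhausted. */
        private static boolean quickSelect(double[] a, int lo, int hi, int k, int depth) {
            if (lo >= hi) return true;
            if (depth == 0) return false;          // recursion budget exhausted
            int p = partition(a, lo, hi);          // pivot ends up at index p
            int leftSize = p - lo + 1;
            if (k < leftSize) return quickSelect(a, lo, p - 1, k, depth - 1);
            if (k > leftSize) return quickSelect(a, p + 1, hi, k - leftSize, depth - 1);
            return true;
        }

        /** Partition that puts larger scores to the left of the pivot. */
        private static int partition(double[] a, int lo, int hi) {
            double pivot = a[hi];
            int i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] >= pivot) { swap(a, i, j); i++; }
            }
            swap(a, i, hi);
            return i;
        }

        private static void swap(double[] a, int i, int j) {
            double t = a[i]; a[i] = a[j]; a[j] = t;
        }

        private static void reverse(double[] a) {
            for (int i = 0, j = a.length - 1; i < j; i++, j--) swap(a, i, j);
        }
    }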