Hi,
I am using pocketsphinx 0.7 and trying to get an n-best list using language model
search.
I had a clip containing the words "GO FORWARD".
It was decoded as
OPEN FORWARD (n00000000 -12100)
In the nbestdir, a file was created as n00000000.hyp.
This file had the following results:
OPEN FORWARD -4886
LAST FORWARD -5057
NEW FORWARD -5116
NEXT FORWARD -5155
How do I interpret the n-best results? Are the scores given in the n-best list
the acoustic scores or the posterior probabilities? Can these scores be
treated as approximate confidence scores?
Regards
Pankaj
Hello
The score returned is a typical path score, the same kind used in ps_get_hyp.
It is the sum of the acoustic score and the language score.
It has little use as is, I think; you need to do additional calculations to
work with it.
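One example of such an additional calculation is to normalize the n-best path scores against each other. The sketch below is not part of pocketsphinx; it assumes the scores are integer log-likelihoods in the decoder's default log base of 1.0001 (the -logbase setting), and the function name nbest_posteriors is made up for illustration.

```python
import math

# Assumption: scores are integers in log base 1.0001 (pocketsphinx default -logbase).
LOG_BASE = 1.0001


def nbest_posteriors(scores, log_base=LOG_BASE):
    """Turn log-domain path scores into weights that sum to 1 over the list."""
    ln_base = math.log(log_base)
    best = max(scores)
    # Subtract the best score before exponentiating to avoid underflow.
    weights = [math.exp((s - best) * ln_base) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# The n-best list quoted earlier in this thread.
nbest = [
    ("OPEN FORWARD", -4886),
    ("LAST FORWARD", -5057),
    ("NEW FORWARD", -5116),
    ("NEXT FORWARD", -5155),
]
posts = nbest_posteriors([score for _, score in nbest])
for (text, score), p in zip(nbest, posts):
    print(f"{text:14s} {score:6d} {p:.3f}")
```

Under that log-base assumption the four hypotheses come out almost equally likely (each close to 0.25), which illustrates why the raw path scores say little about confidence on their own.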
Hi,
Does the hyp result given by ps_get_hyp correspond to the result with the
best score in the n-best list?
If yes, then why, in the above example, does the score of the ps_get_hyp result
OPEN FORWARD (n00000000 -12100) not match any of the results given in
the n-best list?
Regards
Pankaj
This issue requires investigation. One needs to dump the lattice first for a
small grammar and check whether the bestpath in the lattice is indeed the one
returned by n-best. Then we need to compare the lattice bestpath with the result
of ps_get_hyp. It might be a bug.
Hi Nicole
Could you suggest what minimal additional calculations would be required for
measuring confidence scores (FSG search)? Is it possible to measure confidence
scores by performing calculations on the word lattice at the application level
only, without touching the code related to history entry generation?
Regards
Pankaj
Hello
Unfortunately, a good confidence score for FSG search is a subject of extensive
research. One can implement several methods which aren't based on posteriors,
for example acoustic-score-based confidence or background garbage model
confidence. There was a good confidence estimation implemented in sphinx2; see
the function search_hyp_conf in the sphinx2 sources. It's not yet implemented in
pocketsphinx.
Most methods will require modification of the search algorithm.
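To make the background-model idea concrete, here is a rough sketch of a per-frame log-likelihood ratio between the hypothesis and a background (garbage or phone-loop) model over the same frames. It is a generic technique, not the search_hyp_conf code from sphinx2. The function name, the input values, and the log base of 1.0001 are all assumptions, and getting the background score out of the decoder is exactly the part that would need search changes.

```python
import math

# Assumption: acoustic scores are integers in log base 1.0001.
LOG_BASE = 1.0001


def llr_confidence(hyp_ascr, background_ascr, n_frames, log_base=LOG_BASE):
    """Per-frame log-likelihood ratio (natural log) of hypothesis vs. background.

    hyp_ascr        -- acoustic score of the recognized words (hypothetical input)
    background_ascr -- acoustic score of a background model over the same frames
    n_frames        -- number of frames covered by both scores
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (hyp_ascr - background_ascr) * math.log(log_base) / n_frames


# Made-up numbers, for illustration only.
print(llr_confidence(hyp_ascr=-12100, background_ascr=-11500, n_frames=120))
```

A more negative value means the background model explains the audio much better than the hypothesis. Any accept/reject threshold would have to be tuned on real data.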
Hi Nicole,
With my limited test data, I have observed that most of the time when recognition is correct the posterior probability is zero, which is very logical. But sometimes, even though recognition is correct and all n-best results are identical, the posterior probability is nonzero, and there are a few instances where the posterior probability is zero but the recognition is not correct. So what do the posterior probabilities indicate? I was thinking of using the posterior probabilities as a crude confidence score.
I am thinking of trying to port search_hyp_conf from sphinx2 to pocketsphinx. What complexities might there be? Will it require any change in the way the history entries are created, i.e. will it require modifying the code inside the fsg_search_step function?
Regards
Pankaj
For a small grammar, the reported posterior probability has little meaning unless
you include pronunciation variants and confusable words in the grammar.
As for porting search_hyp_conf: it relies on a separate phone loop search, which
would have to be implemented.
Hi Nicole,
There is an auxiliary phone loop search in pocketsphinx which gets enabled
when -pl_window is specified. Is this the same phone_loop_search you are
referring to?
Regards
Pankaj
It's a similar phone loop search, but the sphinx2 version accumulates more
statistics about active phones and uses these statistics to derive per-phone
and per-word confidence in the result.
Hi,
Sphinx2 gives confidence scores for individual words. What would be the
confidence score for an utterance consisting of multiple words? Would it be the
sum or the product of the confidence scores of the individual words?
Pankaj
The geometric mean of the probabilities, or equivalently the average of the log probabilities.
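As a small illustration, assuming the per-word confidences are already available as probabilities in (0, 1], the geometric mean can be computed by averaging the log probabilities and exponentiating. The function name is made up for this sketch.

```python
import math


def utterance_confidence(word_probs):
    """Geometric mean of per-word probabilities, computed via the mean log probability."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("probabilities must be in (0, 1]")
    mean_logprob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(mean_logprob)


# Hypothetical per-word confidences for a two-word utterance.
print(utterance_confidence([0.9, 0.4]))
```

Unlike a plain product, the geometric mean does not shrink just because the utterance has more words, so scores for short and long utterances stay comparable.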