How can the "goodness" of lattices or confusion networks (sausages) be
evaluated? Say I have two lattices (or sausages) generated from different
recognizers, or from the same recognizer but with different parameters/models.
What metric can I use to test the quality of the WFSAs and prefer one over the
other if I know the true transcription?
Maybe the posterior probability of the "oracle" path through the lattice?
What if the oracle path is not present exactly in the lattice (sausage) because
some of the words were inserted/deleted/substituted? Is there a metric
that can be used in this case?
Thanks!
There are multiple metrics - 1-best accuracy, n-best accuracy, oracle WER,
entropy. For example, you might prefer the lattice with smaller entropy even if
it has a slightly worse WER.
I suggest you look at the "wlat-stats" script from SRILM.
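For illustration, here is a minimal sketch of the entropy part: per-slot entropy of a confusion network, summed over the slots. The sausage structure and the posterior values below are made up for the example and are not the SRILM wlat format:

    import math

    # A confusion network ("sausage") as a list of slots; each slot maps a word
    # (or "*DELETE*" for an epsilon arc) to its posterior probability.
    # This structure is an assumption for illustration only.
    sausage = [
        {"hello": 0.9, "hallo": 0.1},
        {"world": 0.6, "word": 0.3, "*DELETE*": 0.1},
    ]

    def slot_entropy(slot):
        """Shannon entropy (bits) of the word posteriors in one slot."""
        return -sum(p * math.log2(p) for p in slot.values() if p > 0)

    def sausage_entropy(cn):
        """Total entropy of the confusion network = sum of per-slot entropies."""
        return sum(slot_entropy(slot) for slot in cn)

    print(sausage_entropy(sausage))  # lower means a more confident network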
As for LWER, David once said:
LWER calculation is composition with a Levenshtein transducer, then path
search; you can of course replace the Levenshtein transducer with any other
error model.
The code for this thing is available in
Sphinxtrain/python/cmusphinx/lattice_error.py and lattice_error_fst.py
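If you don't want to set up the FST machinery, a plain dynamic-programming sketch of the same idea for a confusion network looks like this. The function and data names are illustrative, not the actual Sphinxtrain API; each slot may match any of its alternatives, and an epsilon arc lets the slot be skipped for free:

    def oracle_errors(reference, sausage, eps="*DELETE*"):
        """Minimum word errors between the reference and the best path
        through the confusion network (standard Levenshtein DP, where each
        slot may contribute any of its alternative words)."""
        n, m = len(reference), len(sausage)
        # dp[i][j] = min errors aligning reference[:i] with sausage slots[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i                      # reference words left unmatched
        for j in range(1, m + 1):
            dp[0][j] = dp[0][j - 1] + (0 if eps in sausage[j - 1] else 1)
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                slot = sausage[j - 1]
                sub = dp[i - 1][j - 1] + (0 if reference[i - 1] in slot else 1)
                dele = dp[i - 1][j] + 1                          # ref word dropped
                ins = dp[i][j - 1] + (0 if eps in slot else 1)   # extra slot
                dp[i][j] = min(sub, dele, ins)
        return dp[n][m]

    ref = "hello brave new world".split()
    cn = [{"hello"}, {"brave", "grave"}, {"*DELETE*", "a"}, {"world", "word"}]
    print(oracle_errors(ref, cn))   # 1: "new" has no matching slot

Dividing the result by the reference length gives the oracle WER, which is defined even when the true transcription is not present exactly in the lattice or sausage.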
Thank you Nickolay - you are a living SR encyclopedia :)
There was a discussion about the lattices produced by S4 on the development
mailing list some months ago. One of the problems, for example, was that the
hypotheses produced were unbalanced, with more alternatives toward the end of
the lattice. Did you find the reason for this?
One of the problems, for example, was that the hypotheses produced were
unbalanced, with more alternatives toward the end of the lattice. Did you find
the reason for this?
No, this is still pending; one should first establish a public test set for
confidence quality. Something like
http://www.cs.cmu.edu/~rongz/eurospeech_2005.pdf