I am working with the RM dataset and online2 decoder.
I want the list of words that were misrecognized. The current script uses compute-wer to give the number of INS, DEL, and SUB errors, but it does not list which words were recognized incorrectly. Does Kaldi provide any utilities for such a task?
I know that there are tools like SCTK which give a detailed analysis of the errors. I have seen scripts for other datasets that use SCTK; the RM scripts, however, do not. Is that because SCTK requires references in the STM format?
What are my options here?
Thanks,
Yash
Yes, SCTK requires STM-format references, and it gets complex quickly.
But look at the usage message of compute-wer; you can run it in a mode
where it outputs more detailed information, and if you pipe that output
into sort | uniq -c, I think you get something quite usable.
Dan
Thanks for your prompt response yesterday. I now have the list of words that were recognized incorrectly. Is there a way to get more statistics about the errors out of compute-wer, such as the files in which they occur?
For my purpose, I want to associate each of these errors with a test file.
Thanks,
Yash
On Wed, Oct 29, 2014 at 4:16 PM, Yash y91@users.sf.net wrote:
That isn't possible, I'm afraid. We should probably implement a more
verbose mode at some point.
Dan
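Since compute-wer does not report which utterance an error came from, one workaround is to do the alignment outside Kaldi. The following is a minimal sketch, not a Kaldi utility. It assumes the reference and hypothesis transcripts are in Kaldi's text format (one utterance per line: the utterance ID followed by the words), and the names align, read_text and errors_per_utterance are made up for illustration. When several alignments have the same cost it simply picks one, so individual labels may differ from what compute-wer counts even though the total number of errors matches.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two word lists; return [(op, ref_word, hyp_word), ...]."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "COR" if ref[i - 1] == hyp[j - 1] else "SUB"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("DEL", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("INS", None, hyp[j - 1]))
            j -= 1
    return ops[::-1]


def read_text(lines):
    """Parse 'utt-id word1 word2 ...' lines into {utt_id: [words]}."""
    table = {}
    for line in lines:
        parts = line.split()
        if parts:
            table[parts[0]] = parts[1:]
    return table


def errors_per_utterance(ref_lines, hyp_lines):
    """Return {utt_id: [(op, ref_word, hyp_word), ...]} containing errors only."""
    ref, hyp = read_text(ref_lines), read_text(hyp_lines)
    result = {}
    for utt, ref_words in ref.items():
        if utt not in hyp:
            continue  # utterance was not decoded; skipped in this sketch
        errs = [e for e in align(ref_words, hyp[utt]) if e[0] != "COR"]
        if errs:
            result[utt] = errs
    return result


# Toy data standing in for the reference and decoded transcripts.
ref_lines = ["utt1 show me the ships", "utt2 list all cruisers"]
hyp_lines = ["utt1 show me a ships", "utt2 list cruisers"]

per_utt = errors_per_utterance(ref_lines, hyp_lines)
for utt, errs in per_utt.items():
    for op, r, h in errs:
        print(utt, op, r, h)

# Corpus-wide tally, similar in spirit to piping through sort | uniq -c.
tally = Counter(e for errs in per_utt.values() for e in errs)
```

To run it on real data, pass open file handles instead of the toy lists, for example the test set's text file as the reference and the decoder's word-level output as the hypothesis. If the hypothesis is in integer form it would first need to be mapped back to words.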
On Thu, Oct 30, 2014 at 4:08 PM, Yash y91@users.sf.net wrote: