My goal is to produce the ROC curve for the decoder in KWS mode.
There are obviously many parameters to take into account. To sort them into categories:
- config of the decoder itself
- config of the KWS mode
- keyword choice based on the database
For the set of keywords, I will take roughly the 30 most likely words in the language model created from the audio corpus on which the test is run.
For the KWS config, I was thinking about varying kws_threshold around its default value of 1 to start with.
Is there anything else in the decoder configuration that I should look at rather than leave at its default?
(I am guessing these default values have been set through a number of tests in the development of the decoder.)
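For the keyword selection step, here is a minimal sketch in Python. It assumes the "most likely" words can be approximated by raw word counts over the corpus transcripts rather than by reading probabilities out of the language model file. The function name `top_keywords` and the `min_length` filter are hypothetical and only for illustration.

```python
from collections import Counter


def top_keywords(transcripts, n=30, min_length=1):
    """Return the n most frequent words across the given transcript lines.

    Assumption: unigram frequency in the transcripts stands in for
    unigram likelihood in the language model built from the same corpus.
    """
    counts = Counter()
    for line in transcripts:
        # Lowercase and split on whitespace; skip words shorter than min_length.
        counts.update(w for w in line.lower().split() if len(w) >= min_length)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    sample = ["the cat sat", "the cat ran", "the dog"]
    print(top_keywords(sample, n=2))
```

Raising `min_length` is one way to leave very short words out of the candidate list, if that turns out to be useful for the test.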
My goal is to produce the ROC curve for the decoder in KWS mode.
This would be interesting to check.
For the KWS config, I was thinking about varying kws_threshold around its default value of 1 to start with.
Values down to 1e-50 are reasonable. Values below that will require you to change the beam, which also affects detection. For clarity it's probably better to set the beam to 1e-200 and vary only the threshold.
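A threshold sweep along these lines could be sketched as follows. This is only an outline under assumptions: `run_kws` is a hypothetical callable that you would write yourself to run the decoder (with the beam held fixed) at one kws_threshold value and return the counts of correct detections and false alarms. The names `threshold_grid`, `sweep`, `n_true` and `n_false_trials` are likewise made up for the example, and how a "non-keyword trial" is counted is a choice left to the test design.

```python
from typing import Callable, Iterable, List, Tuple


def threshold_grid(max_exp: int = 0, min_exp: int = -50, step: int = 5) -> List[float]:
    """Log-spaced thresholds from 10**max_exp down to 10**min_exp."""
    return [10.0 ** e for e in range(max_exp, min_exp - 1, -step)]


def sweep(
    run_kws: Callable[[float], Tuple[int, int]],
    n_true: int,
    n_false_trials: int,
    thresholds: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Return one (threshold, false_positive_rate, true_positive_rate) per threshold.

    run_kws(threshold) is user-supplied (hypothetical) and must return
    (hits, false_alarms) for a full pass over the test set.
    """
    points = []
    for t in thresholds:
        hits, false_alarms = run_kws(t)
        points.append((t, false_alarms / n_false_trials, hits / n_true))
    return points


if __name__ == "__main__":
    # Dummy decoder stand-in so the sketch runs without any audio.
    fake = lambda t: (8, 2)
    for point in sweep(fake, n_true=10, n_false_trials=20, thresholds=threshold_grid()):
        print(point)
```

Plotting the second value against the third for each threshold gives the ROC points. The grid covers the default of 1 down to 1e-50, the range described above.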
I got some results with the Voxforge database and can share them if there is interest, for example precision and recall for a specific list of keywords.
Is there a preferred place to store the information?
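For reference, precision and recall per keyword can be computed from detection counts as in the sketch below. It assumes each keyword has already been scored into true positives, false positives and false negatives. The function names and the example counts are hypothetical and are not results from the Voxforge test.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall); a ratio with an empty denominator is reported as 0.0."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def per_keyword_scores(counts):
    """Map each keyword to (precision, recall) given {keyword: (tp, fp, fn)}."""
    return {kw: precision_recall(*c) for kw, c in counts.items()}


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    print(per_keyword_scores({"hello": (8, 2, 2), "world": (0, 0, 3)}))
```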
Hi toine. I am interested in your work and I would like to see the results. Can you share your results with us?
Sure, it is interesting.
No, there is no place for performance tests yet. You can just share them via Dropbox.