Hello everyone, I am working on an application that must recognize individual words or short combinations of words, not full phrases or dictation. As far as I understand, the best options are a keyword search ("keywords") or a grammar.
Because of the number of words, a grammar would be complicated and extensive, so I am trying the "keywords" approach. The results are not yet what I expected. Could someone help me understand how it works?
1-Regarding the threshold:
I have read that the value sets the balance between detections and false positives, but I do not understand how it works.
How is a threshold value interpreted?
Is / 1e-20 / some kind of range (from "1e" to "20")?
If so, a range of what? (word length? audio length? something else?)
Or is "1e-20" a single value? Of what?
What do "1e" and "20" mean? Are they hexadecimal values? Is it the mathematical constant "e" with an exponent?
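For reference on the notation itself: in most programming languages a token like 1e-20 is ordinary scientific notation for one floating-point number, 1 × 10^-20, where "e" marks the power-of-ten exponent. A minimal Python sketch of that reading follows. It only illustrates the number format and assumes nothing about how pocketsphinx uses the value internally.

```python
import math

# "1e-20" parses as a single floating-point number: 1 x 10^-20.
# The "e" is the exponent marker of scientific notation.
threshold = float("1e-20")

# The power of ten can be recovered with a base-10 logarithm.
exponent = round(math.log10(threshold))
print("exponent:", exponent)

# A more negative exponent means a smaller number.
print("1e-40 < 1e-20:", float("1e-40") < float("1e-20"))
```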
2-Regarding how the search is processed:
As an example, here is my case.
List of keywords and their thresholds:
engage / 1e-20 /
search target / 1e-30 /
aim / 1e-10 /
aiming target / 1e-40 /
target / 1e-20 /
Dictionary (.dic file; I am not copying the phonemes here, but they are in the file):
engage
search
target
aim
aiming
weapon
enemy
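The two lists above can be cross-checked with a small script. This is a minimal sketch, assuming the "phrase / threshold /" layout exactly as written in the list above. The names `parse_keyword_line` and `missing_words` are hypothetical helpers made up for illustration, and the script does not call pocketsphinx at all.

```python
import re

# Keyword entries and dictionary words copied from the lists above.
KEYWORD_LINES = [
    "engage / 1e-20 /",
    "search target / 1e-30 /",
    "aim / 1e-10 /",
    "aiming target / 1e-40 /",
    "target / 1e-20 /",
]
DICTIONARY_WORDS = {"engage", "search", "target", "aim", "aiming", "weapon", "enemy"}

# Assumed layout: a phrase, then a threshold between two slashes.
LINE_PATTERN = re.compile(r"^(?P<phrase>.+?)\s*/\s*(?P<threshold>[^/\s]+)\s*/\s*$")


def parse_keyword_line(line):
    """Split one entry into (phrase, threshold as float)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized keyword line: {line!r}")
    return match.group("phrase"), float(match.group("threshold"))


def missing_words(phrase, dictionary):
    """Return the words of a phrase that are absent from the dictionary."""
    return [word for word in phrase.split() if word not in dictionary]


keywords = [parse_keyword_line(line) for line in KEYWORD_LINES]

# Words that appear in at least one keyword phrase.
used_words = {word for phrase, _ in keywords for word in phrase.split()}

# Dictionary words that no keyword phrase uses.
unused_words = DICTIONARY_WORDS - used_words

for phrase, threshold in keywords:
    print(phrase, threshold, "missing:", missing_words(phrase, DICTIONARY_WORDS))
print("in dictionary but not in any keyword:", sorted(unused_words))
```

With these lists, every keyword word is present in the dictionary, and "weapon" and "enemy" are the dictionary words that no keyword uses.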
How is the search processed? Is it a problem that "target" appears in more than one keyword entry?
What does pocketsphinx do with words it detects that are in the dictionary but not in the "keywords" list?
3-Regarding the "utterance"
In the case of continuous speech, how does pocketsphinx determine where an utterance stops so that it can be parsed?
In my application, if I keep speaking continuously I get a crash, apparently some overflow that I am not handling.
Thanks in advance.