I'm interested in word spotting using SPHINX3, and I want to use a garbage filler model to do it.
But I have no idea how to train a garbage filler model, because there is no garbage in the transcriptions.
Can anybody tell me something, or is there some material I can read?
Thanks!
In my opinion, keyword spotting is a very different task. It requires at least a modified search algorithm, and something like a filler model won't help; the task is too different from usual decoding.
But since this is a very essential task, we probably have to discuss how it should be implemented. I don't think it is really that complicated, and the current structure should help a lot, but the search algorithm must be updated.
Hi,
AFAIK, common approaches for keyword spotting are not that different from closed-set ASR: instead of having only dictionary word models in the search space, keyword spotters mostly add a generic phone model (with a penalized entry point). To improve performance, this subword model is often constrained by special subword grammars that encode the phonotactic constraints of the language.
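To make the "keywords plus penalized generic phone model" idea concrete, here is a toy decoder. It is not Sphinx-4 code: it is a hypothetical, self-contained Viterbi search over a loop that allows either a keyword (a fixed phone sequence) or a single generic garbage phone, where garbage_penalty (a hypothetical name) plays the role of the penalized entry point. It assumes one HMM state per phone and per-frame phone log-likelihoods already produced by some acoustic model, and it does not model the phonotactic subword grammars mentioned above.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")
UNK = "<unk>"

State = Tuple[str, str, int]           # ("kw", word, phone_index) or ("g", phone, 0)
Token = Tuple[float, Tuple[str, ...]]  # (log score, word history)


def spot_keywords(frames: List[Dict[str, float]],
                  keywords: Dict[str, List[str]],
                  phones: List[str],
                  garbage_penalty: float) -> List[str]:
    """Toy frame-synchronous Viterbi over a loop of (keyword | garbage phone).

    frames[t][p] is the log-likelihood of phone p at frame t (one state per
    phone, with a self-loop). Every entry into a garbage phone pays
    garbage_penalty, so keywords win whenever they explain the audio well.
    """
    states: List[State] = [("kw", w, i) for w, pron in keywords.items()
                           for i in range(len(pron))]
    states += [("g", p, 0) for p in phones]

    def phone_of(s: State) -> str:
        return keywords[s[1]][s[2]] if s[0] == "kw" else s[1]

    def is_word_end(s: State) -> bool:
        return s[0] == "g" or s[2] == len(keywords[s[1]]) - 1

    tokens: Dict[State, Token] = {}
    for t, loglik in enumerate(frames):
        # Best token that has just completed a word (or the utterance start).
        if t == 0:
            best_end: Token = (0.0, ())
        else:
            best_end = max((tok for s, tok in tokens.items() if is_word_end(s)),
                           key=lambda tok: tok[0], default=(NEG_INF, ()))
        new_tokens: Dict[State, Token] = {}
        for s in states:
            cands: List[Token] = [(NEG_INF, ())]
            if s in tokens:  # self-loop: stay in the same phone
                cands.append(tokens[s])
            if s[0] == "kw" and s[2] > 0:  # advance from the previous phone of the keyword
                cands.append(tokens.get(("kw", s[1], s[2] - 1), (NEG_INF, ())))
            if s[0] == "g" or s[2] == 0:  # start a new word after a word end
                label = UNK if s[0] == "g" else s[1]
                penalty = garbage_penalty if s[0] == "g" else 0.0
                cands.append((best_end[0] - penalty, best_end[1] + (label,)))
            score, hist = max(cands, key=lambda tok: tok[0])
            new_tokens[s] = (score + loglik.get(phone_of(s), NEG_INF), hist)
        tokens = new_tokens

    _, hist = max((tok for s, tok in tokens.items() if is_word_end(s)),
                  key=lambda tok: tok[0], default=(NEG_INF, ()))
    # Each garbage phone emits one <unk>; merge adjacent ones for readability.
    result: List[str] = []
    for w in hist:
        if not (w == UNK and result and result[-1] == UNK):
            result.append(w)
    return result
```

With frames dominated in turn by "dh ah k aa r ih z b l uw", keywords car = k aa r and blue = b l uw, and a penalty of 1.0, this returns ["<unk>", "car", "<unk>", "blue"]. The penalty is the main tuning knob: raising it forces more non-keyword speech into keywords (false alarms), while lowering it lets the garbage loop swallow real keywords (misses).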
A series of recent experiments was done by Bazzi & Glass (try "Modeling out-of-vocabulary words for robust speech recognition" as an entry point).
Anyway, instead of building your own OOV model, you could also use S4, which comes with the necessary phone models and is already able to perform keyword spotting (search for CIPhoneLoop in the codebase).
Best regards,
Holger
"... you could also use S4 which comes along with the necessary phone-models and which is already able to perform keyword-spotting (Search for CIPhoneLoop in the codebase)"
I need some enlightenment, please. I'm using S4, but I can't seem to perform keyword spotting just by referencing the available methods... Do I need a modified FlatLinguist.java/CIPhoneLoop.java?
I'm basically looking for a way to get back not only the "<unk>" value but also the words/phones that turned out to be a good match.
For example, say my grammar contains only the words "car" and "blue", and my recording says "the car is blue". I'm trying to get this kind of result:
<unk> car <unk> blue
From what I've gathered/tested so far, I always get excellent results when matching my recordings against a list of predefined sentences, but if the speaker says something slightly off, I can't seem to track/capture that.
Here's an example. If I create a single grammar rule "the car isn't blue", it still matches my recording even though there was a slight variation in the sentence (is versus isn't)... One thing that would help me would be an additional piece of information stating how good each word match was, so I could eliminate words that didn't quite match the recording.
Something like...
the [score:0.99] car [score:0.89] isn't [score:0.29] blue [score:0.82]
So even though I would get a full match based on the grammar, I would still be able to make my own decision on whether it's a good match or not.
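One common heuristic for the per-word scores described above (not something this thread says Sphinx-4 provides out of the box) is to compare each word's acoustic score with the score a free phone loop, such as the one CIPhoneLoop represents, gets over the same frames. The sketch below assumes you can already obtain, for every word in the result, its frame count, its own log-likelihood, and a phone-loop log-likelihood over the same segment; AlignedWord, word_confidences and scale are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    word: str
    num_frames: int      # frames the word was aligned to (assumed available)
    word_loglik: float   # acoustic log-likelihood of the grammar word over those frames
    loop_loglik: float   # log-likelihood of a free phone loop over the same frames


def word_confidences(words: List[AlignedWord],
                     scale: float = 1.0) -> List[Tuple[str, float]]:
    """Per-word confidence from a frame-normalized log-likelihood ratio.

    An unpenalized free phone loop can follow any phone sequence, including the
    word's own, so word_loglik <= loop_loglik and the ratio is <= 0 (clamped to
    be safe). exp() maps it into (0, 1]; 1.0 means the grammar word explains
    the audio as well as the best phone string does.
    """
    result = []
    for w in words:
        llr = (w.word_loglik - w.loop_loglik) / max(1, w.num_frames)
        result.append((w.word, math.exp(scale * min(0.0, llr))))
    return result
```

A word like "isn't" in the example would then be dropped by a threshold of your choosing, e.g. keeping only words whose confidence exceeds 0.5; both the threshold and scale would need tuning on held-out recordings.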
Is it possible to accomplish any of this?
Thank you!
Andre
There's a similar thread about this topic, using the CIPhoneLoop in S4 here:
http://sourceforge.net/forum/forum.php?thread_id=1915464&forum_id=382337
Is there a way to recognize parts of an utterance as "out of grammar" and other parts as words from the vocabulary?
E.g. my vocabulary is very small (about 30 words), but when I decode a whole sentence with many unknown words in it, the whole result is <unk> instead of only parts of it.
Please look at the other thread for details...
"S4 which comes along with the necessary phone-models and which is already able to perform keyword-spotting"
When you say S4 do you mean Sphinx 4?
yes
Thanks for your advice.
But I want to start with garbage model training. I have transcribed data, but I don't know how to train a garbage filler model from it. Can anybody tell me something, or is there some material I can read?
Thanks again.
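Since the transcripts contain no garbage, one possible approach (a sketch under stated assumptions, not an official SphinxTrain recipe) is to create the garbage labels yourself: keep the keywords and replace every other word with a single filler token, so the trainer learns a garbage model from all non-keyword speech. The example assumes transcript lines of the form "<s> words </s> (utterance_id)" with upper-case words, and uses a hypothetical filler name ++GARBAGE++. That filler would also need an entry in the filler dictionary mapping it to a filler phone, and that phone would have to appear in the phone list; check the SphinxTrain documentation for your version.

```python
import re
from typing import Set

GARBAGE = "++GARBAGE++"  # hypothetical filler word name

# Assumed transcript line format: "<s> w1 w2 ... </s> (utterance_id)"
LINE_RE = re.compile(r"^\s*<s>\s*(.*?)\s*</s>\s*(\(.*\))\s*$")


def rewrite_transcript(line: str, keywords: Set[str], garbage: str = GARBAGE) -> str:
    """Keep keywords (given in upper case); map every other word to a garbage token."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unexpected transcript line: {line!r}")
    words, utt_id = m.group(1).split(), m.group(2)
    out = []
    for w in words:
        label = w if w.upper() in keywords else garbage
        # Collapse runs of non-keyword words into a single garbage token.
        if label == garbage and out and out[-1] == garbage:
            continue
        out.append(label)
    return " ".join(["<s>"] + out + ["</s>", utt_id])
```

For example, "<s> THE CAR IS BLUE </s> (utt001)" with keywords {"CAR", "BLUE"} becomes "<s> ++GARBAGE++ CAR ++GARBAGE++ BLUE </s> (utt001)". Collapsing runs means one garbage token covers a variable-length stretch of speech; emitting one token per non-keyword word is the obvious alternative if you prefer shorter garbage segments.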