As per my understanding of a speech recognition system (like PS for example),
three basic things that a system needs are:
* An acoustic model for a particular accent of that language.
* A dictionary that contains all the words (along with pronunciations) that the system can recognize.
* A language model or a grammar for specifying the possible word sequences.
My question is regarding some of the applications out there that seem to
understand all kinds of accents of a language on the fly
and even recognize proper nouns, like names of non-English origin, correctly
(an iPhone app from Vlingo, for example).
Any ideas on the following?
Q01. How can these applications recognize any word, including non-English names
(a huge dictionary or something else)?
Q02. How can they correctly recognize all the accents of a language on the fly?
Q03. Is the heart of such an application a system like pocketsphinx, or do they use
something entirely different?
Thanks and regards,
Hi, As per my understanding of a speech recognition system (like PS for
example), three basic things that a system needs are: An acoustic model for
a particular accent of that language.
A good acoustic model works well for many accents. Moreover, modern recognizers can
decode using multiple acoustic models.
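To make the "multiple acoustic models" idea concrete, here is a minimal toy sketch, not how any real recognizer implements it. It assumes each accent's acoustic model is just a single Gaussian over one made-up feature value; the names `ACCENT_MODELS`, `log_likelihood` and `pick_best_model` are hypothetical. The point is only that the same audio frames are scored under every model and the best-scoring model wins.

```python
import math

# Hypothetical toy "acoustic models": one Gaussian (mean, std) per accent
# over a single made-up feature. Real acoustic models are far richer.
ACCENT_MODELS = {
    "accent_a": (0.0, 1.0),
    "accent_b": (3.0, 1.0),
}


def log_likelihood(frames, mean, std):
    """Sum of Gaussian log-densities of the frames under one model."""
    total = 0.0
    for x in frames:
        total += -0.5 * math.log(2 * math.pi * std ** 2)
        total -= (x - mean) ** 2 / (2 * std ** 2)
    return total


def pick_best_model(frames, models=ACCENT_MODELS):
    """Score the frames under every model; return the best name and all scores."""
    scores = {
        name: log_likelihood(frames, mean, std)
        for name, (mean, std) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = pick_best_model([2.8, 3.1, 3.3])
    print(best, scores)
```

In a real system the comparison would presumably be made on full decoding scores rather than on a single feature, but the selection step has the same shape.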
Q01. How can these applications recognize any word, including non-English
names (a huge dictionary or something else)?
A huge dictionary.
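As a rough illustration of why dictionary size matters, the sketch below loads a tiny pronunciation dictionary and reports which words are out of vocabulary. It assumes the plain-text "word PHONE PHONE ..." layout with "(2)" marking an alternate pronunciation. The entries in `DICT_TEXT` are made up for illustration, and `load_dictionary` and `out_of_vocabulary` are hypothetical helper names, not part of any library.

```python
# Made-up mini dictionary, one pronunciation per line.
DICT_TEXT = """\
hello HH AH L OW
hello(2) HH EH L OW
world W ER L D
priya P R IY Y AH
"""


def load_dictionary(text):
    """Parse 'word PHONE ...' lines into {word: [pronunciation, ...]}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0].split("(")[0]  # "hello(2)" -> "hello"
        entries.setdefault(word, []).append(parts[1:])
    return entries


def out_of_vocabulary(words, entries):
    """Return the words that have no pronunciation in the dictionary."""
    return [w for w in words if w.lower() not in entries]


if __name__ == "__main__":
    entries = load_dictionary(DICT_TEXT)
    print(out_of_vocabulary(["Hello", "Priya", "Zbigniew"], entries))
```

A word that is missing from the dictionary cannot appear in the recognizer's output, which is why covering proper nouns comes down to adding many more entries.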
Q02. How can they correctly recognize all the accents of a language on the
fly?
They use good acoustic models, but it's not true that they recognize
everything correctly. They still make errors.
Q03. Is the heart of such an application a system like pocketsphinx, or do they use
something entirely different?
Core algorithms are the same.
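For readers wondering what a "core algorithm" looks like here: HMM-based recognizers such as pocketsphinx are built around Viterbi search, which finds the most likely hidden state sequence for a series of observations. Below is a minimal toy sketch of that search. The two states, the symbols "a" and "b", and all the probabilities are invented for illustration, and nothing here is taken from any product's actual code.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Invented toy model: two states emitting the symbols "a" and "b".
STATES = ["s1", "s2"]
LOG_START = logs({"s1": 0.9, "s2": 0.1})
LOG_TRANS = {
    "s1": logs({"s1": 0.6, "s2": 0.4}),
    "s2": logs({"s1": 0.1, "s2": 0.9}),
}
LOG_EMIT = {
    "s1": logs({"a": 0.8, "b": 0.2}),
    "s2": logs({"a": 0.2, "b": 0.8}),
}

if __name__ == "__main__":
    print(viterbi(["a", "a", "b", "b"], STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Real decoders add pruning (beam search), a language model and much larger state spaces on top of this, so treat it as the basic idea only.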