Hi,
I used the online LM Tool to generate a language model and a dictionary from a set of ~5K phrases.
Due to the nature of the domain, my phrases contain words from four different languages: English, French, Italian and Spanish. In many cases the dic file gets the pronunciations right, but some of the words are completely wrong (e.g. when a word looks like an English word but is actually French).
The current WER on a random subset is 25%. I would like to improve this number to 50-70%, if possible.
For example I could tweak the dictionary for commonly misrecognized words. How much improvement could I expect from this? And would I need to do anything else other than tweak the dic file?
I found a pronunciation dictionary for French words (the one by LIUM), which I would like to use, but it has a different set of phonetic symbols, e.g.:
Sphinx:
PAPE P EY P
LIUM:
pape pp aa pp
pape(2) pp aa pp ee
What's the best way to translate the LIUM to Sphinx notation? And is it actually possible to combine a set of English and French phonemes in one dictionary? Specifically French phonemes like "nn yy" in Bretagne, or the French "rr".
Thanks a lot,
Aly
The current WER on a random subset is 25%. I would like to improve this number to 50-70%, if possible
50% WER is worse than 25% WER. You probably meant something else.
And would I need to do anything else other than tweak the dic file?
Ideally, if you want to recognize several languages at once, you need to construct a joint acoustic model for them; you can't expect a US English acoustic model to be good at recognizing French sounds.
And is it actually possible to combine a set of English and French phonemes in one dictionary?
It's easy to edit the dictionary, but besides being in the dictionary, the phonemes must also be present in the acoustic model. That's the issue.
To learn more about the interoperation between the dictionary, the acoustic model and the language model, please read the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialconcepts
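A quick practical check that follows from this: every phone used in the .dic file has to appear in the acoustic model's phone inventory. A rough sketch, assuming the model's phone set has been dumped to a plain text file with one phone per line (the file names here are placeholders):

    # check_phones.py -- list dictionary entries that use phones the model does not know
    with open("model_phones.txt") as f:            # assumed: one phone per line, e.g. "AA"
        model_phones = {line.strip() for line in f if line.strip()}

    with open("my.dic") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            word, phones = parts[0], parts[1:]
            unknown = [p for p in phones if p not in model_phones]
            if unknown:
                print(word, "uses phones missing from the model:", " ".join(unknown))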
Thanks for the quick answer, Nickolay.
You are right, I meant the Accuracy output of the WER script. It looks like this:
TOTAL Words: 1762 Correct: 697 Errors: 1309
TOTAL Percent correct = 39.56% Error = 74.29% Accuracy = 25.71%
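(For what it's worth, the three figures are consistent with each other if everything is counted against the number of reference words; a minimal sketch of the arithmetic, not output from any tool:

    words = 1762      # reference words
    correct = 697     # words recognized correctly
    errors = 1309     # substitutions + deletions + insertions
    print(round(100.0 * correct / words, 2))            # 39.56 -> "Percent correct"
    print(round(100.0 * errors / words, 2))             # 74.29 -> "Error", i.e. word error rate
    print(round(100.0 * (words - errors) / words, 2))   # 25.71 -> "Accuracy" = 100 - Error

Accuracy comes out lower than percent correct because insertions count as errors too.)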
When you say "construct", do you mean create a new one, as explained here:
http://cmusphinx.sourceforge.net/wiki/tutorialam
Or could I get away with adaptation?
Or is there a way of combining the models you list here into one?
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
Would it make sense to express the French phonemes using their closest English phonemes in the current dictionary?
Most of the users will be English speakers, and my guess is that the majority of them won't be able to pronounce the French phonemes exactly anyway.
Thanks a lot,
Aly
When you say "construct", do you mean create a new one, as explained here: http://cmusphinx.sourceforge.net/wiki/tutorialam
Yes.
Or is there a way of combining the models you list here into one?
CMUSphinx doesn't provide such tools.
Or could I get away with adaptation?
It's hard to give you accuracy advice without seeing your test data. To get help on accuracy, you need to share a test set.
Most of the users will be English speakers, and my guess is that the majority of them won't be able to pronounce the French phonemes exactly anyway.
Then why do you need French sounds at all? If most of your speakers are English, your task has nothing to do with different languages.
I'm not sure about this, but my hope is that to get the performance level I'm after, I don't need the French sounds.
What I need though is the pronunciation of French words. In the example above, "pape" can be pronounced the English way (P EY P), which is what the LM tool put into the dic file, or the French way (PP AA PP), as in the LIUM dictionary.
So, from your reply I understood that if I want to combine French and English phonemes in one model, I need to create my own model.
However, could I not simply combine two pronunciation dictionaries? E.g.:
a) create a phoneme mapping for the LIUM symbols, e.g. "PP" = "P", "AA" = "AH"
b) select words that are French and should be corrected (probably manually)
c) copy the pronunciations from the LIUM dictionary into my dic file, translating the phonemes according to a).
Does this make sense?
Any advice on how to approach point a) best?
Thank you!
In the example above, "pape" can be pronounced the English way (P EY P), which is what the LM tool put into the dic file, or the French way (PP AA PP), as in the LIUM dictionary.
There is no such thing as the English way or the French way; you cannot consider the dictionary alone. The dictionary works in tight connection with the acoustic model and maps sounds to the detectors in that model. So PP AA PP is a transcription for the LIUM French acoustic model, not "the French way".
However, could I not simply combine two pronunciation dictionaries?
If you want to fix the pronunciation of certain words in the CMUDict phonetic dictionary using the CMUDict phoneset, you can do that. It is not a combination process; you just update the pronunciation using the CMUDict phone inventory available.
You can use a mapping from the LIUM French phoneset to the CMUDict phoneset for that.
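Concretely, after such an update the relevant entries in the dic file might end up looking something like this (the CMU phones below are a guess at the closest English sounds, not a vetted transcription):

    PAPE P AA P
    PAPE(2) P EY P

The (2) suffix is the usual CMUDict-style way of marking an alternative pronunciation of the same word, so both variants can stay in the dictionary.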
Do you mean there is a mapping between these phonesets done by somebody else, or do I need to come up with my own?
You have to create your own mapping.
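A minimal sketch of what such a hand-made mapping could look like when applied to a few LIUM-style entries; the phone correspondences, file names and word list below are illustrative placeholders and would need checking against the real phonesets:

    # lium_to_cmu.py -- rewrite selected LIUM entries into Sphinx/CMUDict-style notation
    # Step a) from above: a hand-made LIUM -> CMU phone mapping (illustrative guesses only).
    LIUM_TO_CMU = {
        "pp": "P",
        "aa": "AA",
        "ee": "EY",
        "rr": "R",
        "nn": "N",
        "yy": "Y",
    }

    # Step b): words whose pronunciation should come from the LIUM dictionary, chosen by hand.
    WORDS_TO_FIX = {"pape", "bretagne"}

    # Step c): copy and translate the matching entries.
    with open("lium_french.dic") as src, open("french_fixes.dic", "w") as out:
        for line in src:
            parts = line.split()
            if not parts:
                continue
            word, phones = parts[0], parts[1:]
            if word.split("(")[0] not in WORDS_TO_FIX:
                continue
            try:
                cmu = [LIUM_TO_CMU[p] for p in phones]
            except KeyError as missing:
                print("no mapping yet for phone", missing, "in", word)
                continue
            out.write("%s %s\n" % (word.upper(), " ".join(cmu)))

The resulting french_fixes.dic entries would then be merged back into the LM Tool's dic file by hand; the point from the earlier replies still applies, though: only phones that actually exist in the acoustic model can appear on the right-hand side.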