I'm working on my term project (related to accessibility: assistive technologies (ATs) for the visually impaired on Linux), and I'm trying espeak with some classmates and friends of mine. However, we've been having some difficulty understanding most of the synthesized text (well, *I*'m not really, since I'm the one telling espeak what to say :) , but they are). I tried lowering the speed, but even so it doesn't seem much easier to understand. This worries me, since I'll be working mostly with children this year, and they will probably have even more trouble following everything espeak says.
So, how could I make this voice more understandable? Where should I start? And how could I contact the author of the original voice to get more information about his/her work?
Thank you very much for your attention,
> how could I make this voice more understandable?
The Brazilian voice was made by me, with suggestions from a native speaker. I don't speak Portuguese myself, so I depend on feedback.
Try to identify which errors cause the problems. Are they errors in the spelling-to-phoneme translation, or in the sounds of the phonemes themselves? Portuguese has fairly good spelling-to-phoneme rules, but if the position of the stressed syllable in a word is wrong, or if eSpeak uses the wrong choice of open or close "e" and "o", then that will affect intelligibility.
Perhaps there are some sounds which eSpeak makes badly (I know that the "r" sounds are not good).
Are some vowels wrong or not distinct enough? Should it use a nasal vowel rather than vowel+n? Is the rhythm/cadence causing problems?
The voice can probably be improved gradually through a number of small changes. People are generally not used to listening carefully to speech to identify exactly what the sounds are, but that's what's needed.
If you want to send me some example recordings of the correct pronunciation of words, then I can compare them with eSpeak and see what the difference is.
Hi Jonathan (and all),
thank you very much for your (quick) answer :)
I'm trying to identify exactly what the problems are (and I've asked my friends too, so their answers should come soon), and yes, "r" is an issue. You can listen to two files, one spoken by espeak and another by me; they say the path to the "calculator" menu option in Xfce, "menu, accessories, calculator", in Portuguese. Is this a known behavior? Is there a way to make it a little "softer", perhaps? You see, in Portuguese we have two "main" (if I can call them that) "r" sounds. The first is when "r" is alone between two other letters (vowels or consonants), as in "trabalho" ("work", n./v.) or "pássaro" ("bird", n.). The other is when it begins the word or is doubled ("rr"), as in "carro" ("car", n.) or "correr" ("to run", v.). eSpeak seems to pronounce both "r"s the same way, like the second case. (I'm not a language expert, however, so I may be missing some detail; anyway, that's how we're taught to write at school, so I guess it's pretty accurate :) )
Besides that, there is this (nasal) "ã" sound that doesn't show up at the end of a word: "amanhã" ("tomorrow", adv.) sounds like "amãen", and "manhã" ("morning", n.) sounds like "mãe" (which is "mother", n., a real word in Portuguese). I think this one may be easier to fix.
I hope I gave you enough detail to figure out what the problems are. If I was too subjective about any detail, please let me know and I'll try to explain it better. There are probably more small details, but I can't give you real data beyond a "feeling that it sounds wrong" :/ I'll talk to my friends and get back here to tell you what they thought of it. :)
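The school rule for the two "r" sounds can be written down as a small sketch. `classify_r` is a hypothetical Python helper, not part of eSpeak, and it encodes only the two cases described above; a word-final "r", as in "correr", is labeled separately because the rule as stated doesn't cover it.

```python
def classify_r(word: str) -> list[str]:
    """Label each "r" in a word using only the two-case school rule (a sketch)."""
    w = word.lower()
    labels = []
    i = 0
    while i < len(w):
        if w[i] == "r":
            if w[i:i + 2] == "rr":
                labels.append("strong")  # doubled "rr": strong r
                i += 2
                continue
            if i == 0:
                labels.append("strong")  # word-initial "r": strong r
            elif i == len(w) - 1:
                labels.append("final")  # word-final "r": not covered by the rule
            else:
                labels.append("tap")  # single "r" between two letters: soft r
        i += 1
    return labels


for word in ["trabalho", "pássaro", "carro", "correr", "calculadora"]:
    print(word, classify_r(word))
```

If eSpeak's rules agreed with this table, "trabalho", "pássaro" and "calculadora" would all get the soft r.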
Thanks for your attention,
PS: I recorded a few other words (both with espeak and by myself) that sound different, so if you want to take a look, they're listed below: "correio" ("mail", n.), "alheio" ("unaware", adj.), "forte" ("strong", adj., which should have an open "o"; "fortaleza" ("fortress", n.), however, is perfect), and "costa" ("coast", n., which should also have an open "o"; "costeira" ("coastal", adj.) is also very good).
PS2: I also have the files in WAV format, if that's better. If you want them, please ask me (or just replace the .ogg extension with .wav in the URLs; I used the same filenames).
"menu, acessórios, calculadora" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070222-1/03.ogg>
"menu, acessórios, calculadora" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/03-human.ogg>
"trabalho" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/trabalho-espeak.ogg>
"trabalho" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/trabalho-human.ogg>
"pássaro" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/passaro-espeak.ogg>
"pássaro" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/passaro-human.ogg>
"carro" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/carro-espeak.ogg>
"carro" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/carro-human.ogg>
"correr" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/correr-espeak.ogg>
"correr" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/correr-human.ogg>
"amanhã" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/amanha-espeak.ogg>
"amanhã" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/amanha-human.ogg>
"manhã" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/manha-espeak.ogg>
"manhã" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/manha-human.ogg>
"mãe" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/mae-espeak.ogg>
"mãe" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/mae-human.ogg>
"correio" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/correio-espeak.ogg>
"correio" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/correio-human.ogg>
"alheio" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/alheio-espeak.ogg>
"alheio" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/alheio-human.ogg>
"forte" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/forte-espeak.ogg>
"forte" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/forte-human.ogg>
"fortaleza" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/fortaleza-espeak.ogg>
"fortaleza" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/fortaleza-human.ogg>
"costa" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/costa-espeak.ogg>
"costa" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/costa-human.ogg>
"costeira" (espeak): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/costeira-espeak.ogg>
"costeira" (human): <http://mirian.is.dreaming.org/classes/tc-2007/20070223-1/costeira-human.ogg>
I haven't looked at everything closely yet, but here are some comments:
1) 03.ogg: I'm not sure what problem this shows. I notice that eSpeak puts the stress in "menu" on the final syllable, but you stress the first syllable. http://en.wikipedia.org/wiki/Portuguese_language#Stress says:
"There is a partial correlation between the position of the stress and the final vowel; for example,
the final syllable is usually stressed when it contains a nasal phoneme, a diphthong, or a close vowel.
The orthography of Portuguese takes advantage of this correlation to minimize the number of diacritics."
This suggests that a final "u" should be stressed, unless there is an accent on another vowel. Is "menu" an exception to this rule?
The other difference I notice is that eSpeak's "u" vowel is closer than yours. Perhaps it would sound better with a less close "u"? Or is the closer "u" clearer?
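The stress rule quoted from Wikipedia can be sketched as a heuristic over syllables that the caller has already split. This is a simplified illustration, not eSpeak's pt_rules logic: `stress_index` is hypothetical, syllabification is assumed to be given, only tilde vowels count as nasal, and the silent "u" of "qu"/"gu" is not told apart from a real vowel.

```python
ACCENTED = set("áéíóúâêô")  # acute/circumflex: explicit stress marks
NASAL = set("ãõ")  # tilde vowels, the only nasals this sketch detects
CLOSE = set("iu")  # close vowels; common diphthongs (ai, ei, ou, ...) contain one too


def stress_index(syllables: list[str]) -> int:
    """Return the 0-based index of the (probably) stressed syllable."""
    lowered = [s.lower() for s in syllables]
    # 1. An acute or circumflex accent marks the stressed syllable explicitly.
    for i, syl in enumerate(lowered):
        if ACCENTED & set(syl):
            return i
    # 2. A final syllable with a nasal or close vowel (or a diphthong) is stressed.
    final = set(lowered[-1])
    if final & NASAL or final & CLOSE:
        return len(syllables) - 1
    # 3. Otherwise stress the penultimate syllable (or the only one).
    return max(len(syllables) - 2, 0)


for word in (["me", "nu"], ["a", "ma", "nhã"], ["pás", "sa", "ro"], ["cal", "cu", "la", "do", "ra"]):
    print("-".join(word), "->", word[stress_index(word)])
```

Applied literally, the rule stresses the final "nu" of "menu", which is exactly where it disagrees with the recording.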
2) amanhã and manhã: This is very strange. eSpeak misses the final vowel. This problem doesn't happen when I try it here! What text output do you get when you use the -X option to show a translation log?
espeak -Xvpt "amanhã"
To hear what eSpeak would say if the spelling-to-phoneme translation was correct, you could specify phoneme mnemonics directly (note that [&~] is the nasal-A phoneme):
espeak -vpt "[[&~m&~n^'&~]]"
... which puts the stress on the final "ã". However your recording has the stress on the second syllable:
espeak -vpt "[[&~m'&~n^&~]]"
... which disagrees with the rule (in the Wikipedia article) that "the final syllable is usually stressed when it contains a nasal phoneme".
You could try the latest development version at http://home.clara.net/jsd/espeak. Is that different?
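To compare stress placements like the two above, a small helper can generate every variant of a phoneme string. This sketch assumes, from the two examples above, that the ' mark goes immediately before the stressed vowel; the token split of "amanhã" and the helper names are hypothetical. It uses only `-v` and the `[[...]]` phoneme input shown above, and only tries to speak if espeak is on the PATH.

```python
import shutil
import subprocess


def mark_stress(tokens: list[str], vowels: set[str], stressed: int) -> str:
    """Join phoneme tokens, placing ' before the vowel numbered `stressed` (0-based)."""
    out = []
    seen = 0
    for tok in tokens:
        if tok in vowels:
            if seen == stressed:
                out.append("'")
            seen += 1
        out.append(tok)
    return "".join(out)


def espeak_command(phonemes: str, voice: str = "pt") -> list[str]:
    """Build an espeak argument list; [[...]] marks the text as phoneme mnemonics."""
    return ["espeak", "-v", voice, f"[[{phonemes}]]"]


# Hypothetical token split of "amanhã", using [&~] and [n^] as in the commands above.
AMANHA = ["&~", "m", "&~", "n^", "&~"]

if __name__ == "__main__":
    for k in range(3):
        cmd = espeak_command(mark_stress(AMANHA, {"&~"}, k))
        print(cmd)
        if shutil.which("espeak"):  # only try to speak if espeak is installed
            subprocess.run(cmd, check=False)
```

Passing the arguments as a list avoids shell quoting problems with characters such as `&`, `'` and `~`.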
3) costa/costeira, forte/fortaleza: open/close "o". How can eSpeak know when to use an open "o"? Must it have a list of all the Portuguese words with open "o", or are there some rules which can reduce the number of exceptions? The dictsource/pt_rules file already includes some exceptions (in the ".group o" section). Similarly for open/close "e".
> amanhã and manhã. This is very strange. eSpeak misses the final vowel. This problem doesn't happen when I try it here!
I've found the fault and fixed it in eSpeak 1.20.07, available at http://home.clara.net/jsd/espeak/
I've also changed "forte" and "costa" to use an open "o", but I expect there will be many more words that need this.
The problem with the missing final "ã" happened because you were using ISO-8859-1 text (where "ã" is encoded as a single byte, 0xe3) rather than UTF-8 (which uses two bytes, 0xc3 0xa3). eSpeak should automatically detect which one you are using: if it finds a byte sequence which is not valid UTF-8, it switches to ISO-8859 (-1 for Portuguese, other variants for other languages). This detection didn't work if the first non-ASCII (accented) character was the last character of the text. This is now fixed. Thanks for discovering the problem!
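The detection described here amounts to "try UTF-8, fall back to an 8-bit encoding". A minimal Python sketch of that idea (not eSpeak's actual code; `decode_input` is a hypothetical name) checks the whole buffer at once, so a lone 0xe3 at the very end of the text, the case that was failing, is still rejected as invalid UTF-8:

```python
def decode_input(data: bytes, fallback: str = "iso-8859-1") -> tuple[str, str]:
    """Return (encoding, text): UTF-8 if the bytes are valid UTF-8, else the fallback."""
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 (e.g. a lone 0xe3 at the end), so treat it as 8-bit text.
        return fallback, data.decode(fallback)


print("ã".encode("iso-8859-1"))  # b'\xe3'
print("ã".encode("utf-8"))  # b'\xc3\xa3'
print(decode_input("amanhã".encode("utf-8")))
print(decode_input("amanhã".encode("iso-8859-1")))
```

Pure ASCII text decodes as UTF-8, which is harmless because ASCII bytes mean the same thing in both encodings.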
Hello again :)
On the open "o" (and "e") matter, is there any way for users to change it locally as the problems come up (and send "patches" afterwards, of course)? Or is the correct approach really to report each one in this forum?
> On the open "o" (and "e") matter, is there any way of users change it locally
Yes. The Portuguese spelling-to-phoneme rules are in dictsource/pt_rules. The exceptions list is dictsource/pt_list. The syntax of these is described in docs/dictionary.html. They use UTF8 encoding.
pt_rules contains some rules about when to use open "e" and "o" (phonemes [E] and [O]) and close "e" and "o" (phonemes [e] and [o]). For example, there's a rule which says that the word ending "ebe" has an open "e" before the "b". The pronunciations of individual words can be added to pt_list.
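To make the rules/exceptions split concrete, here is a toy sketch of the lookup order, assuming (as the name "exceptions list" suggests) that an entry for a word overrides the rules. `EXCEPTIONS` and `mid_vowel_phoneme` are hypothetical and written in Python, not in the pt_rules/pt_list syntax from docs/dictionary.html; only the "ebe" rule and the forte/costa entries come from this thread, and stress and vowel reduction are ignored.

```python
# Hypothetical exceptions: word -> {index of the letter: phoneme}.
EXCEPTIONS = {
    "forte": {1: "O"},  # open [O]
    "costa": {1: "O"},  # open [O]
}


def mid_vowel_phoneme(word: str, i: int) -> str:
    """Choose [E]/[e] or [O]/[o] for the "e" or "o" at word[i]."""
    letter = word[i]
    if letter not in ("e", "o"):
        raise ValueError(f"{word!r}[{i}] is not 'e' or 'o'")
    # 1. An exceptions-list entry for this word wins.
    override = EXCEPTIONS.get(word, {}).get(i)
    if override is not None:
        return override
    # 2. Rule: in the word ending "ebe", the "e" before the "b" is open.
    if word.endswith("ebe") and i == len(word) - 3:
        return "E"
    # 3. Otherwise fall back to the close vowel.
    return "e" if letter == "e" else "o"


for word, i in [("forte", 1), ("fortaleza", 1), ("plebe", 2)]:
    print(word, mid_vowel_phoneme(word, i))
```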
If you make changes to pt_rules and pt_list, you can compile them to produce a new espeak-data/pt_dict for eSpeak to use. Run the command:
espeak --compile=pt
from inside the dictsource directory. If your espeak-data directory is in /usr/share (rather than at the top level of your home directory), you will need root access to update it.
So you can experiment and make corrections. You can email these to me directly if you wish, rather than on this forum.
To change the sound of the Portuguese phonemes is more complicated and needs the espeakedit program. Changing vowel sounds is fairly simple, but making some consonants, such as "r", is difficult.
In pt_rules there are two unusual phonemes, [&/] and [i/]. These are used to produce elision effects. [i/] is used for unstressed "e" at the end of a word. It behaves as [i], except that if the next word starts with a vowel, it becomes [j]. The [&/] phoneme is used when a word ends in an "a" and the next word also starts with an "a". In this case the first "a" becomes nothing. The question is: are these rules good because they make the speech flow better and sound more natural, or are they bad because they make it more difficult to identify the words? Does that make sense?
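The two elision rules can be sketched as a pass over adjacent words. This Python sketch is hypothetical: the phoneme strings are made up for illustration, the vowel test uses the next word's spelling (a silent initial "h" isn't handled), and it assumes that [&/] falls back to a plain [&] when no elision happens, which the description above doesn't state.

```python
VOWEL_LETTERS = set("aeiouáàâãéêíóôõú")
A_LETTERS = set("aáàâã")


def apply_elision(words: list[tuple[str, str]]) -> list[str]:
    """Resolve word-final [i/] and [&/] in (spelling, phonemes) pairs."""
    result = []
    for k, (_, phonemes) in enumerate(words):
        # First letter of the next word's spelling ("" at the end of the phrase).
        nxt = words[k + 1][0].lower()[:1] if k + 1 < len(words) else ""
        if phonemes.endswith("i/"):
            # [i/] acts as [i], except [j] before a vowel-initial word.
            phonemes = phonemes[:-2] + ("j" if nxt in VOWEL_LETTERS else "i")
        elif phonemes.endswith("&/"):
            # [&/] disappears before a word starting with "a"; assumed [&] otherwise.
            phonemes = phonemes[:-2] + ("" if nxt in A_LETTERS else "&")
        result.append(phonemes)
    return result


print(apply_elision([("forte", "fOrti/"), ("amigo", "&migu")]))  # ['fOrtj', '&migu']
print(apply_elision([("casa", "kaz&/"), ("amarela", "&m&rEl&")]))  # ['kaz', '&m&rEl&']
```

Switching these rules off for a listening test would just mean always taking the [i] and [&] branches.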
Thanks for the guidance in creating/changing sounds in espeak, I'll try it out when I feel it's necessary and send the changes to you.
On your last question, I'm not sure what you meant... you see, I haven't been in this area long enough, but these two goals (both "natural speech" and "intelligible speech") can be achieved together, can't they? (Or were you referring to this specific case of a's?)
(and sorry for the delay in answering your last posts!)
1) actually, I wasn't referring to "menu"; here in Brazil we have different accents (as I think most countries do), and "menu" did no harm in this case; everybody got it right. Most of the problems were with the last word, "calculadora", and I guess it's that "r" thing you mentioned in your previous post;
2) yes, using the new version (and changing my terminal encoding from ISO-8859-1 to UTF-8, also) fixed the issue! thanks! :)
3) I really don't know how it works in theory, sorry. We just know it's not what we're used to hearing, but nothing more. :( Anyway, I've thought a lot about this specific issue, and I've realized that in some regions of Brazil (not southern Brazil, where I live) people usually use an open "o" and "e" for every occurrence of "o" and "e". So I don't really know what a good approach would be. (Is there any other Brazilian Portuguese speaker monitoring this forum who could help a bit more? Hello?)