Re: [Espeak-general] espeak-zh multiple tone 3s


Added a Native 5 to the data below.

If we treat "wǒ xiǎng" and "gěi nǐ" as
single 2-syllable words (as per Native 4's
previous suggestion), and then use the existing
selection algorithm but restart it after every
2-syllable word (i.e. treat the end of a
2-syllable word as a phrase break for the
purposes of tone selection), then we'll have
acceptable renditions of everything except
case 7.  That one fails because 可有 isn't
really a word, but I wonder whether the
natives' responses would have been different
if I hadn't hyphenated it.  That hyphen was my
mistake, but the natives might have thought I'd
got it from some book and let it override their
intuition.

A variant would be to do the above but
additionally check for a 2-syllable word whose
syllables are both 3rd tone, followed by a
single 3rd-tone syllable, and in that case set
both syllables of the 2-syllable word to tone 2.
This catches the case described in Wikipedia's
"Tone sandhi rules at a glance", and also some
of the "alternative" versions that some of the
natives chose below.  However, I'm not 100%
convinced that this more complex variant is
worth coding, given that the simpler version
above is also acceptable (and probably in more
cases).
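A sketch of the variant, again in Python and under the same
assumptions (numbered-pinyin words, a simple stand-in for the existing
selection rule, hypothetical function names).  It runs the segmented
rule first, then overrides any 2-syllable word whose syllables are
both 3rd tone and which is followed by a single 3rd-tone syllable.

```python
def tone(syl: str) -> int:
    """Tone digit of a numbered-pinyin syllable (no digit = neutral, 5)."""
    return int(syl[-1]) if syl[-1].isdigit() else 5


def set_tone(syl: str, new_tone: int) -> str:
    base = syl[:-1] if syl[-1].isdigit() else syl
    return f"{base}{new_tone}"


def run_rule(sylls: list[str]) -> list[str]:
    """Assumed baseline: a 3rd tone before another 3rd tone becomes tone 2."""
    return [
        set_tone(s, 2) if tone(s) == 3 and i + 1 < len(sylls) and tone(sylls[i + 1]) == 3 else s
        for i, s in enumerate(sylls)
    ]


def apply_sandhi_variant(words: list[list[str]]) -> list[list[str]]:
    out: list[list[str]] = []
    pending: list[list[str]] = []
    for i, word in enumerate(words):
        pending.append(word)
        # A 2-syllable word (or the end of the input) closes the segment.
        if len(word) == 2 or i == len(words) - 1:
            flat = run_rule([s for w in pending for s in w])
            pos = 0
            for w in pending:
                out.append(flat[pos:pos + len(w)])
                pos += len(w)
            pending = []
    # Variant: a 3+3 two-syllable word followed by a lone 3rd-tone syllable
    # gets tone 2 on both of its syllables.
    for i in range(len(words) - 1):
        word, nxt = words[i], words[i + 1]
        if (len(word) == 2 and all(tone(s) == 3 for s in word)
                and len(nxt) == 1 and tone(nxt[0]) == 3):
            out[i] = [set_tone(s, 2) for s in word]
    return out


print(apply_sandhi_variant([["wo3", "xiang3"], ["qing3"], ["nin2"]]))
```

For case (1) this gives wó xiáng qǐng nín, the version Natives 1 and 2
chose, whereas the simpler segmented rule gives the version Natives 3,
4 and 5 chose.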

We will need to know about all 2-syllable words
that end in a 3rd tone, even the ones that use
their component characters' default pronunciations.
There are over 6000 of these in the version of
CEDICT on my hard disk. If we're getting hanzi
input then we could just hack this into
zh_listx (along with the special extra cases
"wǒ xiǎng" and "gěi nǐ"), and assume the
resulting pinyin is appropriately word-spaced
for our purposes.  But if we're getting pinyin
input then things get more difficult:
sometimes pinyin is written with a space after
EVERY syllable, and other times two or more
words are strung together into one.  OK, there
is some reduction in the word count when you're
using only pinyin, because some pinyin words
can be written as hanzi in more than one way,
but we're still looking at over 5600 pinyin
compounds (2800 if we count just the ones whose
first syllable is tone 3 or tone 4, although I'm
not sure that this is correct).  So I guess
either eSpeak is going to need a special data
file for this, or we'll need to say that pinyin
input should be spaced correctly if you want it
to do correct 3rd-tone-sandhi blocking (and
we'd still need to look out for "wo3 xiang3" and
"gei3 ni3", and their equivalents like
"wǒ xiǎng" and "gěi nǐ", because we can't rely
on the user to remember NOT to space these).
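A rough Python sketch of the data-file route, assuming CC-CEDICT's
usual line format ("Traditional Simplified [pin1 yin1] /gloss/", with
"#" comment lines).  It collects the 2-syllable entries whose last
syllable is tone 3, adds the two special cases, converts tone-marked
pinyin such as "wǒ" to numbered form, and groups per-syllable-spaced
pinyin greedily from left to right.  The function names and the
sample lines are made up for illustration; greedy grouping can
mis-segment ambiguous input, and splitting run-together pinyin like
"kěyǐ" into syllables is not attempted here.

```python
import re
import unicodedata

# Typical CC-CEDICT entry shape: "可以 可以 [ke3 yi3] /can/may/"
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /")
# Combining diacritics (after NFD decomposition) -> tone numbers
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}
# The special extra cases from the discussion
EXTRA_PAIRS = {("wo3", "xiang3"), ("gei3", "ni3")}


def tone3_final_compounds(lines):
    """Return ({simplified hanzi: pinyin}, {lowercased pinyin pairs}) for
    2-syllable entries whose second syllable is tone 3."""
    hanzi, pairs = {}, set()
    for line in lines:
        if line.startswith("#"):
            continue
        m = CEDICT_LINE.match(line)
        if not m:
            continue
        simplified, syllables = m.group(2), m.group(3).split()
        if len(simplified) == 2 and len(syllables) == 2 and syllables[1].endswith("3"):
            hanzi[simplified] = " ".join(syllables)
            pairs.add(tuple(s.lower() for s in syllables))
    return hanzi, pairs


def to_numbered(syllable):
    """'xiǎng' -> 'xiang3'; already-numbered syllables pass through;
    unmarked syllables get neutral tone 5."""
    if syllable[-1].isdigit():
        return syllable
    tone, base = "5", []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base.append(ch)  # keeps e.g. the diaeresis of ü
    return unicodedata.normalize("NFC", "".join(base)) + tone


def group_syllables(syllables, pairs):
    """Greedy left-to-right grouping of space-separated syllables into
    known 2-syllable compounds; everything else stays a 1-syllable word."""
    numbered = [to_numbered(s) for s in syllables]
    words, i = [], 0
    while i < len(numbered):
        pair = tuple(s.lower() for s in numbered[i:i + 2])
        if len(pair) == 2 and pair in pairs:
            words.append(numbered[i:i + 2])
            i += 2
        else:
            words.append([numbered[i]])
            i += 1
    return words


sample = [
    "# made-up sample in CC-CEDICT format",
    "可以 可以 [ke3 yi3] /can/may/",
    "討論 讨论 [tao3 lun4] /to discuss/",
]
hanzi, pairs = tone3_final_compounds(sample)
print(hanzi)
print(group_syllables("wǒ xiǎng qǐng nín".split(), pairs | EXTRA_PAIRS))
```

Only tone-3-final compounds are kept because, under a rule that looks
only at adjacent 3rd tones, a break after any other word changes
nothing.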

What do people think?  I could easily change my
zh_listx-generating script to make sure the
relevant compounds come out with appropriate spacing,
but I might need some help getting eSpeak to use it.

Thanks.

Silas

(1) 我想请您 wǒ xiǎng qǐng nín
Natives 1 and 2: wó xiáng qǐng nín
Natives 3, 4, 5 and SinoVoice: wó xiǎng qǐng nín

(2) 美语补习班 Měiyǔ bǔxíbān
Natives 1, 3, 4, 5 and SinoVoice: Méiyǔ bǔxíbān
(Native 4 says Měiyú bǔxíbān is also acceptable)
Native 2: Méiyú bǔxíbān

(3) 可以讨论 kěyǐ tǎolùn
Natives 1, 3, 4, 5 and SinoVoice: kéyǐ tǎolùn

(4) 次力量给你保持忍耐 ..gěi nǐ bǎochí
Natives 1, 3, 4, 5 and SinoVoice: ..géi nǐ bǎochí

(5) 教导你使你得益处 jiàodǎo nǐ shǐ nǐ dé yìchu
Natives 1, 3, 4, 5 and SinoVoice (and previous
group): jiàodǎo nǐ shí nǐ dé yìchu

(6) 只有少数人 zhǐyǒu shǎoshùrén
Natives 1, 3, 4, 5 and SinoVoice: zhíyǒu shǎoshùrén

(7) 可有可无 kěyǒu-kěwú
Native 1 and SinoVoice: kéyóu-kěwú
Natives 3, 4 and 5: kéyǒu-kěwú

(8) 至少有两个 zhìshǎo yǒu liǎng ge
Natives 1, 3, 4, 5 and SinoVoice: zhìshǎo yóu liǎng ge

(9) 令人难以理解的 lìngrén nányǐ lǐjiě de
Natives 1, 4 and SinoVoice: lìngrén nányǐ líjiě de
Natives 3 and 5: lìngrén nányí líjiě de

(10) 可以怎样效法 kěyǐ zěnyàng
Native 1: kéyí zěnyàng
Natives 3, 4, 5 and SinoVoice: kéyǐ zěnyàng

(11) 什么方法可以改善 shénme fāngfǎ kěyǐ gǎishàn
Natives 1, 3, 4 and 5 (and SinoVoice, although it
seems to get the "me" wrong): shénme fāngfǎ kéyǐ gǎishàn
Native 2 and previous group: shénme fāngfǎ kéyí gǎishàn

(12) 可以改善 kěyǐ gǎishàn
Native 1: kéyí gǎishàn
Natives 3, 4 and SinoVoice: kéyǐ gǎishàn

-- 
Silas S Brown http://people.pwf.cam.ac.uk/ssb22