Decode wordlist with numbers

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Help

Creator: Carlo Benussi

Created: 2017-05-15

Updated: 2017-05-15

Carlo Benussi - 2017-05-15

I am using the g2p-seq2seq Python scripts to decode a wordlist (in English) into its phonetic transcription. I would like to produce the transcriptions for the first 100 numbers, but I don't know how to correctly pass the numbers above 20 as strings.

I tried feeding the numbers to the g2p script written as single words (e.g. 'twentyone', 'sixtyeight'), but when checking the output I found several transcription errors. This was expected, since a spelling like 'twentyone' is very unusual for the model to pronounce. The correct way to write them would be with the '-' character (e.g. 'twenty-one'), but the script does not accept it.

Has anyone else run into the same problem? How did you work around it?

Thanks in advance

- Nickolay V. Shmyrev - 2017-05-15
  
  You need to split the complex phrase into words (tokens), including expanding the numbers into words, and convert each token separately. 21 must be expanded to "twenty one", and then "twenty" and "one" are converted separately. seq2seq does not properly handle complex sequences.
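Following that advice, here is a minimal Python sketch. It is not part of g2p-seq2seq, and every function name in it is hypothetical. It expands 1 to 100 into separate word tokens and collects the distinct tokens into a wordlist for the G2P model. It then joins the per-token pronunciations back into a pronunciation for each number. It assumes the G2P output has already been loaded into a plain dict mapping each word to its phone string. The two lexicon entries shown are illustrative CMUdict-style values, not actual model output.

```python
from __future__ import annotations

# Spelled-out forms for 0..19 and for the tens (index = tens digit).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> list[str]:
    """Spell out 0 <= n <= 100 as separate tokens, e.g. 21 -> ['twenty', 'one']."""
    if not 0 <= n <= 100:
        raise ValueError("this sketch only covers 0..100")
    if n == 100:
        return ["one", "hundred"]
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] if ones == 0 else [TENS[tens], ONES[ones]]


def build_wordlist(numbers) -> list[str]:
    """Collect the distinct tokens (in first-seen order) to decode with G2P."""
    seen = set()
    words = []
    for n in numbers:
        for token in number_to_words(n):
            if token not in seen:
                seen.add(token)
                words.append(token)
    return words


def pronounce(n: int, lexicon: dict[str, str]) -> str:
    """Join the per-token pronunciations of n, looked up in a word->phones dict."""
    return " ".join(lexicon[token] for token in number_to_words(n))


if __name__ == "__main__":
    # One word per line: this is the content of the wordlist to decode.
    print("\n".join(build_wordlist(range(1, 101))))
    # Hypothetical lexicon standing in for the G2P output of that wordlist.
    lexicon = {"twenty": "T W EH1 N T IY0", "one": "W AH1 N"}
    print(pronounce(21, lexicon))
```

With this approach the model only ever sees 28 ordinary words ("one" to "nineteen", the eight tens, and "hundred"), so no hyphenated or concatenated forms need to be decoded.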
  
