Decode wordlist with numbers

    Carlo Benussi - 2017-05-15

    I am using the g2p-seq2seq Python scripts to convert an English wordlist into phonetic transcriptions. I would like to produce transcriptions for the first 100 numbers, but I don't know how to write the numbers above 20 as input strings.

    I tried feeding the g2p script the numbers written as single words (e.g. 'twentyone', 'sixtyeight'), but when I checked the output I found several transcription errors (as expected, since 'twentyone' must look like a very strange word to a machine). The correct spelling uses the '-' character ('twenty-one'), but the script does not accept it.

    Has anyone run into the same problem? How did you work around it?

    Thanks in advance

     
    • Nickolay V. Shmyrev

      You need to split the complex phrase into words (tokens), expanding numbers to words along the way, and convert each token separately. 21 must be expanded to "twenty one", and "twenty" and "one" must then be converted separately. seq2seq does not handle complex sequences properly.
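
A minimal sketch of the splitting step described above, assuming only the numbers 0 to 100 are needed. The helper names (`number_to_tokens`, `unique_tokens`, `join_transcriptions`) and the `lexicon` dictionary are hypothetical and not part of g2p-seq2seq. The idea is to build a wordlist of single tokens, run the g2p tool on that list, and then join the per-token transcriptions back together.

```python
# Hypothetical helpers; not part of g2p-seq2seq.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
    6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety",
}


def number_to_tokens(n):
    """Expand an integer in 0..100 into a list of single-word tokens."""
    if not 0 <= n <= 100:
        raise ValueError("only 0..100 is supported in this sketch")
    if n < 20:
        return [ONES[n]]
    if n == 100:
        return ["one", "hundred"]
    tens, ones = divmod(n, 10)
    tokens = [TENS[tens]]
    if ones:
        tokens.append(ONES[ones])
    return tokens


def unique_tokens(numbers):
    """Collect each distinct token once, in order of first appearance."""
    seen = []
    for n in numbers:
        for token in number_to_tokens(n):
            if token not in seen:
                seen.append(token)
    return seen


def join_transcriptions(n, lexicon):
    """Join per-token transcriptions; lexicon maps token -> phoneme string."""
    return " ".join(lexicon[token] for token in number_to_tokens(n))


if __name__ == "__main__":
    # One token per line, suitable as a plain wordlist file.
    print("\n".join(unique_tokens(range(1, 101))))
```

The printed list can be saved as the wordlist to decode. Once the tool has produced a transcription for each token, those can be loaded into a dictionary and passed to `join_transcriptions` to assemble the transcription of each full number.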

       
