
Training a language model from synthesized speech

2013-02-02
2013-02-05
  • Harri Pasanen

    Harri Pasanen - 2013-02-02

    I wonder what sort of results one would expect from training a model solely on synthesized speech, say for a vocabulary of something like 500-1000 words.

    One could use multiple available tts voices, and simulate more by applying effects like changing pitch and/or tempo etc.

    Basically trying to outsource all the tedious work to computers...
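
    The pitch/tempo idea above can be illustrated with a minimal sketch. It assumes the TTS output is already available as a list of float samples; the function names `speed_perturb` and `augment` are hypothetical. Note that plain resampling changes pitch and tempo together, so this is not an independent pitch or tempo shift, which would need a dedicated tool or algorithm.

```python
def speed_perturb(samples, factor):
    """Linearly resample `samples` so that playback at the original
    sample rate is `factor` times faster (pitch rises with tempo)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    n_out = max(1, round(n_in / factor))
    if n_in < 2 or n_out < 2:
        return list(samples[:n_out])
    # Distance in input samples between consecutive output samples.
    step = (n_in - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), n_in - 2)   # left neighbour, clamped at the end
        frac = pos - lo                # weight of the right neighbour
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Return one simulated 'voice' per speed factor (factors are an assumption)."""
    return {f: speed_perturb(samples, f) for f in factors}
```

    Calling `augment(samples)` on each synthesized utterance would give three variants per TTS voice; whether such variants are diverse enough to train on is exactly the open question here.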

     
  • Nickolay V. Shmyrev

    I wonder what sort of results one would expect from training a model solely on synthesized speech, say for a vocabulary of something like 500-1000 words.
    One could use multiple available tts voices, and simulate more by applying effects like changing pitch and/or tempo etc.
    Basically trying to outsource all the tedious work to computers...

    There are more natural ways to obtain the large amount of speech data required for training. The one we pursue in CMUSphinx is automatic alignment of, and training on, transcribed recordings. There are many public transcribed recordings available, so we can easily reuse the transcriptions to build very accurate models.

    Proper tools have to be implemented to support that; they are in development right now and will reach production state quite soon. Your help is welcome.

    It's worth noting that data alone will not be enough to build an accurate recognizer. Advanced algorithms and features have to be implemented too.

     
  • Harri Pasanen

    Harri Pasanen - 2013-02-04

    I noticed you neatly sidestepped answering the question ;)

    What I have in mind right now are limited vocabulary language models for command and control, using pocketsphinx, not restricted to English only.

    Another approach and question: say I were to crowdsource the training to smartphone users. Would audio compression via Opus or MP3 VBR introduce significant degradation to the training material?

    As for the request for help, I'll nibble at the bait... What kind of help would be required? Is there a list of tasks somewhere?
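
    For the limited-vocabulary command-and-control case mentioned above, pocketsphinx can work from a JSGF grammar instead of a statistical language model. A minimal sketch of generating such a grammar from a phrase list follows; the function name `build_jsgf`, the grammar name and the example phrases are all hypothetical, and every word would still need an entry in the pronunciation dictionary of the target language.

```python
def build_jsgf(grammar_name, commands):
    """Build a JSGF grammar whose single public rule accepts
    exactly one of the given command phrases."""
    # Normalise whitespace and drop empty entries.
    phrases = [" ".join(c.split()) for c in commands if c.strip()]
    if not phrases:
        raise ValueError("at least one command phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n\n"
        f"grammar {grammar_name};\n\n"
        f"public <command> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Hypothetical command set, for illustration only.
    print(build_jsgf("control", ["lights on", "lights off", "stop"]))
```

    A grammar like this restricts the recognizer to the listed phrases, which is usually a better fit for a few dozen commands than an n-gram model; it does not replace the acoustic model, which is what the training data question is about.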

     
  • Nickolay V. Shmyrev

    I noticed you neatly sidestepped answering the question ;)

    I'm trying to explain the right way to you.

    Another approach and question: say I were to crowdsource the training to smartphone users. Would audio compression via Opus or MP3 VBR introduce significant degradation to the training material?

    No, it doesn't really matter.

    As for the request for help, I'll nibble at the bait... What kind of help would be required?

    Software development.

    Is there a list of tasks somewhere?

    The task is the one I already named: "Make sure that automatic alignment works".

     
