
Data preparation for training - requirements?

Help
Peter
2018-03-20
2019-01-28
  • Peter

    Peter - 2018-03-20

    As the audio I need for training is already recorded, I took a
    10-minute WAV file and ran it through webrtcvad
    ( https://github.com/wiseman/py-webrtcvad ) to split the audio into
    many small files (over 200).
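
    A minimal sketch of that kind of split, for readers who want to
    reproduce it. This is not the exact script used above. It assumes
    16-bit mono PCM at 16 kHz, 30 ms frames, and a cut after 10
    consecutive non-speech frames; the helper names (frames,
    split_segments, webrtc_is_speech) are made up for illustration. Only
    webrtcvad.Vad(mode) and Vad.is_speech(frame, sample_rate) come from
    py-webrtcvad.

```python
from typing import Callable, Iterator, List

SAMPLE_RATE = 16000   # assumed; webrtcvad accepts 8000, 16000, 32000 or 48000 Hz
FRAME_MS = 30         # webrtcvad accepts 10, 20 or 30 ms frames
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def frames(pcm: bytes) -> Iterator[bytes]:
    """Yield fixed-size frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield pcm[start:start + FRAME_BYTES]


def split_segments(pcm: bytes,
                   is_speech: Callable[[bytes], bool],
                   max_silence_frames: int = 10) -> List[bytes]:
    """Cut the audio wherever max_silence_frames non-speech frames occur in a row."""
    segments: List[bytes] = []
    current: List[bytes] = []
    silence = 0
    for frame in frames(pcm):
        if is_speech(frame):
            current.append(frame)
            silence = 0
        elif current:
            # Silence inside a segment: keep it for now, it may be a short pause.
            current.append(frame)
            silence += 1
            if silence >= max_silence_frames:
                # Long pause: close the segment without its trailing silence.
                segments.append(b"".join(current[:-silence]))
                current, silence = [], 0
    if current:
        kept = current[:-silence] if silence else current
        segments.append(b"".join(kept))
    return segments


def webrtc_is_speech(aggressiveness: int = 2) -> Callable[[bytes], bool]:
    """Build a frame classifier backed by py-webrtcvad (imported lazily)."""
    import webrtcvad  # third-party: pip install webrtcvad
    vad = webrtcvad.Vad(aggressiveness)  # 0 = least, 3 = most aggressive
    return lambda frame: vad.is_speech(frame, SAMPLE_RATE)
```

    The raw PCM can be read with the standard wave module
    (readframes), and each returned segment written back out as its own
    WAV file. Segments that are only noise still need to be checked by
    ear, as described below.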

    When creating the transcript for each file, I'm assuming that I should
    write each word exactly as I hear it. Is that correct? Here are some
    examples:

    Spoken word --> because --> written word --> because
    Spoken word --> cause --> written word --> 'cause
    Spoken word --> dont --> written word --> don't
    Spoken word --> its --> written word --> it's
    Spoken word --> the um er --> written word --> the um er
    Spoken word --> the the --> written word --> the the

    Is it okay to put minor punctuation in the transcripts?
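
    The thread does not answer this question. As a hedged illustration
    only, the sketch below shows one way to produce verbatim transcripts
    without punctuation, on the assumption (not confirmed here) that the
    trainer expects every transcript token to be a dictionary word. It
    keeps apostrophes so that 'cause, don't and it's survive, and keeps
    fillers and repetitions such as "the um er" and "the the". The
    "<s> ... </s> (file_id)" line layout is assumed to be the SphinxTrain
    transcription format; normalize_transcript and transcript_line are
    hypothetical helper names.

```python
import re


def normalize_transcript(text: str) -> str:
    """Lowercase, drop punctuation except apostrophes, collapse whitespace."""
    text = text.lower().replace("\u2019", "'")  # curly apostrophe -> straight
    # Anything that is not a letter, digit, apostrophe or whitespace becomes a space.
    # ASCII-only on purpose: this sketch assumes English transcripts.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    # Drop tokens that are nothing but apostrophes.
    words = [w for w in text.split() if any(c.isalnum() for c in w)]
    return " ".join(words)


def transcript_line(text: str, file_id: str) -> str:
    """Format one utterance in the assumed '<s> words </s> (file_id)' layout."""
    return f"<s> {normalize_transcript(text)} </s> ({file_id})"


if __name__ == "__main__":
    print(transcript_line("The um, er... 'cause it's late, don't!", "chunk_0001"))
```

    A known limitation of this sketch: single quotes used as quotation
    marks around a word are kept, because they cannot be told apart from
    a leading apostrophe as in 'cause.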

    I am now going through each file (some are just noise or empty, so
    those are deleted), listening to it, then recording the transcript in
    a text file. I have now gone through 83 WAV files, and their total
    duration is 5 min 48 seconds. Durations range from 0.63 seconds to
    13.44 seconds, though most clips are between 1 and 6 seconds long.
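
    Figures like these can be computed rather than tallied by hand. The
    following is a small sketch using only the Python standard library;
    it assumes uncompressed PCM WAV files in a single directory, and the
    function names wav_duration and summarize are illustrative.

```python
import wave
from pathlib import Path
from typing import Dict, Optional


def wav_duration(path) -> float:
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize(directory) -> Optional[Dict[str, float]]:
    """Count, total, shortest and longest duration of the *.wav files in a directory."""
    durations = sorted(wav_duration(p) for p in Path(directory).glob("*.wav"))
    if not durations:
        return None
    return {
        "count": len(durations),
        "total": sum(durations),
        "shortest": durations[0],
        "longest": durations[-1],
    }


if __name__ == "__main__":
    stats = summarize(".")  # assumed: the split files live in the current directory
    if stats:
        minutes, seconds = divmod(stats["total"], 60)
        print(f"{stats['count']} files, {int(minutes)} min {seconds:.0f} s total")
        print(f"range {stats['shortest']:.2f} s to {stats['longest']:.2f} s")
```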

    I seem to remember reading somewhere that it was recommended to have at
    least 1 hour of audio in preparation for the training. Will the 5 min
    48 seconds be sufficient ?

     
    • Nickolay V. Shmyrev

      Will the 5 min 48 seconds be sufficient ?

      No

       
      • Peter

        Peter - 2018-03-20

        On Tue, 20 Mar 2018 20:40:01 -0000
        "Nickolay V. Shmyrev" nshmyrev@users.sourceforge.net wrote:

        Will the 5 min 48 seconds be sufficient ?

        No

        Thanks Nickolay. Is 1 hr of audio the minimum required ?

        Is it okay to put minor punctuation in the transcripts?

        Peter

         
  • Marc Wilhelm

    Marc Wilhelm - 2019-01-28

    Hi,
    I currently have 50 minutes of audio input, but that is still too little to train context-dependent models.

    Does anyone know the minimum audio input length for a context-dependent model?

    BR
    Marc

     
