
Train and test data

2017-03-22
  • Tania Mendonca

    Tania Mendonca - 2017-03-22

    What is an efficient way of choosing the audio files for training and testing?
    For now I have 1267 recordings in a female voice and 1267 recordings in a male voice, with the same transcriptions. I took the same 1013 sentences from both the female and the male set as training data (2026 training recordings in total) and the remaining ones as test data. Please let me know if this is the right way (80% train and 20% test). The data is around 5.8 hours in total.
    Initially my audio was:
    $ soxi text_1.wav

    Input File : 'text_1.wav'
    Channels : 1
    Sample Rate : 48000
    Precision : 16-bit
    Duration : 00:00:12.39 = 594854 samples ~ 929.459 CDDA sectors
    File Size : 1.19M
    Bit Rate : 768k
    Sample Encoding: 16-bit Signed Integer PCM

    This is the command I used to downsample to 16 kHz:
    sox text_1.wav -b 16 text_1-f.wav rate 16k

    After resampling
    Input File : 'text_1-f.wav'
    Channels : 1
    Sample Rate : 16000
    Precision : 16-bit
    Duration : 00:00:12.39 = 198285 samples ~ 929.461 CDDA sectors
    File Size : 397k
    Bit Rate : 256k
    Sample Encoding: 16-bit Signed Integer PCM

    Although training of the word model was successful (there were errors about some audio files not reaching the final state, so I removed those files), I'm getting a word error rate (WER) of 93%.
    I assumed that the WER should decrease as the amount of data increases, but instead I have seen the WER increase, which means poor accuracy.
    Is it because I chose a wrong way of dividing train and test? (Should there be some sentences from train in the test set?)
    Or is it because I downsampled the audio to 16 kHz that the recognition is so horrible?
    Or is it due to the feature extraction?

     

    Last edit: Tania Mendonca 2017-03-22
    • Arseniy Gorin

      Arseniy Gorin - 2017-03-23

      Quite a lot of questions in one place. Still:

      Please let me know if this is the right way (80% train and 20% test).

      This ratio seems OK. You can even do 10% for testing to have more training data.

      I'm getting a word error rate of 93%.

      This is too high. Something likely went wrong. You should probably analyse the alignment files in the decoding directory. You can also share your working directory for further analysis.
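
For anyone checking individual utterances by hand, here is a minimal sketch of how WER is usually computed: the word-level edit distance between the reference transcription and the recognizer output, divided by the number of reference words. The function name `word_error_rate` is made up for this illustration and is not part of any Sphinx tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.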

      (Should there be some sentences from train in the test set?)

      No. Moreover, it is not recommended to have the same speaker in train and test. But it is good to have train and test phonetically balanced.
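
A rough sketch of a split that keeps sentences disjoint: it splits on the transcription, so the male and female recordings of one sentence always end up on the same side. The `(utt_id, transcription)` pair layout and the function name `split_by_sentence` are assumptions made for this example. It does not handle speaker separation or phonetic balance.

```python
import random


def split_by_sentence(utterances, test_fraction=0.1, seed=0):
    """Split (utt_id, transcription) pairs so no transcription is in both sets."""
    # Work on unique sentences, not on recordings, so that all recordings
    # of one sentence are assigned together.
    sentences = sorted({text for _, text in utterances})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(sentences)

    n_test = max(1, round(len(sentences) * test_fraction))
    test_sentences = set(sentences[:n_test])

    train = [u for u in utterances if u[1] not in test_sentences]
    test = [u for u in utterances if u[1] in test_sentences]
    return train, test
```

With `test_fraction=0.1` this follows the suggestion above of holding out about 10% of the sentences for testing.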

      Or is it because I downsampled the audio to 16 kHz that the recognition is so horrible?

      Unlikely.

      Or is it due to the feature extraction?

      Could be, but it is difficult to say without having your working directory to replicate the problem.

       
      • Tania Mendonca

        Tania Mendonca - 2017-03-24

        When I checked the perplexity of my language model on the test set, I got a perplexity of 5700 and an OOV rate of 53%.
        Is this why my WER is 93%?

         
        • Nickolay V. Shmyrev

          Yes
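
An out-of-vocabulary word cannot be recognized at all, so an OOV rate of 53% by itself puts a high floor under the WER. A minimal sketch of how the rate can be checked, assuming the vocabulary is simply every word in the training transcripts (in a real setup it would be the decoder's dictionary or language model vocabulary); `oov_rate` is a name made up for this example:

```python
def oov_rate(train_transcripts, test_transcripts):
    """Fraction of test word tokens that never occur in the training text."""
    vocab = {word for line in train_transcripts for word in line.split()}
    test_words = [word for line in test_transcripts for word in line.split()]
    if not test_words:
        raise ValueError("test transcripts contain no words")
    oov_count = sum(1 for word in test_words if word not in vocab)
    return oov_count / len(test_words)
```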

           
          • Tania Mendonca

            Tania Mendonca - 2017-03-27

            Is there a way I can reduce the perplexity of the language model?

             
            • Nickolay V. Shmyrev

              Use more relevant data in language model training.
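
To illustrate the effect, here is a toy sketch that scores test sentences with an add-one smoothed unigram model. This is an assumption chosen for brevity, not the n-gram model a real decoder uses, and `unigram_perplexity` is a made-up name. Comparing the score with and without in-domain training text shows how relevant data moves the perplexity.

```python
import math
from collections import Counter


def unigram_perplexity(train_sentences, test_sentences):
    """Perplexity of the test text under an add-one smoothed unigram model."""
    counts = Counter(w for s in train_sentences for w in s.split())
    test_words = [w for s in test_sentences for w in s.split()]
    if not test_words:
        raise ValueError("test sentences contain no words")

    # Toy simplification: the vocabulary includes the test words, so unseen
    # words get the smoothed probability 1 / (total + vocab_size).
    vocab_size = len(set(counts) | set(test_words))
    total = sum(counts.values())

    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in test_words
    )
    return math.exp(-log_prob / len(test_words))
```

Calling it once with only general text and once with general plus in-domain text, on the same test sentences, gives a quick before/after comparison.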

               
