Bad accuracy on call center recordings

Help
Umair
2019-05-08
2019-05-29
  • Umair

    Umair - 2019-05-08

    Dear Team,
    I have built an Urdu speech recognition model that has very good accuracy when the decoder is given the training recordings or microphone input. The model had 50% SER and 21% WER.

    The real call center recordings are 16-bit, 8 kHz, mono, while the model was trained on 16-bit, 16 kHz, mono audio. I thought the sampling rate mismatch was the reason it could not decode the recordings, so I built a new model at 8 kHz, but the accuracy is still very poor.

    I have 44 utterances and 15 speakers.

    I have been stuck on this issue for a long time. Please help.

     
    • Nickolay V. Shmyrev

      A call center model requires a lot of data; I don't think you have it.

       
  • Umair

    Umair - 2019-05-09

    Your early response will be highly appreciated.

     
  • Umair

    Umair - 2019-05-09

    Dear,
    Please have a look at my trained model. I understand that I have too little data. Could this be the only reason it cannot decode speaker-independent recordings? How much data do you suggest in this case?

    When I record myself using words from the trained vocabulary, the model somehow decodes them correctly, but when the same words occur in call center calls, it fails to recognize them. I don't know why.

    Really appreciate your help.

    Thanks.

     
    • Nickolay V. Shmyrev

      You need 1000 hours of call center recordings as training data. If you don't have that, your model will not work accurately.

       
  • Umair

    Umair - 2019-05-09

    Dear Nickolay, I only need to recognize a specific vocabulary of 44 sentences. Does that also require 1000 hours to work?

    Model attached.

     
    • Nickolay V. Shmyrev

      Yes

       
  • Umair

    Umair - 2019-05-29

    Dear Nickolay,

    I am in the process of adding more training data. However, I just found that the recordings I am using for training are 16 kHz, 16-bit signed integer PCM, while the recordings used for testing are 8 kHz, 16-bit, GSM encoded. See attached. I converted them to PCM in order to decode them with Sphinx, but the accuracy is very bad. If I provide recordings in the training format (16 kHz, 16-bit signed integer PCM), the results are still good.

    May I know what the issues with GSM-encoded audio are and how I can tackle them?

    Thank you.

    Regards.
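
A mismatch like the one described above can be confirmed by reading the header of each WAV file. The following is a minimal sketch, not part of the original post: `read_wav_format` is a hypothetical helper name, and it assumes the files are RIFF/WAVE containers whose `fmt ` chunk carries the standard format tag (1 for PCM, 0x31 for GSM 6.10).

```python
import struct

# Two common WAVE format tags; other tags are reported by number.
FORMAT_NAMES = {0x0001: "PCM", 0x0031: "GSM 6.10"}


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the leading bytes of a RIFF/WAVE file."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # tag, channels, sample rate, byte rate, block align, bits per sample
            tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "encoding": FORMAT_NAMES.get(tag, "tag 0x%04x" % tag),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + size + (size & 1)  # chunks are padded to an even size
    raise ValueError("no 'fmt ' chunk found in the supplied bytes")
```

Calling it on the first few kilobytes of a training file and a call center file, for example `read_wav_format(open(path, "rb").read(4096))` with `path` standing in for a real file name, shows whether the encoding and sample rate differ.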

     
    • Nickolay V. Shmyrev

      You have to convert both the training and the test data to 8 kHz, 16-bit PCM before training and testing, and you need to train the model accordingly.
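
As an illustration of the conversion step for the training side, here is a rough sketch that turns a 16 kHz, 16-bit, mono PCM WAV file into an 8 kHz one using only the Python standard library. The function names are hypothetical, and averaging adjacent samples is only a crude low-pass filter; a dedicated resampler such as SoX will give better quality. GSM-encoded test files would first have to be decoded to PCM by another tool, which this sketch does not do.

```python
import array
import wave


def halve_sample_rate(samples):
    """Average each adjacent pair of 16-bit samples into one output sample."""
    out = array.array("h")
    for i in range(0, len(samples) - 1, 2):
        out.append((samples[i] + samples[i + 1]) // 2)
    return out


def convert_16k_to_8k(src_path, dst_path):
    """Write an 8 kHz copy of a 16 kHz, 16-bit, mono PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
        if fmt != (1, 2, 16000):
            raise ValueError("expected 16 kHz, 16-bit, mono PCM input")
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    out = halve_sample_rate(samples)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)      # 2 bytes = 16-bit samples
        dst.setframerate(8000)
        dst.writeframes(out.tobytes())
```

After converting every training file this way, the acoustic model would be retrained on the 8 kHz data so that training and test audio share the same format.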

       
