Adapting the default acoustic model

Help
2018-03-28
2018-03-29
  • Tavishi Gupta

    Tavishi Gupta - 2018-03-28

    So I am trying to adapt the acoustic model to increase accuracy of speech recognition. I have a couple of doubts when it comes to the recordings.
    1. .fileids is just a list of the names of the separate .wav files, right?
    2. .transcription is the .wav files transcribed?
    3. I will be cutting a huge audio file into pieces with the key words I want the system to recognize well. I have recordings of multiple people saying the same thing. I was wondering if the .fileids and the .transcription files only held information of one speaker saying the different sentences. Also, is there a way of training the system on multiple people? Do I just repeat the whole process on different .fileids and the .transcription files for each person?
    4. The recordings have some background noise, is this going to be a problem?

    I would really appreciate it if you could clarify these questions!

     
    • Nickolay V. Shmyrev

      1. .fileids is just a list of the names of the separate .wav files, right?

      Yes

      2. .transcription is the .wav files transcribed?

      Yes

      3. I will be cutting a huge audio file into pieces with the key words I want the system to recognize well. I have recordings of multiple people saying the same thing. I was wondering if the .fileids and the .transcription files only held information of one speaker saying the different sentences.

      It could be multiple speakers at once.

      Also, is there a way of training the system on multiple people?

      It is the same as for single speaker

      Do I just repeat the whole process on different .fileids and the .transcription files for each person?

      Yes

      4. The recordings have some background noise, is this going to be a problem?

      Yes
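To make the two file formats concrete, here is a minimal Python sketch that writes a .fileids and a .transcription file for recordings from several speakers at once. It assumes the layout used in the CMU Sphinx adaptation tutorial: one file ID per line without the .wav extension, and one transcript per line in the form `<s> text </s> (file_id)`. The function name `write_adaptation_lists`, the base name `adapt`, and the `spk1_01`-style IDs are hypothetical and only for illustration.

```python
from pathlib import Path


def write_adaptation_lists(utterances, out_dir, name="adapt"):
    """Write <name>.fileids and <name>.transcription into out_dir.

    utterances: list of (file_id, transcript) pairs, where file_id is the
    recording's filename without the .wav extension (assumed layout).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fileids_path = out_dir / f"{name}.fileids"
    transcription_path = out_dir / f"{name}.transcription"

    with open(fileids_path, "w", encoding="utf-8", newline="\n") as f_ids, \
         open(transcription_path, "w", encoding="utf-8", newline="\n") as f_tr:
        for file_id, transcript in utterances:
            # One file ID per line, same order as the transcription file.
            f_ids.write(f"{file_id}\n")
            # Transcript wrapped in <s> ... </s>, followed by the file ID.
            f_tr.write(f"<s> {transcript.strip()} </s> ({file_id})\n")

    return fileids_path, transcription_path


if __name__ == "__main__":
    # Two speakers saying the same sentence go into the same pair of files.
    demo = [
        ("spk1_01", "turn on the light"),
        ("spk2_01", "turn on the light"),
    ]
    write_adaptation_lists(demo, "adapt_data")
```

The words in each transcript are written as given, so they still need to match the spelling and case used in the pronunciation dictionary.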

       
      • Tavishi Gupta

        Tavishi Gupta - 2018-03-28

        Hi!
        Thank you for the prompt response!
        So for multiple speakers, do I just make different .wav files in the same .fileids of different people saying the same sentence? So for example, I can have 3 people say 5 different sentences and I can have all the 15 .wav files in the same .fileids and .transcription files, right?

        Also, how much of a problem can noise be? Do I basically need crystal clear recordings? I thought I was supposed to use recordings that were used from the same source as on what the voice recognition needs to be applied on.

         
        • Nickolay V. Shmyrev

          So for multiple speakers, do I just make different .wav files in the same .fileids of different people saying the same sentence? So for example, I can have 3 people say 5 different sentences and I can have all the 15 .wav files in the same .fileids and .transcription files, right?

          You cannot have different files with the same filename. Every recording needs its own unique filename, even when several speakers say the same sentence.
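As an illustration of the 3 speakers by 5 sentences case, the sketch below builds 15 unique file IDs, each paired with its sentence, so the same text appears once per speaker under a different ID. The `speaker_NN` naming scheme and the function name `build_utterances` are assumptions made for this example, not something the tools require; any scheme works as long as every ID is unique and matches a .wav file of the same name.

```python
def build_utterances(speakers, sentences):
    """Return (file_id, transcript) pairs, one per speaker and sentence.

    File IDs follow a hypothetical '<speaker>_<NN>' pattern, so the same
    sentence spoken by different people gets a different ID each time.
    """
    utterances = []
    for speaker in speakers:
        for index, sentence in enumerate(sentences, start=1):
            utterances.append((f"{speaker}_{index:02d}", sentence))

    # Guard against accidental duplicates, e.g. a speaker listed twice.
    file_ids = [file_id for file_id, _ in utterances]
    if len(set(file_ids)) != len(file_ids):
        raise ValueError("duplicate file IDs: every recording needs a unique name")
    return utterances


if __name__ == "__main__":
    speakers = ["spk1", "spk2", "spk3"]
    sentences = [
        "turn on the light",
        "turn off the light",
        "open the door",
        "close the door",
        "what time is it",
    ]
    for file_id, text in build_utterances(speakers, sentences):
        print(file_id, text)
```

Each resulting ID would correspond to one recording, for example `spk2_03.wav`, and all 15 entries can go into a single .fileids and .transcription pair.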

          Also, how much of a problem can noise be?

          It will be a big problem.

          Do I basically need crystal clear recordings?

          It will not help you.

          I thought I was supposed to use recordings that were used from the same source as on what the voice recognition needs to be applied on.

          If you already know everything why do you ask so many questions?

           
          • Tavishi Gupta

            Tavishi Gupta - 2018-03-29
            I thought I was supposed to use recordings that were used from the same source as on what the voice recognition needs to be applied on.
            

            I don't know if that came across, but I was asking whether that is a requirement.

            And are you saying it will be an issue if the recordings are crystal clear, or if they have noise? I don't think I follow now.

            I would definitely appreciate less sass since I am just trying to learn and get a grip of this :)

             
