CMU Sphinx / Forums / Help: Adapting the default acoustic model

Tavishi Gupta - 2018-03-28

So I am trying to adapt the acoustic model to increase the accuracy of speech recognition. I have a couple of questions about the recordings.
1. .fileids is just a list of the names of the separate .wav files, right?
2. .transcription is the .wav files transcribed?
3. I will be cutting a huge audio file into pieces with the key words I want the system to recognize well. I have recordings of multiple people saying the same thing. I was wondering if the .fileids and the .transcription files only held information of one speaker saying the different sentences. Also, is there a way of training the system on multiple people? Do I just repeat the whole process on different .fileids and the .transcription files for each person?
4. The recordings have some background noise, is this going to be a problem?

I would really appreciate it if you could clarify these questions!

- Nickolay V. Shmyrev - 2018-03-28
  
  .fileids is just a list of the names of the separate .wav files, right?
  
  Yes
  
  .transcription is the .wav files transcribed?
  
  Yes
  
  I will be cutting a huge audio file into pieces with the key words I want the system to recognize well. I have recordings of multiple people saying the same thing. I was wondering if the .fileids and the .transcription files only held information of one speaker saying the different sentences.
  
  They can contain multiple speakers at once.
  
  Also, is there a way of training the system on multiple people?
  
  It is the same as for a single speaker.
  
  Do I just repeat the whole process on different .fileids and the .transcription files for each person?
  
  Yes
  
  The recordings have some background noise, is this going to be a problem?
  
  Yes
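Following the answer that one .fileids/.transcription pair can hold several speakers at once, here is a minimal sketch that builds the text of both files from a list of recordings. It assumes the layout used in the CMU Sphinx acoustic model adaptation tutorial: `.fileids` lists one file ID per line (the .wav name without its extension), and `.transcription` has one line per recording of the form `<s> text </s> (fileid)`, in the same order. The function name `build_control_files`, the file IDs, and the sentences are hypothetical.

```python
def build_control_files(recordings):
    """Return (fileids_text, transcription_text) for an adaptation set.

    recordings: list of (file_id, text) pairs. file_id is the .wav file name
    without its extension (hypothetical naming). Speakers can be mixed freely.
    """
    ids = [file_id for file_id, _ in recordings]
    # Two speakers may say the same sentence, but each recording needs its own ID.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate file IDs: {duplicates}")
    fileids = "".join(f"{file_id}\n" for file_id in ids)
    # One transcription line per recording, in the same order as .fileids.
    transcription = "".join(
        f"<s> {text} </s> ({file_id})\n" for file_id, text in recordings
    )
    return fileids, transcription


if __name__ == "__main__":
    # Hypothetical recordings: two speakers reading the same sentence.
    recordings = [
        ("spk1_s01", "turn on the lights"),
        ("spk2_s01", "turn on the lights"),
    ]
    fileids, transcription = build_control_files(recordings)
    print(fileids, end="")
    print(transcription, end="")
```

The two strings can then be saved under names of your choice (for example `adapt.fileids` and `adapt.transcription`, both hypothetical) next to the .wav files. Keeping all speakers in one list gives a single pair of files instead of one pair per person.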
  
  - Tavishi Gupta - 2018-03-28
    
    Hi!
    Thank you for the prompt response!
    So for multiple speakers, do I just make different .wav files in the same .fileids of different people saying the same sentence? So for example, I can have 3 people say 5 different sentences and I can have all the 15 .wav files in the same .fileids and .transcription files, right?
    
    Also, how much of a problem can noise be? Do I basically need crystal clear recordings? I thought I was supposed to use recordings that were used from the same source as on what the voice recognition needs to be applied on.
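For the 3-speaker, 5-sentence case, here is a sketch of one naming scheme that gives all 15 recordings distinct file IDs, even though each sentence is spoken three times. The `spk<N>_s<NN>` pattern, the speaker labels, the sentences, and the helper `make_recording_list` are all hypothetical. Any scheme should work as long as no two recordings share a name.

```python
from itertools import product

# Hypothetical prompts; every speaker reads all of them.
SENTENCES = [
    "turn on the lights",
    "turn off the lights",
    "open the door",
    "close the door",
    "stop the music",
]
SPEAKERS = ["spk1", "spk2", "spk3"]


def make_recording_list(speakers, sentences):
    """Pair every speaker with every sentence under a unique file ID."""
    return [
        (f"{speaker}_s{index:02d}", text)
        for speaker, (index, text) in product(speakers, enumerate(sentences, start=1))
    ]


if __name__ == "__main__":
    recordings = make_recording_list(SPEAKERS, SENTENCES)
    print(len(recordings))  # 3 speakers x 5 sentences = 15 recordings
    for file_id, text in recordings[:2]:
        print(file_id, text)
```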
    
    - Nickolay V. Shmyrev - 2018-03-28
      
      So for multiple speakers, do I just make different .wav files in the same .fileids of different people saying the same sentence? So for example, I can have 3 people say 5 different sentences and I can have all the 15 .wav files in the same .fileids and .transcription files, right?
      
      Different files cannot have the same filename; every file must have its own name.
      
      Also, how much of a problem can noise be?
      
      It will be a big problem.
      
      Do I basically need crystal clear recordings?
      
      It will not help you.
      
      I thought I was supposed to use recordings that were used from the same source as on what the voice recognition needs to be applied on.
      
      If you already know everything, why do you ask so many questions?
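Because every recording needs its own file name, a quick consistency check before adaptation can catch duplicate IDs and mismatches between the two control files. This is a minimal sketch that assumes the layout from the CMU Sphinx adaptation tutorial (IDs without the .wav extension, transcription lines ending in `(fileid)`, all .wav files in one directory). It also assumes the training tools pair the two files line by line, so order is checked too. The function name `check_adaptation_set` is hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as "<s> turn on the lights </s> (spk1_s01)".
LINE_RE = re.compile(r"^<s>\s+(.*?)\s+</s>\s+\((\S+)\)$")


def check_adaptation_set(fileids_text, transcription_text, wav_dir=None):
    """Return a list of problems; an empty list means the set looks consistent."""
    problems = []
    ids = [line.strip() for line in fileids_text.splitlines() if line.strip()]
    seen = set()
    for file_id in ids:
        if file_id in seen:
            problems.append(f"duplicate file ID: {file_id}")
        seen.add(file_id)

    trans_ids = []
    for n, line in enumerate(transcription_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        match = LINE_RE.match(line.strip())
        if match is None:
            problems.append(f"transcription line {n} is malformed: {line!r}")
            continue
        trans_ids.append(match.group(2))

    # Assumption: the files are paired line by line, so order must match too.
    if trans_ids != ids:
        problems.append("file IDs in .fileids and .transcription differ or are out of order")

    # Optionally confirm that each listed recording exists on disk.
    if wav_dir is not None:
        for file_id in ids:
            if not (Path(wav_dir) / f"{file_id}.wav").is_file():
                problems.append(f"missing audio: {file_id}.wav")
    return problems
```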
      
      - Tavishi Gupta - 2018-03-29
        
        I thought I was supposed to use recordings that were used from the same source as on what the voice recognition needs to be applied on.
        
        I don't know if you understood, but I was asking whether that was a requirement.
        
        So are you saying it will be a problem whether the recording is crystal clear or noisy? I don't think I follow now.
        
        I would definitely appreciate less sass, since I am just trying to learn and get a grip on this :)
        