Guidance required in creating Specialized vocabulary application in Indian English accent

Help
2018-08-23
2018-09-10
  • Aneri Sheth

    Aneri Sheth - 2018-08-23

    I am using CMU Sphinx4 to transcribe online programming lectures delivered in an Indian English accent. I have tried adapting the acoustic model with approximately 3-4 hours of audio recordings from around 3 different speakers and a limited vocabulary of 1300 words. The accuracy improved only slightly over that of the default en-us model applied to the same recordings.

    However, the accuracy is still well below acceptable levels. That led me to a question:

    Is training the model from scratch required for recognizing Indian English accents?

    I initially tried using adaptation because of the following instructions mentioned on the cmusphinx site:

    Adaptation is known to work well when you are using different recording environments (close-distance or far microphone or telephone channel), or when a slightly different accent (UK English or Indian English) or even another language is present. Adaptation, for example, works well if you need to quickly add support for some new language just by mapping a phoneset of an acoustic model to a target phoneset with the dictionary.
    

    But adaptation is giving very poor accuracy for every variation of noise cancellation, signal preprocessing, and training-sentence vocabulary I have attempted.
    Please guide me to the right way of approaching this task.
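
    To make "accuracy" comparable across the default, Indian English, and adapted models, one common measure is word error rate (WER). The thread does not say how accuracy was measured, so the following is only a minimal sketch, assuming plain whitespace-separated reference and hypothesis transcripts; the function name `word_error_rate` is made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length.

    Both arguments are plain strings; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Word-level edit distance, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

    Running this over the same held-out segments for each model would give one number per model, which is easier to compare than an overall impression.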

    Some important points about my requirement:

    1. We record lectures in mkv format. I extract the audio, convert it to a 16000 Hz mono wav file, and split it into small segments ranging from 1 to 28 seconds.
    2. The lecture accent is mostly Indian English, with some Hindi words used in between English phrases.
    3. The vocabulary consists mostly of technical jargon, with some casual conversation between the professors and students, as we teach programming subjects online.
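
    The preprocessing in point 1 can be sketched roughly as follows. This is an illustration under assumptions, not the exact pipeline used: it assumes ffmpeg is installed, the helper names `ffmpeg_extract_cmd` and `split_wav` are hypothetical, and it cuts fixed-length chunks, whereas the real segments vary from 1 to 28 seconds.

```python
import wave
from pathlib import Path


def ffmpeg_extract_cmd(video_path, wav_path):
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg", "-y", "-i", str(video_path),
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", "16000",          # 16000 Hz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        str(wav_path),
    ]


def split_wav(wav_path, out_dir, chunk_seconds=10.0):
    """Cut a WAV file into fixed-length chunks and return their paths.

    Assumption: fixed-length cuts for simplicity; cutting at pauses
    would avoid splitting words in the middle.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = out_dir / f"{Path(wav_path).stem}_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                dst.setparams(params)    # same channels, width, and rate
                dst.writeframes(frames)  # header is fixed up on close
            paths.append(chunk_path)
            index += 1
    return paths
```

    The command list can be passed to `subprocess.run(cmd, check=True)` before calling `split_wav` on the resulting file; file names such as `lecture.mkv` would be placeholders.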

    Any help will be greatly appreciated.

     
    • Nickolay V. Shmyrev

      We have an Indian English model in downloads

       
  • Aneri Sheth

    Aneri Sheth - 2018-08-24

    Hi Nickolay,

    Thank you so much for your prompt response. I have already tried transcribing with the Indian English model available in the Downloads, but since I got poor accuracy, I attempted adapting the default US English model with both my lecture recordings and manually collected audio.

    I have laid out the details of my attempt in this thread and also attached the required files and audio recordings:

    https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/b83844d5/

    Have I followed the right approach? Also what, in your opinion, would be a good approach?

    1. Adapting the Indian English model with my audio recordings. (If yes, how many hours of adaptation data are recommended, and how many unique sentences should the language model contain at the least?) Also, if my final model is to be applied to recorded lectures, does it still make sense to use manually spoken recordings for training? I am asking because the channel difference will persist.

    2. Building a model from scratch, since it has a specialized vocabulary.

    Please guide me regarding the same. Your help will be highly appreciated.

     

    Last edit: Aneri Sheth 2018-08-24
    • Nickolay V. Shmyrev

      I haven't seen your data and results, so I have no idea. I also have no idea what kind of application you are trying to build.

       
  • Aneri Sheth

    Aneri Sheth - 2018-08-25

    Apologies for any lack of clarity in the information given.

    Regarding the application I am trying to build:

    We are a startup teaching programming online to students around the globe. My aim is to develop an ASR system for transcribing recorded ONLINE PROGRAMMING LECTURES with the following characteristics:

    1. Lectures have two-way communication between the professor and the student, with the professor speaking most of the time.
    2. The accent is Indian English, and the recordings also have some Hindi sentences in between the English content.
    3. The vocabulary is limited, as we only teach programming, data structures and algorithms.
     
  • Aneri Sheth

    Aneri Sheth - 2018-08-25

    I have attached my data for your reference

    corpus.txt - The set of spoken statements from which lm was generated
    audio.fileids - The file containing all .wav file names used for adaptation
    audio.transcription - Corresponding transcriptions of audio files
    corpus.dict - The dictionary generated using Sphinx Knowledge Base tool
    corpus.lm - The language model generated using Sphinx Knowledge Base tool
    AudiosASR/tar.gz - Contains all the audios used for adaptation
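
    Before adapting, a quick consistency check across these files can rule out data problems. The sketch below is only an illustration with hypothetical helper names. It assumes the usual CMUSphinx layout: one utterance ID per line in audio.fileids, lines of the form `<s> some text </s> (utterance_id)` in audio.transcription, and `WORD PH ON ES` lines in corpus.dict. It also assumes transcript IDs match the base name of each file ID and that word case must match exactly.

```python
import re


def read_fileids(text):
    """Return utterance IDs, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_transcripts(text):
    """Return {utterance_id: [words]} from '<s> text </s> (id)' lines."""
    transcripts = {}
    for line in text.splitlines():
        line = line.strip()
        if "(" not in line:
            continue  # skip blank or malformed lines
        body, _, tail = line.rpartition("(")
        utt_id = tail.rstrip(")").strip()
        words = [w for w in body.split() if w not in ("<s>", "</s>")]
        transcripts[utt_id] = words
    return transcripts


def read_dictionary_words(text):
    """Return the set of dictionary words, ignoring '(2)'-style variants."""
    words = set()
    for line in text.splitlines():
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def check_adaptation_data(fileids_text, transcription_text, dict_text):
    """Report IDs without transcripts, stray transcripts, and OOV words."""
    # Assumption: compare on the base name in case file IDs contain paths.
    ids = {i.rsplit("/", 1)[-1] for i in read_fileids(fileids_text)}
    transcripts = read_transcripts(transcription_text)
    vocab = read_dictionary_words(dict_text)
    spoken = {w for words in transcripts.values() for w in words}
    return {
        "missing_transcripts": sorted(ids - set(transcripts)),
        "unlisted_transcripts": sorted(set(transcripts) - ids),
        "oov_words": sorted(spoken - vocab),  # exact, case-sensitive match
    }
```

    Each function takes file contents as a string, so the three files can be read with `open(...).read()` and passed in; empty lists in the result would mean the files agree with each other.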

    Implementation details are also given in the following thread (I have avoided repeating them here, as I have already described them as part of another question):

    https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/b83844d5/

    Please help me and provide a little direction. I have been trying for a long time with different tweaks, but to no avail.

     

    Last edit: Aneri Sheth 2018-08-25
