CMU Sphinx / Forums / Help: Guidance required in creating Specialized vocabulary application in Indian English accent

Aneri Sheth - 2018-08-23

I am using CMU Sphinx4 to transcribe online programming lectures delivered in an Indian English accent. I have tried adapting the acoustic model with approximately 3-4 hours of audio recordings from about 3 different speakers and a limited vocabulary of 1300 words. The accuracy improved only slightly over that of the default en-us model applied to the same recordings.

However, the accuracy is still far below acceptable levels. That led me to a question:

Is training the model from scratch required for recognizing Indian English accents?

I initially tried using adaptation because of the following instructions mentioned on the cmusphinx site:

Adaptation is known to work well when you are using different recording environments (close-distance or far microphone or telephone channel), or when a slightly different accent (UK English or Indian English) or even another language is present. Adaptation, for example, works well if you need to quickly add support for some new language just by mapping a phoneset of an acoustic model to a target phoneset with the dictionary.

But adaptation gives very poor accuracy for every variation of noise cancellation, signal preprocessing and training-sentence vocabulary I have attempted.
Please guide me to the right way of approaching this task.

Some important points about my requirement:

We record lectures in mkv format. I separate out the audio, convert it to a 16000 Hz mono wav file and split it into small portions ranging from 1 to 28 seconds.

The lecture accent is mostly Indian English, with some Hindi words used in between English phrases.

The vocabulary mostly consists of technical jargon, with some casual conversation between the professors and students, as we teach programming subjects online.
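
For readers following along, the preprocessing in the first point can be sketched roughly as below. This is a minimal sketch, not the script actually used in this thread. It assumes the audio has already been extracted from the mkv with an external tool (for example `ffmpeg -i lecture.mkv -vn -ac 1 -ar 16000 lecture.wav`, where `lecture.mkv` is a placeholder name), and it cuts at fixed 28-second boundaries, whereas the 1 to 28 second range above suggests the real split follows pauses in speech. The names `plan_segments` and `split_wav` are hypothetical.

```python
import wave
from pathlib import Path


def plan_segments(total_frames, frame_rate, max_seconds=28.0, min_seconds=1.0):
    """Return (start_frame, end_frame) pairs of at most max_seconds each.

    Assumption of this sketch: a trailing piece shorter than min_seconds
    is dropped rather than merged into the previous segment.
    """
    max_frames = int(max_seconds * frame_rate)
    min_frames = int(min_seconds * frame_rate)
    segments = []
    start = 0
    while start < total_frames:
        end = min(start + max_frames, total_frames)
        if end - start >= min_frames:
            segments.append((start, end))
        start = end
    return segments


def split_wav(src_path, out_dir, max_seconds=28.0, min_seconds=1.0):
    """Split a 16000 Hz mono WAV file into numbered chunk files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src_path), "rb") as src:
        # Refuse input that does not match the format described above.
        if src.getframerate() != 16000 or src.getnchannels() != 1:
            raise ValueError("expected 16000 Hz mono input")
        params = src.getparams()
        plan = plan_segments(src.getnframes(), src.getframerate(),
                             max_seconds, min_seconds)
        for index, (start, end) in enumerate(plan):
            src.setpos(start)
            frames = src.readframes(end - start)
            out_path = out_dir / f"{Path(src_path).stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                # The frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Calling `split_wav("lecture.wav", "segments")` would write `segments/lecture_0000.wav`, `segments/lecture_0001.wav` and so on. Fixed-length cuts can slice through the middle of a word, so a silence-based splitter is probably the better choice for adaptation data.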

Any help will be duly appreciated
- Nickolay V. Shmyrev - 2018-08-23
  
  We have an Indian English model in downloads.
  

Aneri Sheth - 2018-08-24

Hi Nickolay,

Thank you so much for your prompt response. I have already tried transcribing with the Indian English model available in the downloads, but since I got poor accuracy I attempted adapting the default US English model with both my lecture recordings and manually collected audio.

I have laid out the details of my attempt in this thread and also attached the required files and audio recordings:

https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/b83844d5/

Have I followed the right approach? Also what, in your opinion, would be a good approach?

Adapting the Indian English model with my audio recordings? (If yes, how many hours of adaptation data are recommended, and how many unique sentences should the language model contain at a minimum?) Also, if my final model is to be applied to recorded lectures, does it still make sense to use manually spoken recordings for training? I am asking because the channel difference will persist.

Building a model from scratch, since it has a specialized vocabulary?

Please guide me regarding the same. Your help will be highly appreciated.

Last edit: Aneri Sheth 2018-08-24
- Nickolay V. Shmyrev - 2018-08-24
  
  I haven't seen your data and results, so I have no idea. I also have no idea what kind of application you are trying to build.
  

Aneri Sheth - 2018-08-25

Apologies for any lack of clarity in the information given.

Regarding the application I am trying to build:

We are a startup teaching programming online to students around the globe. My aim is to develop an ASR system for transcribing recorded ONLINE PROGRAMMING LECTURES with the following characteristics:

Lectures have two-way communication between the professor and the student, with the professor speaking most of the time.

The accent is Indian English, and the recordings also have some Hindi sentences in between the English content.

Vocabulary is limited, as we only teach programming, data structures and algorithms.

Aneri Sheth - 2018-08-25

I have attached my data for your reference:

corpus.txt - The set of spoken statements from which the language model was generated
audio.fileids - The file listing all .wav file names used for adaptation
audio.transcription - Corresponding transcriptions of the audio files
corpus.dict - The dictionary generated using the Sphinx Knowledge Base tool
corpus.lm - The language model generated using the Sphinx Knowledge Base tool
AudiosASR.tar.gz - Contains all the audio files used for adaptation
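
Since these files have to agree with one another before adaptation can work, a quick consistency check may be a useful first step. The following is a minimal sketch under stated assumptions, not part of the attached data: it assumes each transcription line has the form `<s> some words </s> (utterance_id)`, that each line of `audio.fileids` names one utterance (only the last path component is compared), and that each dictionary line starts with a word, optionally suffixed with a variant number such as `(2)`. All function names are hypothetical, and words are compared exactly as written, so a case mismatch between transcripts and dictionary shows up as out-of-vocabulary.

```python
import re

# Matches a trailing "(utterance_id)" at the end of a transcription line.
UTT_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def parse_transcription(lines):
    """Map utterance id -> list of words, skipping the <s> and </s> markers."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = UTT_ID.search(line)
        if match is None:
            raise ValueError(f"no utterance id in line: {line!r}")
        words = [w for w in line[:match.start()].split()
                 if w not in ("<s>", "</s>")]
        result[match.group(1)] = words
    return result


def parse_dictionary(lines):
    """Collect dictionary head words, stripping variant suffixes like (2)."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def check_adaptation_files(fileid_lines, transcription_lines, dict_lines):
    """Report file ids without transcripts, stray transcripts and OOV words."""
    file_ids = [line.strip().split("/")[-1]
                for line in fileid_lines if line.strip()]
    transcripts = parse_transcription(transcription_lines)
    vocabulary = parse_dictionary(dict_lines)
    spoken = {word for words in transcripts.values() for word in words}
    return {
        "missing_transcripts": [f for f in file_ids if f not in transcripts],
        "unlisted_transcripts": sorted(set(transcripts) - set(file_ids)),
        "out_of_vocabulary": sorted(spoken - vocabulary),
    }
```

A typical call would pass the opened files directly, for example `check_adaptation_files(open("audio.fileids"), open("audio.transcription"), open("corpus.dict"))`. Three empty lists in the result mean the files are at least mutually consistent. It says nothing about whether the transcripts match what was actually spoken.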

Implementation details (I have avoided repeating them here, as I have already described them as part of another question) are stated in the following thread:

https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/b83844d5/

Please help me and provide a little direction. I have been trying for a long time with different tweaks, but to no avail.

Last edit: Aneri Sheth 2018-08-25


