Continuous Urdu Speech recognition

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Help

Created: 2018-02-10

Updated: 2018-02-10

Ali - 2018-02-10

I am building a continuous speech recognition system for Urdu.

0) Can we use the content of the transcription file to build a language model? If not, why?

1)

1 hour of recording for command and control for a single speaker
5 hours of recordings of 200 speakers for command and control for many speakers
10 hours of recordings for single speaker dictation
50 hours of recordings of 200 speakers for many speakers dictation

Does this apply to both the transcription file and the language model file?

2) We have to declare Urdu phonemes in English alphabetic letters in both the an4.phone and an4.dic files, right?

3) Can we use Urdu words in the an4.dic file? For example:
ایک E YK

4) If I am recording 10 hours for single speaker dictation, can I use one wav file containing all 10 hours of recording, or do I need to split it? If so, what is the right duration and the right way to split?

5) I used the same data for training and testing. Why am I not getting 0% WER?

6) With reference to "it prefers to use letter-only phone names without special symbols": can we use _ in a phone name? For example, in Urdu the CISAMPA representation of one of the phones is T_SH_h.
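
To make questions 2, 3 and 6 concrete, here is a minimal Python sketch of the kind of consistency check they are about. It is an illustration only, not part of any toolkit: it assumes each dictionary line is a word followed by whitespace-separated phones (as in the example entry above), and it assumes "letter-only" means ASCII letters. The function names, the second dictionary entry and the phone list are hypothetical.

```python
import re

# Assumption: "letter-only" phone names means ASCII letters A-Z / a-z only.
LETTER_ONLY = re.compile(r"[A-Za-z]+")


def parse_dict_line(line):
    """Split one dictionary line into (word, [phones])."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected a word and at least one phone: {line!r}")
    return parts[0], parts[1:]


def check_dictionary(dict_lines, phone_set):
    """Return (phones missing from phone_set, phones with non-letter characters)."""
    missing, non_letter = set(), set()
    for line in dict_lines:
        if not line.strip():
            continue  # skip blank lines
        _, phones = parse_dict_line(line)
        for phone in phones:
            if phone not in phone_set:
                missing.add(phone)
            if not LETTER_ONLY.fullmatch(phone):
                non_letter.add(phone)
    return missing, non_letter


# The first entry is the one from question 3; the second is a hypothetical
# placeholder word that uses the phone from question 6.
dict_lines = ["ایک E YK", "WORD2 T_SH_h E"]
phone_set = {"E", "YK", "SIL"}  # hypothetical contents of the phone file

missing, non_letter = check_dictionary(dict_lines, phone_set)
print("not in phone file:", sorted(missing))
print("not letter-only:", sorted(non_letter))
```

The sketch only reports which phones would be affected by each rule; whether the trainer actually accepts Urdu words or underscores is exactly what is being asked.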
Thanks in advance

Last edit: Ali 2018-02-10