Help required in hindi speech recognition

  • anand vadlamani

    anand vadlamani - 2016-07-18

    Hi, we are trying to create a model for Hindi speech recognition.

    We have put the data and logfile in the following location
    ftp://123.176.44.99/hindi_asr/
    credentials are: anandftp (username) password (password)

    Here are the steps that we have done:

    1. We built the language model using http://www.speech.cs.cmu.edu/tools/lmtool.html. We submitted our corpus text (ftp://123.176.44.99/hindi_asr/an4_test/etc/latest_dictionary.txt) and got back the language model (ftp://123.176.44.99/hindi_asr/an4_test/etc/an4_test.lm.DMP).

    We transliterated this corpus text from Hindi into English (is that OK?)

    2. Using the above language model, we created the acoustic model as per the tutorial. The training data is in ftp://123.176.44.99/hindi_asr/an4_test/wav/an4_clstk/. We have 2400-odd audio files along with their transcriptions.

    3. After training, we tested using "sphinxtrain -s decode run". In this testing phase, we supplied all the fileids (all 2400-odd of them) along with their transcriptions from the training folder itself.

    At the end we got the following message:

    MODULE: DECODE Decoding using models previously trained
    Decoding 2404 segments starting at 0 (part 1 of 1)
    0%
    Aligning results to find error rate
    SENTENCE ERROR: 55.9% (1345/2404) WORD ERROR RATE: 7.2% (3044/42406)
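
For readers checking these figures: the percentages are simply errors divided by totals. The sketch below reproduces the two rates from the counts in the log and shows a word-level edit distance of the kind such an error count is normally based on. The function names are illustrative, and this is not the alignment script sphinxtrain itself runs.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance: substitutions + insertions + deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distance from empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i]  # distance to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # reference word dropped (deletion)
                cur[j - 1] + 1,          # extra hypothesis word (insertion)
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def rate(errors, total):
    """Error rate as a percentage."""
    return 100.0 * errors / total


# Counts taken from the decode log above.
wer = rate(3044, 42406)
ser = rate(1345, 2404)
print(f"WER {wer:.1f}%  SER {ser:.1f}%")

# One dropped word out of four reference words.
print(word_errors("par laathee rakh kar", "par rakh kar"))
```

Note that a low word error rate measured on the training files themselves says little about how the model behaves on unseen audio.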

    We are using the same files from the training data to test in the decoding phase. Is that OK, or should we use different audio for testing?

    After this, we tried to recognize one of the audio files (one we had used in testing) using the language model, acoustic model and dictionary created in the above steps.

    command:
    pocketsphinx_continuous -infile wav/an4_clstk/hindi/hindi_0010.wav -hmm model_parameters/an4.ci_cont_flatinitial/ -lm etc/an4.lm.DMP -dict etc/an4.dic

    But the accuracy is very poor and not a single word is recognized.

    While running the test phase, the sentence error rate and word error rate are low.

    Please tell us where we are going wrong.

    Thanks
    Anand

     
    • Nickolay V. Shmyrev

      We transliterated this corpus text from Hindi into English (is that OK?)

      It is OK, but not necessary.

      We are using the same files from the training data to test in the decoding phase. Is that OK, or should we use different audio for testing?

      It is not recommended to use the same audio for testing. You need to split your data into train and test sets, and those sets should not intersect.
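
A minimal sketch of such a split, assuming the fileids and transcriptions are kept as parallel lists (one entry per utterance). The 10% test fraction, the fixed seed and the example file names are assumptions for illustration, not values from this thread.

```python
import random


def split_corpus(fileids, transcripts, test_fraction=0.1, seed=0):
    """Shuffle utterances and return disjoint (train, test) lists of pairs."""
    if len(fileids) != len(transcripts):
        raise ValueError("fileids and transcripts must line up one-to-one")
    pairs = list(zip(fileids, transcripts))
    random.Random(seed).shuffle(pairs)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(pairs) * test_fraction))
    return pairs[n_test:], pairs[:n_test]


# Hypothetical file ids and transcripts standing in for the real corpus.
ids = [f"hindi/hindi_{i:04d}" for i in range(1, 21)]
texts = [f"utterance {i}" for i in range(1, 21)]

train, test = split_corpus(ids, texts)
print(len(train), "train /", len(test), "test")
```

Each list would then be written out to the train and test fileids and transcription files that the training setup expects. If the same speakers or sentences recur across recordings, splitting by speaker or by sentence rather than by file gives a more honest test set.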

      While running the test phase, sentence error rate and word error rate are low

      The final model is in an4.cd_cont_200, and it is very small because the dataset is small. You need a larger dataset and the most recent pocketsphinx; the result would then be

       YE DO PAR LAATHEE RAKH KAR GHAR SE NIKALA TO DHANIYA DVAAR PAR KHADEE USE DER TAK DEKHATEE RAHEE
      

      which is roughly accurate.

       
  • anand vadlamani

    anand vadlamani - 2016-07-18

    Hi Nickolay,
    thanks for the prompt reply, but when I run it on my system (we are using pocketsphinx-5prealpha) with the following command:

    pocketsphinx_continuous -infile wav/an4_clstk/hindi/hindi_0010.wav -hmm model_parameters/an4.cd_cont_200/ -lm etc/an4.lm.DMP -dict etc/an4.dic

    we get the following result: OONT DIYE

    I agree that our dataset is small, but the decode stage decodes this file properly, whereas running the command directly gives the above result.

    In the an4.align file, the decoded result we see is: horee kandhon par laathee rakh kar ghar se nikala to dhaniya dvaar par khadee use der tak dekhatee rahee

    We are confused. Are we doing something wrong? Are you using the same model that we mentioned in our link, or a different one?

     
    • Nickolay V. Shmyrev

      You need to compile latest sphinxbase and pocketsphinx from github.

       
  • anand vadlamani

    anand vadlamani - 2016-07-18

    Hi Nickolay, we set -cmninit 71 and gave it long sentences, and we are now able to get good results. Thank you.
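
For anyone reproducing this, the sketch below assembles the earlier command line with the extra -cmninit option. The paths are the ones quoted earlier in the thread and 71 is the value reported above; the right value for other recordings is likely to differ. The helper name is illustrative, and the actual run is left commented out because it needs pocketsphinx installed.

```python
import shlex


def build_command(wav, hmm, lm, dic, cmninit=None):
    """Build a pocketsphinx_continuous argument list, optionally with -cmninit."""
    cmd = [
        "pocketsphinx_continuous",
        "-infile", wav,
        "-hmm", hmm,
        "-lm", lm,
        "-dict", dic,
    ]
    if cmninit is not None:
        cmd += ["-cmninit", str(cmninit)]  # initial value for cepstral mean normalization
    return cmd


cmd = build_command(
    wav="wav/an4_clstk/hindi/hindi_0010.wav",
    hmm="model_parameters/an4.cd_cont_200/",
    lm="etc/an4.lm.DMP",
    dic="etc/an4.dic",
    cmninit=71,
)
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires pocketsphinx on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```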

     
  • Arun Kumar

    Arun Kumar - 2016-11-01

    Hi Anand,
    We are struggling with the same issue with the latest cmusphinx4-5prealpha build. I see you have been successful in transcribing Hindi audio. I think it has something to do with the way we have set up our corpus. Can you share the FTP details, so we can see how you formed the corpus to get these results?

    Thanks,
    Arun

     
