CMU Sphinx / Forums / Help: Language Model

Senjam Shantirani - 2016-05-16

Can I directly use the .lm file we obtained from http://www.speech.cs.cmu.edu/tools/lmtool-new.html instead of the .DMP?

Will replacing the .DMP with the .lm cause any accuracy issues?

Kindly advise.
Shanti

- Nickolay V. Shmyrev - 2016-05-16
  
  Can I directly use .lm we obtained from http://www.speech.cs.cmu.edu/tools/lmtool-new.html,
  INSTEAD of .DMP?
  
  Yes
  
  Will the replacement of .DMP by .lm cause any accuracy issue?
  
  No
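  For context: the lmtool .lm file is a plain-text ARPA model, while .DMP is a binary dump of a model that Sphinx loads faster; converting between the two is not meant to change the n-grams, which is why the answer is No. As a minimal sketch (arpa_ngram_counts is a hypothetical helper, not part of CMU Sphinx), the n-gram counts an ARPA file declares can be read from its \data\ header, a quick sanity check that the downloaded .lm is complete:

```python
import re


def arpa_ngram_counts(text):
    """Return {order: count} declared in the \\data\\ header of an ARPA-format LM."""
    counts = {}
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line.startswith("\\"):
                # The first section header after \data\ (e.g. \1-grams:) ends the header.
                break
    return counts


# Hypothetical ARPA fragment for illustration only.
sample = """
\\data\\
ngram 1=7
ngram 2=5
ngram 3=3

\\1-grams:
-0.8 <s> -0.3
"""
print(arpa_ngram_counts(sample))  # {1: 7, 2: 5, 3: 3}
```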
  

Senjam Shantirani - 2016-05-16

When I decode using the .lm, data-1-1.match shows no decoded phone sequence; it shows only the file name and score. How can I solve this?

(test/367-130732-0000 -1514)
(test/367-130732-0001 -2968)
(test/367-130732-0002 -5604)
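To find every affected utterance in a large .match file, a small script can flag lines whose hypothesis is empty. This is a minimal sketch assuming the layout suggested by the lines above, "<hypothesis words> (<utterance id> <score>)"; empty_hypotheses is a hypothetical helper, and the non-empty "HELLO WORLD" line in the example is made up:

```python
import re

# Assumed layout of a .match line: "<hypothesis words> (<utterance id> <score>)"
MATCH_LINE = re.compile(r"^(?P<hyp>.*?)\s*\((?P<uttid>\S+)\s+(?P<score>-?\d+)\)\s*$")


def empty_hypotheses(lines):
    """Return the utterance ids whose decoded hypothesis is empty."""
    empty = []
    for line in lines:
        m = MATCH_LINE.match(line)
        if m and not m.group("hyp").strip():
            empty.append(m.group("uttid"))
    return empty


match = [
    "(test/367-130732-0000 -1514)",
    "(test/367-130732-0001 -2968)",
    "HELLO WORLD (test/hypothetical-0003 -900)",  # hypothetical non-empty result
]
print(empty_hypotheses(match))  # ['test/367-130732-0000', 'test/367-130732-0001']
```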

Just to test, I substituted the .dmp from another corpus (an4). Then it gives a phone sequence, which I know is not correct since that .dmp does not belong to my data, and it gave a WER of 73% on 500 WAV files of LibriSpeech test data.
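For reference, WER is usually defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below implements that standard definition; it is a generic illustration, not sphinxtrain's own scoring script. Note that an empty hypothesis, like the lines above, scores 100% because every reference word counts as a deletion:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
print(word_error_rate("the cat", ""))  # 1.0
```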

I ran sphinx_lm_convert, but the output is still the same, with no decoded phone sequence in data-1-1.match.

How do I create a .dmp for my data? Or is there a setting for the .lm that makes data-1-1.match show the phone sequence?

I am decoding with PocketSphinx using the command: sphinxtrain -s decode run

Shanti

Last edit: Senjam Shantirani 2016-05-16

- Nickolay V. Shmyrev - 2016-05-16
  
  When I decode using the .lm it shows no decoded phone sequence in the data-1-1.match, instead shows the file name and score only. How to solve this?
  
  Most likely your language model does not match the dictionary in case; words must have the same case in both. Check the decoding log for errors; the problem should be reported there.
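  A quick way to spot such a mismatch before decoding is to compare the LM vocabulary with the dictionary headwords. This is a minimal sketch, assuming an ARPA-format .lm and a Sphinx-style .dic with one "word phone phone ..." entry per line (alternate pronunciations written as word(2)); the function names are hypothetical, not a sphinxtrain tool, and the decoding log remains the authoritative place to confirm the cause:

```python
import re


def lm_words(arpa_text):
    """Collect the words listed in the \\1-grams: section of an ARPA LM."""
    words = set()
    in_unigrams = False
    for line in arpa_text.splitlines():
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):
                break  # next section, e.g. \2-grams: or \end\
            fields = line.split()
            if len(fields) >= 2:
                words.add(fields[1])  # layout: log10prob word [backoff]
    return words


def dict_words(dic_text):
    """Collect headwords from a .dic file, folding word(2) variants into word."""
    words = set()
    for line in dic_text.splitlines():
        fields = line.split()
        if fields:
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def case_mismatches(arpa_text, dic_text, ignore=("<s>", "</s>")):
    """Return (words missing from the dictionary, (lm_word, dic_word) case-only mismatches)."""
    dic = dict_words(dic_text)
    dic_lower = {w.lower(): w for w in dic}
    missing, wrong_case = [], []
    for w in sorted(lm_words(arpa_text) - dic - set(ignore)):
        if w.lower() in dic_lower:
            wrong_case.append((w, dic_lower[w.lower()]))
        else:
            missing.append(w)
    return missing, wrong_case


# Hypothetical files: uppercase LM, lowercase dictionary.
lm = """\\data\\
ngram 1=4

\\1-grams:
-1.0 </s>
-99 <s> -0.5
-0.6 HELLO -0.2
-0.6 WORLD -0.2

\\end\\
"""
dic = """hello HH AH L OW
world W ER L D
"""
print(case_mismatches(lm, dic))  # ([], [('HELLO', 'hello'), ('WORLD', 'world')])
```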
  
  How do I create .dmp for my data?
  
  Language model training is described in the tutorial: http://cmusphinx.sourceforge.net/wiki/tutoriallm
  
