language model maker

Help
skatz_teyp
2007-11-12
2012-09-22
  • skatz_teyp

    skatz_teyp - 2007-11-12

    Hey guys, I'm still working on creating a big language model. After being stuck on the allocation of big data, I somehow managed to get past it and it's now working properly. But during the computation of unigram probabilities I get an error: the sum of the probabilities is < 1, in my case 0.72849311. The sum of the unigrams ought to equal 1 to within about 1e-6, but I get this very large difference. Why do I get these values? Is it because of my training data, or another bug in the toolkit? By the way, I'm using cmuclmtk. Are there other tools out there that can be used to create one? I've tried SRILM, but there I also get errors because of the big data size. Hope you can help me with this. Thanks.

     
    • skatz_teyp

      skatz_teyp - 2007-11-14

      Hmmm, I thought so. Maybe it's because of the high counts of n-gram occurrences (1 million to 1 billion). I'm trying to change the data types to bigger ones, e.g. int to long and float to double.

       
    • Nickolay V. Shmyrev

      I think it's another bug, probably related to floating-point precision. Can we reproduce it?

       