language model maker

Help
skatz_teyp
2007-11-12
2012-09-22
  • skatz_teyp

    skatz_teyp - 2007-11-12

    Hey guys, I'm still working on creating a big language model. After being stuck on the allocation of big data, I somehow managed to get past it and it's now working properly. But during the computation of unigram probabilities I get an error: the sum of the probabilities is < 1, in my case 0.72849311. The sum of the unigrams ought to equal 1 to within about 1e-6, but I get this very large difference. Why do I get these values? Is it because of my training data, or another bug in the toolkit? By the way, I'm using cmuclmtk. Are there other tools out there that can be used to create one? I've tried SRILM, but there I also get errors because of the big data size. Hope you can help me with this. Thanks.

     
    • skatz_teyp

      skatz_teyp - 2007-11-14

      Hmmm, I thought so. Maybe it's because of the high counts of n-gram occurrences (1 million to 1 billion). I'm trying to change the data types to bigger ones, e.g. int to long and float to double.

       
    • Nickolay V. Shmyrev

      I think it's another bug, probably related to floating-point precision. Can we reproduce it?

       