Issue loading DMP file

Help
Halle
2012-04-28
2012-09-22
  • Halle

    Halle - 2012-04-28

    Hi Nickolay,

    Here is a bit of a puzzler. Since OpenEars 1.x, which uses
    pocketsphinx/sphinxbase .7, it is apparently no longer possible to load .DMP
    files that were compiled on Linux (DMP files compiled internally in OpenEars
    work fine). Here is what the logging says when trying to load hub4.5000.DMP
    with hub4.5000.dic and the wsj acoustic model:

    INFO: ngram_model_arpa.c(79): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(197): ngrams 1=5001, 2=436879, 3=418286
    INFO: ngram_model_dmp.c(244): 5001 = LM.unigrams(+trailer) read
    1: offset is 81672 // This is my added logging to see what the offset value is at the beginning of if (do_mmap) at line 247
    filesize is 5642542 // This is my added logging to see what the overall filesize is
    2: offset is 5324232 // This is my added logging to see what the offset value is at the beginning of if (do_mmap) at line 273
    INFO: ngram_model_dmp.c(297): 436879 = LM.bigrams(+trailer) read
    3: offset is 8670520 // This is my added logging to see what the offset value is at the beginning of if (do_mmap) at line 303
    INFO: ngram_model_dmp.c(326): 418286 = LM.trigrams read
    4: offset is 8670520 // This is my added logging to see what the offset value is at the beginning of if (do_mmap) { at line 335
    fread error detected // This is my added logging of the error condition after ngram_model_dmp.c line 339 "if (fread(&k, sizeof(k), 1, fp) != 1) {" which is apparently the cause of the problem
    premature eof detected // This is my added logging of the eof condition for the line of code mentioned in the previous comment
    Position: 8670520 // This is my added logging of the seek position when the eof is happening
    ERROR: "ngram_search.c", line 211: Failed to read language model file: /Users/username/Library/Application Support/iPhone Simulator/5.1/Applications/035669A4-AB0E-4E2C-852C-C771386CB4DF/OpenEarsSampleApp.app/hub4.5000.DMP

    I've decompiled the .DMP in question into an arpa file and verified that those
    1-gram/2-gram/3-gram counts are correct.

    Do you have any idea why this fails in the section of ngram_model_dmp.c under
    the comment that follows, after apparently reading the n-grams correctly:

    /* read n_prob2 and prob2 array (in memory) */

    Thanks for any leads you can give me in troubleshooting this. I have checked
    sphinx_config.h for any type sizes that are set wrongly (since this seems to
    be about the seek position in a binary file, and that could be due to wrong
    byte sizes), but I can't see any.
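
    The byte-size suspicion can be checked directly from the offsets in the log
    above. The following is a minimal sketch in Python, not part of sphinxbase: it
    divides the distance between consecutive offsets by the number of records
    read, and compares the final offset with the file size. All numbers are
    copied from the log; the helper name implied_record_size is hypothetical, and
    the 8-byte bigram / 4-byte trigram record sizes used for comparison are an
    assumption about the 16-bit DMP layout, not something the log confirms.

    ```python
    # Values copied verbatim from the log above.
    FILESIZE = 5642542
    OFFSET_AFTER_UNIGRAMS = 81672
    OFFSET_AFTER_BIGRAMS = 5324232
    OFFSET_AFTER_TRIGRAMS = 8670520
    N_BIGRAMS = 436879   # the log says "(+trailer)", so one extra record is read
    N_TRIGRAMS = 418286


    def implied_record_size(start, end, n_records):
        """Bytes per record implied by two offsets, or None if it does not divide evenly."""
        span = end - start
        if span % n_records:
            return None
        return span // n_records


    bigram_size = implied_record_size(
        OFFSET_AFTER_UNIGRAMS, OFFSET_AFTER_BIGRAMS, N_BIGRAMS + 1)
    trigram_size = implied_record_size(
        OFFSET_AFTER_BIGRAMS, OFFSET_AFTER_TRIGRAMS, N_TRIGRAMS)

    # How far past the end of the file the offset has moved before the failing fread.
    overshoot = OFFSET_AFTER_TRIGRAMS - FILESIZE

    # ASSUMPTION: a 16-bit DMP stores 8-byte bigram and 4-byte trigram records.
    # Under that assumption the trigram section would end here instead:
    expected_end = (OFFSET_AFTER_UNIGRAMS
                    + (N_BIGRAMS + 1) * 8
                    + N_TRIGRAMS * 4)

    print("implied bigram record size:", bigram_size)    # 12
    print("implied trigram record size:", trigram_size)  # 8
    print("bytes past end of file:", overshoot)          # 3027978
    print("expected end with 8/4-byte records:", expected_end, "<", FILESIZE)
    ```

    Under these numbers the offset advances by 12 bytes per bigram and 8 bytes
    per trigram, which carries it about 3 MB past the end of the 5642542-byte
    file, so the next fread hits EOF. With the assumed 8- and 4-byte records the
    same sections would end at 5249856, inside the file. That would point at the
    record structs used for offset arithmetic rather than at the file itself.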

     
  • Nickolay V. Shmyrev

    This seems to be a regression introduced recently; one needs to check the
    change history to find where it broke.

     
  • Halle

    Halle - 2012-04-30

    OK, are you saying it's a probable regression in one of the .7 distributions
    of sphinxbase?

     
  • Halle

    Halle - 2012-05-13

    Well, on further investigation this turned out to be an own goal. I had made
    some changes to the ARPA/DMP model templating system to fix some other library
    issues, and they broke DMP reading for most DMPs. Thanks for your help.

     