text2idngram Error reading temp file cmuclmtk

2015-05-14
2015-05-29
  • Charbel Fakhry

    Charbel Fakhry - 2015-05-14

    Hello,

    I am trying to build French and Spanish language models from a Wikipedia dump corpus (formatted as described in the tutorial, with the <s> and </s> tags, and edited accordingly).

    After successfully creating the vocabulary files for each language, I ran into an error when running the text2idngram command. I have tried both small and large vocabularies (65,000 and 2,000,000) and I get the same error for both Spanish and French.
    Assuming FrenchLM.txt is my corpus, below is the error log I get when running:
    text2idngram -vocab french.vocab -idngram FrenchLM.idngram < FrenchLM.txt

    FrenchLM.txt is a large file (around 2.4 GB).
    Below is the last part of the output (the first part runs fine).
    The error is on the last line.

    [...]
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/29
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/30
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/31
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/32
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/33
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/34
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/35
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/36
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/37
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/38
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/39
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/40
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/41
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/42
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/43
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/44
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/45
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/46
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/47
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/48
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/49
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/50
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ..................................................
    ................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/51
    Reading text into the n-gram buffer...
    20,000 n-grams processed for each ".", 1,000,000 for each line.
    ..................................................
    ..................................................
    ..................................................
    .....................
    Sorting n-grams...
    Writing sorted n-grams to temporary file cmuclmtk-a07004/52
    Merging 52 temporary files...
    Error reading temp file cmuclmtk-a07004/1

    D:\FYP\French>

    After that, the command stops executing.

    What would you recommend?

    Thank you.

     
    • Nickolay V. Shmyrev

      Use SRILM.
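
For readers who have not used SRILM, the suggestion above might look roughly like the following. This is only a minimal sketch, not the exact command anyone in this thread ran: it assumes SRILM is installed with `ngram-count` on the PATH in a POSIX-style shell (for example Cygwin on Windows), it reuses the file names from the question, and the output name FrenchLM.lm, the trigram order, and the smoothing options are illustrative assumptions.

```shell
# File names from the question; FrenchLM.lm is an assumed output name.
CORPUS=FrenchLM.txt
VOCAB=french.vocab
LM=FrenchLM.lm

# Build a trigram ARPA model directly from the text corpus.
# -vocab restricts the model to the given word list, -unk maps other
# words to <unk>; the Kneser-Ney options are one common choice, not a
# requirement.
build_lm() {
  ngram-count -order 3 -text "$CORPUS" -vocab "$VOCAB" -unk \
    -kndiscount -interpolate -lm "$LM"
}

# Only run the build if SRILM is actually available.
if command -v ngram-count >/dev/null 2>&1; then
  build_lm
else
  echo "ngram-count not found; install SRILM and add it to PATH" >&2
fi
```

`ngram-count` reads the text and writes the ARPA model in a single step, so the separate id n-gram stage that failed here is not needed.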

       
  • Charbel Fakhry

    Charbel Fakhry - 2015-05-29

    This really helped. Thank you!!

     
