Pre-Processing Error

Anonymous
2013-02-08
2013-10-31

  • Anonymous
    2013-02-08

    Hello,

    In attempting to pre-process a text file, I receive the error "Something wrong with the database: bun_r table KH Coder will exit now." How can I fix this error?

    Thanks for your time!

    Kent

     
    Last edit: Anonymous 2013-02-08
  • HIGUCHI Koichi
    2013-02-09

    Thank you for the post!

    Did you try the "Botchan" tutorial? If you can follow the tutorial without any errors, your KH Coder installation is probably working properly.

    In that case, the data you tried may contain something incompatible. Please prepare a plain text file (.txt) as the "target file" for KH Coder, not a Word (.doc, .docx) or PDF (.pdf) file. The encoding of the text file should be something like "Latin-1," "ISO 8859-1," "US-ASCII," or "Plain ASCII." Make sure that the text file doesn't include tab characters or any control characters other than "line feed." Also, delete "<" and ">" from your data unless you are using them as tags for KH Coder.

    Hope it helps.
    Best regards.
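
A minimal Python sketch of the checks listed in the reply above, for anyone who wants to locate the offending characters instead of hunting for them by eye. The function name `find_problem_chars` and the file name in the comment are hypothetical, and the rules only mirror the advice in this thread (tabs, control characters other than line feed, "<" and ">", characters outside Latin-1); this is not KH Coder's own validation.

```python
import unicodedata


def find_problem_chars(text):
    """Return (line_no, column, char_repr, reason) for each suspicious character.

    Assumption: the rules follow the advice in this thread, not KH Coder's
    actual parser. Line feed is treated as the only allowed control character.
    """
    problems = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                reason = "tab"
            elif ch in "<>":
                reason = "angle bracket"
            elif unicodedata.category(ch) == "Cc":
                # Any remaining control character, including carriage return.
                reason = "control character"
            elif ord(ch) > 0xFF:
                reason = "outside Latin-1"
            else:
                continue
            problems.append((line_no, col, repr(ch), reason))
    return problems


# Example with a hypothetical file name; newline="" keeps carriage returns visible:
# with open("target.txt", encoding="latin-1", newline="") as fh:
#     for problem in find_problem_chars(fh.read()):
#         print(problem)
```

If you do use KH Coder tags such as <h1>, expect those brackets to be reported too; the sketch cannot tell tags from stray brackets.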

     

  • Anonymous
    2013-02-09

    Thanks for your help.

    I have had success with the Botchan file and with smaller portions of the file that is producing the errors. I tried to create a clean ASCII text file, but I must have missed some errant characters.

     

  • Anonymous
    2013-05-07

    I am having the same issue as Kent: I have had success with the Botchan file and with smaller portions of my file, but other parts produce errors. I have removed every < and > besides the ones in \<h1> or \</h1>. Using Notepad++, I removed everything matched by the regex [\x00-\x09\x0B-\x1F\x7F], which Wikipedia tells me covers the control characters except for 0A (hex), the newline; from what I have read, that is also what the clean function removes. Notepad++ tells me the file is encoded as UTF-8 without BOM, the same as a smaller portion that works.

    Is there some other character that that regex is not picking up that might be causing problems?

    ETA: I also tried using the clean function (the file is now ANSI) and then removing all < and >, and I still got this error.
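
For reference, the same control-character removal can be sketched in Python with the exact regex quoted in the post above. The function name `strip_control_chars` is hypothetical. Note that the pattern matches only control characters: carriage returns (0D) are removed along with the rest, while printable characters such as "\", "<", and ">" are left untouched.

```python
import re

# The character class from the post: every ASCII control character
# except line feed (0x0A), plus DEL (0x7F).
CONTROL_CHARS = re.compile(r"[\x00-\x09\x0B-\x1F\x7F]")


def strip_control_chars(text):
    """Remove ASCII control characters other than line feed."""
    return CONTROL_CHARS.sub("", text)


cleaned = strip_control_chars("first\tline\r\nsecond line\x7f")
```

Because printable characters pass through unchanged, a file cleaned this way can still contain characters that the regex was never meant to catch.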

     
    Last edit: Anonymous 2013-05-07
  • HIGUCHI Koichi
    HIGUCHI Koichi
    2013-05-07

    Thank you for the post!

    1) Would you give us the EXACT error message you got? Does it say "bun_r"? Or is it bun-bun_r1 or bun-bun_r2?

    2) Can you send me the data by e-mail?

     

  • Anonymous
    2013-05-07

    1) The exact message is "Something wrong with the database:bun_r table KH Coder will exit now".

    2) I'll send an email. Thank you!

     
    Last edit: HIGUCHI Koichi 2013-05-08
  • HIGUCHI Koichi
    2013-05-08

    Thank you for the post and e-mail.

    I see a lot of "\" (backslash) in your data.
    Please delete them all and it will be OK.

    Good luck with your research.
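
If the file is too large to edit by hand, the backslash removal suggested above could be scripted. This is a small sketch with hypothetical function names; it treats every "\" as unwanted, as the reply recommends.

```python
def count_backslashes(text):
    """Count backslashes so you can see how many the file contains."""
    return text.count("\\")


def remove_backslashes(text):
    """Delete every backslash from the text."""
    return text.replace("\\", "")


sample = r"\<h1>Chapter 1\</h1>"
before = count_backslashes(sample)
cleaned = remove_backslashes(sample)
```

Running the count again on the cleaned text is a quick way to confirm that nothing was missed before pre-processing the file again.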

     

  • Anonymous
    2013-06-14

    Hello, Koichi-san!

    KH Coder is one of the best text mining tools I have come across. Thanks.

    I had issues when I tried to run pre-processing on Japanese text files. KH Coder seems to work only with ANSI-encoded text files, but if a Japanese text file is saved as ANSI, the kanji turn into question marks. How can I keep my Japanese text file in Unicode and run pre-processing without getting the following error?
    Fatal error: Could not execute ChaSen.

     
  • HIGUCHI Koichi
    2013-06-14

    Hello from Japan and thank you for the post!

    Please try saving Japanese text files in Shift-JIS (SJIS) or EUC-JP (EUC) encoding. Currently, KH Coder cannot handle Unicode.

    Best regards.
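
One way to do the conversion suggested above without a text editor is a short Python script. This is a sketch under assumptions: the function names and file names are hypothetical, the source file is assumed to be UTF-8 (with or without BOM), and Python's "shift_jis" codec is assumed to be acceptable to KH Coder ("cp932", Microsoft's variant, and "euc_jp" are drop-in alternatives for the `target` argument).

```python
from pathlib import Path


def reencode(data, source="utf-8-sig", target="shift_jis"):
    """Decode bytes from the source encoding and re-encode them in the target.

    "utf-8-sig" reads UTF-8 with or without a BOM. Characters that do not
    exist in the target encoding raise UnicodeEncodeError rather than being
    silently replaced with question marks.
    """
    return data.decode(source).encode(target)


def convert_file(src, dst, target="shift_jis"):
    """Write a re-encoded copy of src to dst, leaving src unchanged."""
    Path(dst).write_bytes(reencode(Path(src).read_bytes(), target=target))


# Example with hypothetical file names:
# convert_file("japanese_utf8.txt", "japanese_sjis.txt")
```

Failing loudly on unconvertible characters is a deliberate choice here, since silently substituted question marks are the problem described in the question.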

     
    Last edit: HIGUCHI Koichi 2013-06-14

