KH Coder / Discussion / Open Discussion: Pre-Processing Error

Anonymous - 2013-02-08

Hello,

In attempting to pre-process a text file, I receive the error, "Something wrong with the database: bun_r table KH Coder will exit now." How can I fix this error?

Thanks for your time!

Kent

Last edit: Anonymous 2013-02-08

HIGUCHI Koichi - 2013-02-09

Thank you for the post!

Did you try the "Botchan" tutorial? If you can follow the tutorial without any errors, your KH Coder installation is working properly.

In that case, the data you tried could contain some incompatibilities. Please prepare a plain text file (.txt) as the "target file" for KH Coder, not a Word (.doc, .docx) or PDF (.pdf) file. The encoding of the text file should be something like "Latin-1," "ISO 8859-1," "US-ASCII," or "Plain ASCII." Be sure that the text file doesn't include any tab characters or any control characters other than "line feed." You should also delete "<" and ">" from your data unless you are using tags for KH Coder.
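The requirements above can be checked mechanically before a file is loaded. The following is a minimal sketch in Python, not part of KH Coder: the function name `check_target_file` and the sample bytes are hypothetical, and the sketch assumes that Windows line endings (CR LF) are harmless, so a CR directly before a line feed is not reported.

```python
def check_target_file(data: bytes) -> dict:
    """Count the kinds of characters the advice above says to avoid."""
    # Latin-1 maps every byte to one character, so decoding never fails.
    # Assumption: CR LF line endings are acceptable, so they are folded to LF.
    text = data.decode("latin-1").replace("\r\n", "\n")
    return {
        # Bytes above 0x7F mean the file is not plain ASCII.
        "non_ascii_bytes": sum(1 for b in data if b > 0x7F),
        "tabs": text.count("\t"),
        # ASCII control characters other than line feed and tab.
        "other_control_chars": sum(
            1 for ch in text
            if (ord(ch) < 0x20 or ord(ch) == 0x7F) and ch not in "\n\t"
        ),
        "angle_brackets": text.count("<") + text.count(">"),
    }


# Hypothetical sample; for a real file, read it with open(path, "rb").read().
sample = b"Hello\tworld\r\n<h1>Title</h1>\x00caf\xe9\n"
print(check_target_file(sample))
```

A report with all counts at zero only means that none of the issues listed above were found; it does not guarantee that pre-processing will succeed.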

Hope it helps.
Best regards.

Anonymous - 2013-02-09

Thanks for your help.

I have had success with the Botchan file and with smaller portions of the file that is creating the errors. I have attempted to create a clean ASCII text file, but I must have missed some errant characters.

HIGUCHI Koichi - 2013-02-10

Thank you for the post!

You can try the "CLEAN" function of Excel / Calc to remove all control characters. Copy and paste the data into Excel / Calc, apply the "CLEAN" function, then copy the data back to a text file.
http://www.techrepublic.com/blog/msoffice/clean-up-your-data-with-this-easy-to-use-excel-function/896

As for "<" and ">," use the search-and-replace function of your favorite text editor to delete them all.
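For files that are awkward to paste into a spreadsheet, a similar clean-up can be approximated in a script. The following is a minimal sketch in Python, assuming the goal is to drop ASCII control characters other than line feed, plus "<" and ">". The function name `clean_text` is hypothetical, and turning tabs into spaces (rather than deleting them) is an assumption made here so that adjacent words are not joined.

```python
import re

# ASCII control characters except line feed (0x0A), plus DEL (0x7F).
CONTROL_CHARS = re.compile(r"[\x00-\x09\x0B-\x1F\x7F]")


def clean_text(text: str, keep_tags: bool = False) -> str:
    """Remove control characters and, unless keep_tags is set, "<" and ">"."""
    # Fold Windows line endings so that CR LF becomes a single line feed.
    text = text.replace("\r\n", "\n")
    # Assumption: a tab separates words, so it becomes a space.
    text = text.replace("\t", " ")
    text = CONTROL_CHARS.sub("", text)
    if not keep_tags:
        text = text.replace("<", "").replace(">", "")
    return text


print(repr(clean_text("a\tb\r\n<x>\x07c\n")))
```

Passing `keep_tags=True` leaves "<" and ">" in place for data that uses KH Coder tags.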

Hope it helps.
Best regards.

Last edit: HIGUCHI Koichi 2013-02-10

Anonymous - 2013-05-07

I am having the same issue as Kent. While I have had success with the Botchan file and smaller portions of my file, other parts are creating errors. I have removed every < and > besides the ones in \<h1> or \</h1>. Using Notepad++, I removed everything matching the regex [\x00-\x09\x0B-\x1F\x7F] (which Wikipedia tells me are the control characters except for hex 0A, which is new line; from what I have read about the CLEAN function, this is also what it removes). Notepad++ tells me the file is encoded as UTF-8 without BOM, which is the same as a smaller portion that works.

Is there some other character that the regex is not picking up that might be causing problems?

ETA: I also tried using the CLEAN function (the file is now ANSI) and then removing all < and >, and still had this error.

Last edit: Anonymous 2013-05-07

HIGUCHI Koichi - 2013-05-07

Thank you for the post!

1) Would you give us the EXACT error message you got? Does it say "bun_r"? Or is it bun-bun_r1 or bun-bun_r2?

2) Can you send me the data by e-mail?

Anonymous - 2013-05-07

1) The exact message is "Something wrong with the database:bun_r table KH Coder will exit now".

2) I'll send an email. Thank you!

Last edit: HIGUCHI Koichi 2013-05-08

HIGUCHI Koichi - 2013-05-08

Thank you for the post and e-mail.

I see a lot of "\" (backslash) in your data.
Please delete them all and it will be OK.
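Characters like this are easy to miss by eye, so it can help to count everything in the file that is not ordinary text. The following is a minimal sketch in Python, assuming the text is already loaded as a string. The helper names `count_unusual` and `strip_backslashes` are hypothetical, and the set of characters treated as ordinary is only an illustration.

```python
import string
from collections import Counter

# Illustrative whitelist: letters, digits, spaces, line feeds, common punctuation.
ORDINARY = set(string.ascii_letters + string.digits + " \n.,;:!?'\"()-")


def count_unusual(text: str) -> Counter:
    """Count every character that is not in the ORDINARY set."""
    return Counter(ch for ch in text if ch not in ORDINARY)


def strip_backslashes(text: str) -> str:
    """Delete every backslash from the text."""
    return text.replace("\\", "")


sample = "a\\b\\c <h1>"
print(count_unusual(sample).most_common(10))
print(strip_backslashes(sample))
```

Sorting the counts with `most_common` puts the most frequent unexpected characters first, which makes a stray symbol such as the backslash stand out.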

Good luck with your research.

Anonymous - 2013-06-14

Koichi-san, konnichiwa!

KH Coder is one of the best text mining tools I have come across. Thanks.

I had issues when I tried to run pre-processing for Japanese text files. KH Coder seems to work only for ANSI-encoded text files, but if a text file is saved as ANSI, the kanji turn into question marks. How can I keep my Japanese text file in Unicode and run pre-processing without getting the following error?
Fatal error: Could not execute ChaSen.

HIGUCHI Koichi - 2013-06-14

Hello from Japan and thank you for the post!

Please try saving Japanese text files in Shift-JIS (SJIS) or EUC-JP (EUC) encoding. Currently, KH Coder cannot handle Unicode.
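If the file is currently saved as UTF-8, it can be re-saved from a text editor or converted with a short script. The following is a minimal sketch in Python, assuming the source file is UTF-8. The function name `convert_encoding` and the file names are hypothetical. Python's `cp932` codec is the Windows variant of Shift-JIS; `shift_jis` or `euc_jp` can be passed instead.

```python
def convert_encoding(src_path, dst_path, src_enc="utf-8-sig", dst_enc="cp932"):
    """Re-save a text file in another encoding, leaving line endings untouched."""
    # "utf-8-sig" reads UTF-8 with or without a byte order mark.
    # newline="" disables line-ending translation on both read and write.
    with open(src_path, "r", encoding=src_enc, newline="") as src:
        text = src.read()
    # The default strict error handling raises UnicodeEncodeError if a
    # character has no representation in the target encoding.
    with open(dst_path, "w", encoding=dst_enc, newline="") as dst:
        dst.write(text)


# Hypothetical usage:
# convert_encoding("my_text_utf8.txt", "my_text_sjis.txt")
```

Letting the conversion fail loudly on unencodable characters is a deliberate choice in this sketch: silently replacing them would reproduce the question-mark problem described above.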

Best regards.

Last edit: HIGUCHI Koichi 2013-06-14

Anonymous - 2014-09-26

Greetings

I have just started using KH Coder, so I am not very familiar with it, but I keep getting the same error.

"Fatal error: could not execute ChaSen"

A window appears with this message right after I try to run the Pre-Processing. I click on OK and the whole program shuts down.

Any suggestion on how to solve this?

Thank you very much for your time, and sorry for the bother.

HIGUCHI Koichi - 2014-09-27

Hi,

Are you trying to analyze Japanese texts or English texts? If the text is in English, follow the steps on page 5 of this tutorial:
http://www.slideshare.net/khcoder/quick-start-tutorial-of-kh-coder-quantitative-content-analysis-or-text-mining-of-english-language-data

If the text is in Japanese, please try the tutorial with "kokoro2.txt"
http://www.slideshare.net/khcoder/kh-coder-28776074

In any case, you had better check whether you can analyze the tutorial data. If you can, the problem is probably in your text. If you cannot, the problem is in the setup procedure of KH Coder or in KH Coder itself.

Last edit: HIGUCHI Koichi 2014-09-27

Anonymous - 2014-09-30

Greetings, Mr Higuchi

Thank you very much for your reply. I am attempting to analyze a bilingual text that has both Japanese and English characters. Is it possible to do this kind of procedure with two languages at the same time?

Thank you very much for your attention, and sorry for the nuisance.

HIGUCHI Koichi - 2014-10-01

No, KH Coder is not ready for bilingual text. Sorry. If you use ChaSen, all English words will be treated as "unknown" POS words.

But the error is not normal or expected. You should still be able to process and analyze the data.

Last edit: HIGUCHI Koichi 2014-10-01

David-O. Mercier - 2014-10-13

Although it is impossible to use the program to analyse two different languages at once, as briefly discussed in another thread, I have been able to process heavily multilingual data with KH Coder by:
1) removing all non-ASCII characters;
2) using the English "left3words" Tagger and simply treating foreign words as "foreign"/"unknown" with regard to part of speech, as Prof. Higuchi just explained.

I have found that in that way, most tools in KH Coder will work as desired (for the English parts of the data only!).
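Step 1 above can be done in a few lines. The following is a minimal sketch in Python, assuming the text is already decoded into a string. The function name `strip_non_ascii` is hypothetical, and replacing each run of removed characters with a single space (rather than deleting it outright) is an assumption made here so that neighbouring English words are not glued together.

```python
import re


def strip_non_ascii(text: str) -> str:
    """Replace runs of non-ASCII characters with one space."""
    # Anything outside the 7-bit ASCII range counts as non-ASCII.
    text = re.sub(r"[^\x00-\x7F]+", " ", text)
    # Collapse the runs of spaces that the replacement may have created.
    return re.sub(r" {2,}", " ", text)


print(repr(strip_non_ascii("KH Coder は便利 text mining ツール")))
```

Line feeds are ASCII, so the line structure of the file is preserved.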

You could use a similar procedure with a Japanese Tagger - minus the deletion of non-ASCII characters, of course - then interpret the complementary results of both analyses at once? I am not familiar with parsing Japanese data, but if you want to ignore English words altogether for that part of the analysis, you could set up an exhaustive list of words to temporarily ignore for that purpose ('Pre-Processing' --> 'Select Words to Analyze').

Last edit: David-O. Mercier 2014-10-14

Anonymous - 2018-04-11

Hi,

I also get the same issue as the OP. I'm working with a French text, and it doesn't work when I select "FreeLing". It does work when I select the Snowball Stemmer.

Any idea what the issue could be? I am attaching the txt file.

Thanks!
Felipe

cattleclean2.txt

HIGUCHI Koichi - 2018-04-11

Thank you very much for attaching a text file that can reproduce the error. It really helps.

I have fixed this issue and released the fix as 3.Alpha.13b. Please try it.

Anonymous - 2018-04-11

Hi,

Thanks so much for addressing the issue so quickly! I ran the new program. This time I get no error, but unfortunately the program just shuts itself down with no error message...

- HIGUCHI Koichi - 2018-04-11
  
  Would you follow this procedure and let me know the result?
  
  1) Start “Command Prompt” (click the Windows (Start) button, type “cmd” and hit the “Enter” key).
  
  2) In the “Command Prompt”, type “cd c:\khcoder3” and hit the “Enter” key.
  
  3) In the “Command Prompt”, type “kh_coder” and hit the “Enter” key. KH Coder will start.
  
  4) Open the project and run pre-processing.
  
  5) If KH Coder silently quits, copy everything in the “Command Prompt” window and paste it here.
  
  Thanks!
  
Anonymous - 2018-04-11

It worked with a different txt file!

Anonymous - 2018-05-25

Hello,
I am analysing a big txt file (154 MB, 157536 lines, 177 columns) and I run into a problem when pre-processing. The txt is well cleaned, but when pre-processing, the console goes mad, repeating the message:
"Use of uninitialized value in concatenation <.> or string at /<C:\\khcoder.exe>Lingua/Sentence.pm line 131, <TRGT> line 1"; then "...line 2", "...line 3", etc., and it never ends (well above a million as far as I have tried).
Any idea what is going wrong?
I'm using KH Coder 3 Alpha 13.
Thanks!!

HIGUCHI Koichi - 2018-05-25

Please read the "READ ME" before you post.
https://sourceforge.net/p/khc/discussion/222396/thread/e1c799c1/

Anyway, can you attach or send me the text?

Also, "Use of uninitialized value" messages do not mean there is a fatal error. They are non-critical warnings. You may just try waiting for processing to finish.

Last edit: HIGUCHI Koichi 2018-05-25

- Anonymous - 2018-05-27
  
  Hello,
  Thanks for your answer. Sorry, I forgot to mention that I use Windows 7. The text has 2 million sentences; it is a transcription of an online forum. Unfortunately, I cannot transfer the file here, as it is considered a political extremist text. I have used your excellent software on several other texts and it worked perfectly, so I did not run the tutorial text, but I can do it tomorrow.
  I tried waiting for the process to finish, but maybe not long enough: the error message repeated itself so much that I thought it was not worth it. I'll launch it again tomorrow and wait as long as possible. Thanks!
  
HIGUCHI Koichi - 2018-05-27

If it works with other text files, it's ok. You don't have to try with tutorial data.

What did you mean by the word "columns"? If you are analyzing an Excel or CSV file that has multiple columns, make sure you choose the right column to analyze when you create a new project.

Though it depends heavily on the speed of your CPU, please be aware that it may take days to complete pre-processing. While the process is running, you should check the CPU usage of “kh_coder.exe” in Task Manager. If your CPU is working, it may be worth the wait.

Best,

Last edit: HIGUCHI Koichi 2018-05-27
