Assist with error in preprocessing

2012-10-12
2016-07-22
  • Nobody/Anonymous

    Here was my error message. Can you help interpret my problem?

    --- Begin Traceback ---
    Died at /<C:\\khcoder\\kh_coder.exe>kh_projects.pm line 130.

    Tk callback for .toplevel4.button1
    Tk::ANON at /<C:\\khcoder\\kh_coder.exe>Tk.pm line 250
    Tk::Button::butUp at /<C:\\khcoder\\kh_coder.exe>Tk/Button.pm line 175
    <ButtonRelease-1>
    (command bound to event)

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2012-10-13

    Hello from Japan. Thank you for your post!

    Hmmm, it seems that KH Coder failed to create a new project. The error message
    means that KH Coder could not write the project information into the file
    (c:\khcoder\config\projects).

    I am not very sure but you may try this procedure (plan A):

    1. Open "config" folder and find a file called "projects"
    2. Delete the "projects" file
    3. Start KH Coder and try to create a new project and run pre-processing

    If that does not work, you can try this procedure (plan B):

    1. Create a folder on your desktop called "khcoder."
    2. Copy the downloaded *.exe file into the "khcoder" folder.
    3. Double click the copied *.exe file.
    4. Before clicking "Unzip," edit the "Unzip to folder" entry. Delete "C:\khcoder," and input a period. Just one period "." is OK.
    5. Then click "Unzip" button.
    6. Now KH Coder is extracted in the "khcoder" folder on your desktop. Find "kh_coder.exe" and double click the file to start KH Coder.
    7. Change language settings, create a new project and run pre-processing.

    If you get the same error message, please let me know. In that case, please
    also check the black console window of KH Coder and see if there are other
    error messages.

     
  • Nobody/Anonymous

    Hello from the US and thank you for your assistance. Plan B seems to have
    worked.

    But now I have another question. I began pre-processing of my large text file
    yesterday, and KH Coder seems to still be running after nearly 24 hours. Do
    you think there would be a problem, or is it more likely that my text file is
    just quite large?

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2012-10-16

    Hello. Thank you for your post!

    You can open the task manager and check CPU usage. If one core of your CPU is
    occupied by KH Coder related processes, it is probably running normally. Or
    you can check the console window of KH Coder and see if there are any error
    messages.

    Generally speaking, I recommend that you actually follow the "Botchan"
    tutorial, not only to learn the usage but also to check that your KH Coder
    installation is working properly. Alternatively, it is also a very good idea
    to make a small test data set of around 1 or 2 MB and run a tentative
    analysis with it first.

    About the size of data, if you have more than 100MB or 100,000 documents, I
    would recommend that you perform random sampling to reduce the data size. If
    you put really big data into KH Coder, it will take too much time to run pre-
    processing or analyses. And thanks to the statisticians, it is now clear that
    we only need 2,500 respondents to estimate percentages of the whole country
    with 2% margins of error. I mean that "big data" is not really necessary in
    many cases.
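
    The sampling argument above can be checked with a little arithmetic. The
    sketch below assumes a simple random sample, a 95% confidence level
    (z = 1.96), and the worst-case proportion p = 0.5; none of these assumptions
    are stated in the post. The helper sample_documents is a hypothetical
    illustration of random sampling and is not part of KH Coder.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate half-width of a confidence interval for a proportion.

    Assumes a simple random sample of size n; z=1.96 corresponds to ~95%.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


def sample_documents(documents, k, seed=0):
    """Draw up to k documents uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    docs = list(documents)
    return rng.sample(docs, min(k, len(docs)))


if __name__ == "__main__":
    # 1.96 * sqrt(0.25 / 2500) = 1.96 * 0.01 = 0.0196
    print(f"n=2500: +/-{margin_of_error(2500):.2%}")
```

    With n = 2,500 this gives a margin of about 1.96%, which matches the
    "2% margins of error" figure under the stated assumptions.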


    Time is HH:MM:SS and the test data is in Japanese. The test PC has an SSD
    (Intel X-25M) and a Core2Quad Q9650@4GHz, although KH Coder utilizes only one
    core. English data would take longer to pre-process, and a PC with an HDD can
    take 10x longer.

    Best regards.

     
  • Nobody/Anonymous

    Hello and thanks very much for this information. I am still having difficulty.
    The "preprocessing" goes on and on for hours and days, never finishing even
    though it appears to be working. The only thing I can think that might be
    getting in my way is a message I get from windows firewall after unzipping the
    downloaded khcoder program. It says that the Windows firewall has blocked
    "mysqld-nt". Could that explain why I'm having trouble, do you think? Thank
    you for all of your informative responses.

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2012-10-24

    Hello and thank you for your post!

    Hmmm, could you follow the "Botchan" tutorial? Is it possible to make a co-
    occurrence network of words with the "Botchan" data? If it is, your KH Coder
    seems to be working, at least with smaller data like "Botchan." In this case,
    there may be something wrong with your data. Maybe it is just too big.

    When pre-processing goes well, the console window should look like this:

    Do you see any errors or notable differences here?

    About the "mysqld-nt.exe," it is the MySQL process. And KH Coder needs to
    access MySQL via TCP/IP. But if MySQL is really blocked by a firewall, you
    can't even create / open a project. So I guess your MySQL is working fine.

    BTW, I found the bug that caused the following error message.

    Here was my error message. Can you help interpret my problem?

    --- Begin Traceback ---
    Died at /<C:\\khcoder\\kh_coder.exe>kh_projects.pm line 130.

    If you get this error when you try to create a new project, please avoid using
    single quotes, double quotes, or other unusual characters in the "Memo"
    entry. Use only letters and numbers. This bug will be fixed in the next
    release. Sorry for the inconvenience.
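
    A minimal sketch of how such a "Memo" text could be checked before typing
    it in. The function is_safe_memo is hypothetical and not part of KH Coder.
    Allowing spaces is an assumption; the advice above names only letters and
    numbers.

```python
import re

# Assumption: ASCII letters, digits, and spaces are treated as safe.
# The advice in the post mentions only letters and numbers.
SAFE_MEMO = re.compile(r"[A-Za-z0-9 ]*")


def is_safe_memo(memo):
    """Return True if the memo contains no quotes or other special characters."""
    return SAFE_MEMO.fullmatch(memo) is not None
```

    For example, is_safe_memo("Survey 2012") is true, while a memo containing
    a single or double quote is rejected.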

    Best regards.

     
    Last edit: HIGUCHI Koichi 2012-10-30
  • Anonymous

    Anonymous - 2015-01-20

    I am also having trouble with pre-processing.
    The problem is that the console window does not look like what you have above.
    It says:
    ...
    Checking icode (jp)... sjis
    Starting server, pid: 6528, Connecting................................................................
    and the periods just keep on going- could there be a problem with the server?

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2015-01-20

    This happens when a firewall or security software blocks KH Coder and/or Stanford POS Tagger. It also happens when Stanford POS Tagger fails to start.

    So, I would try disabling the firewall or security software temporarily and running KH Coder again.

     
  • Anonymous

    Anonymous - 2015-01-20

    So I've tried disabling the firewall and security software but it is still not working. Any other ideas?

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2015-01-21

    Stanford POS Tagger is a Java application, so you can try updating Java. Checking the security settings of Java may also help.

    Also, you can try testing Stanford POS Tagger alone and see if it works or not.

     
    Last edit: HIGUCHI Koichi 2015-01-25
    • Anonymous

      Anonymous - 2015-02-02

      Hello,

      I have the same problem, which is odd as I have run the POS with no problems previously.

      I also ran the POS tagger alone using its GUI and all worked OK.

      Very odd

       
      • HIGUCHI Koichi

        HIGUCHI Koichi - 2015-02-04

        Then a firewall or security software is probably blocking the communication between KH Coder and Stanford POS Tagger. KH Coder uses TCP/IP (Telnet) to communicate with Stanford POS Tagger. Did you try disabling the firewall or security software?
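
        One rough way to test whether anything is accepting TCP connections locally is sketched below. Port 2020 is the value used in the tagger server commands quoted elsewhere in this thread; that KH Coder always uses this port is an assumption. The function port_is_open is hypothetical and not part of KH Coder.

```python
import socket


def port_is_open(host="127.0.0.1", port=2020, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if port_is_open() else "closed or blocked"
    print(f"127.0.0.1:2020 is {state}")
```

        If the port is closed while the console still prints "Connecting....", either the server never started or something is blocking the connection.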

         
        Last edit: HIGUCHI Koichi 2015-02-04
        • Anonymous

          Anonymous - 2016-07-07

          (I am a different anonymous user. I am using Windows.). I have had a similar problem where the pre-processing was not connecting to the Stanford POS Tagger, but I might have partially gotten it to work. But not totally. I am posting this in case what I have figured out might be helpful to anyone, and in case anyone/you have a solution to the rest of my problem.
          When I was looking at the list of processes, I did not see java open, so I believe that it was failing to open the POS tagger, though I could run the POS tagger with the GUI for it. But, when I tried to run it with java (with java -jar stanford-postagger.jar) , I was getting an error. I eventually realized that I had two versions of java installed, a 1.7 version (in program files), and a rather old 1.6 version (in sys32). When I was running stanford-postagger.jar by itself, it was opening with the 1.7 version, but when trying to run java with the command line, it was running the old 1.6 version.
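
          A small sketch of how the duplicate Java installations could be listed: it walks the PATH entries in order and reports every java executable it finds, so the first hit is usually the one the command line runs. The function find_on_path is hypothetical. Windows may also look in the current directory before PATH, which this sketch ignores.

```python
import os


def find_on_path(names=("java.exe", "java"), path=None):
    """List every matching executable file on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and candidate not in hits:
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    for location in find_on_path():
        print(location)
```

          More than one line of output means several Java installations are reachable, and the order shows which one wins.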

          When I used the 1.7 version of java, I was able to follow the instructions on the POS tagger FAQ in order to run the server for it, and when I ran kh coder while this was open, it got past the "Connecting.." stage, saying "ok. Tagging..." , so I believe that kh coder connected to the pos tagger local server.
          However, immediately after it got to "Tagging...", it showed the error "Warning: Sentences that include unrecognized characters are dropped from the processing. Dropped sentences are recorded in this file:" \tutorial_en\coder_data\botchan_en_dp.txt

          Then I press ok.

          (I am using the botchan file from the en tutorial.)

          Then, in the console, it says "A sentence whcih includes unrecognized characters is dropped!" many times (I believe once per line in the file, probably).

          Once it has output all of these, it says in the console:

          " ok.
          Morpho1 26 wallclock secs ( 2.81 usr + 2.25 sys = 5.05 CPU)
          Morpho2 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
          The HEAP table will eat approx. 1MB; We have 2045MB max.
          Read 0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)
          Format 1 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
          Strat1 0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)
          Strat2 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
          RawTXT 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
          df: heap df: heap df 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
          fc 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
          DBD::mysql::st execute failed: Table 'khc0.dan' doesn't exist at /<[path to folder]\kh_coder.exe>mysql_exec.pm line 276."

          and there is a window that says:
          "Failed in accessing MySQL database system. KH Coder will exit now.

          SQL Input:
          select count(*) from dan
          Error:
          Table 'khc0.dan' doesn't exist"

          and then when I press OK it quits, after putting "Exit (gui_errormsg.pm)" to the console.

          Sorry if I should have made this a separate thread.

          Summary: I suspect kh_coder may have been trying to open the POS tagger with the wrong version of Java. When I started the POS tagger server myself, it seemed to connect, but it said that the sentences had unrecognized characters on I think all the lines, and then couldn't find a .dan file. I wonder if maybe it didn't get the right things from the POS tagger because I started it myself, and maybe that is why the sentences were considered to have unrecognized characters. (because, the file is the botchan_en.txt)

          (edit: fixed typo)

           
          Last edit: Anonymous 2016-07-07
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2016-07-08

    Hi,

    Thank you very much for your informative post!

    Can you rename the “java.exe” in sys32 folder to “java_old.exe” or something and retry the preprocessing of KH Coder?

    Best,

     
    • Anonymous

      Anonymous - 2016-07-08

      I am unable to rename java.exe, because I am using a company computer and I don't have write permissions in that folder, but I did find a way to get it to work.

      I believe the reason that I was getting the unrecognized character error was because I used the wrong settings on the POS tagger.

      I had been starting it with

      java -mx300m -cp stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -model ./models/left3words-wsj-0-18.tagger -port 2020

      (after putting the path to the correct java.exe in the %PATH% )

      but after I looked through the cvs (and looked at > khc > core > kh_lib > kh_morpho > win32 > stanford.pm ), I tried running it as

      java -mx300m -cp ./stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -outputFormat xml -outputFormatOptions lemmatize -port 2020 -model ./models/left3words-wsj-0-18.tagger

      (again, with the directory for the correct java.exe in the %PATH% )

      and when run this way, I did not get the errors about the unrecognized characters, and it seems to have completed the pre-processing correctly, and after that it seems to work fine for me.

      I think the reason I was getting that error was because I didn't tell the server to start with the settings for the xml output format and the lemmatize part.
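
      For anyone who wants to script this workaround, here is a minimal sketch that builds the same command with the xml and lemmatize options and starts it from the tagger folder. All flags are copied from the command above. The function names and the idea of launching the server from Python are assumptions, not something KH Coder does.

```python
import subprocess


def tagger_server_command(port=2020,
                          model="./models/left3words-wsj-0-18.tagger",
                          java="java"):
    """Build the server command line, including the xml/lemmatize options."""
    return [
        java, "-mx300m",
        "-cp", "./stanford-postagger.jar",
        "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
        "-outputFormat", "xml",
        "-outputFormatOptions", "lemmatize",
        "-port", str(port),
        "-model", model,
    ]


def start_tagger_server(tagger_dir, **kwargs):
    """Start the server with tagger_dir as the working directory.

    Pass java=<full path to java.exe> to pick a specific Java installation.
    """
    return subprocess.Popen(tagger_server_command(**kwargs), cwd=tagger_dir)
```

      Passing the full path of the newer java.exe as the java argument avoids having to change %PATH%.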

      So, this works well as a workaround for me.

      Thank you for your response. I hope this can be of help if there are any others with the same problem.

      Again, thank you.

       
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2016-07-09

    Great trouble shooting!

    Thank you for sharing it!

    Best,

     
  •  Taylor-FL

    Taylor-FL - 2016-07-21

    Hello,

    I'm encountering the same error mentioned above, where when I select pre-processing, the app freezes up, only displaying "Connecting........." and never stopping.

    Here are the steps I performed:

    I configured KH Coder per the slideshare here:
    http://www.slideshare.net/khcoder/quick-start-tutorial-of-kh-coder-quantitative-content-analysis-or-text-mining-of-english-language-data

    I attempted to run it with the firewall disabled. No Success

    I attempted to run the POS Tagger independently, and it ran without problems.

    Java is up to date.

    Any suggestions?

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2016-07-22

    Hello,

    You have 2 options, I think.

    1. figure out the trouble regarding Stanford POS tagger
    2. forget Stanford POS tagger and try FreeLing instead

    If you are not familiar with command prompt, I recommend option 2. Download and extract the latest alpha version here:
    https://sourceforge.net/projects/khc/files/KH%20Coder/3.Alpha.08/

    When you create a new project, select “FreeLing” instead of Stanford POS Tagger. Then try running pre-processing.

    If you want to choose option 1, please open task manager and check if you see “java.exe” when this happens:

    only displaying "Connecting........." and never stopping

    If you don’t see “java.exe”, it means that the Stanford POS Tagger server failed to start. In this case, try starting the server manually and see if it starts normally. To start the server manually, open a command prompt, go to a directory like “C:\khcoder\dep\stanford-postagger”, then run this command:

    java -mx300m -cp ./stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -outputFormat xml -outputFormatOptions lemmatize -port 2020 -model ./models/left3words-wsj-0-18.tagger

    Best,

     
    •  Taylor-FL

      Taylor-FL - 2016-07-22

      Thanks for the assistance. I attempted both options, but neither was successful.

      For option 1, attached is a screenshot showing the java applet running but it is still not connecting to the server. You were correct that java.exe was not running, however I am not that familiar with the cmd prompt and may have inputted the commands incorrectly.
      I first entered C:\khcoder\dep\stanford-postagger\stanford-postagger.jar and hit enter
      Then I entered the "java.." command, and I received an error that stated "is not recognized as an internal or external command operable program or batch file".

      For option 2, I downloaded and installed the latest alpha release, however I do not see an option for "FreeLing" either in settings or when I open a new project. Just the normal 4 options of "chasen", "mecab", "Stanford POS Tagger", and "Snowball".

       
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2016-07-22

    Hello,

    For option 1, you have to move to the folder first. So, in the command prompt window, you have to paste

    cd C:\khcoder\dep\stanford-postagger

    and hit “Enter” key on your keyboard. Then, you have to paste

    java -mx300m -cp ./stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -outputFormat xml -outputFormatOptions lemmatize -port 2020 -model ./models/left3words-wsj-0-18.tagger

    in 1 line and hit “Enter” to try running the server manually.

    For option 2, the latest alpha version is extracted into "C:\khcoder3" folder in the default setting. So you have to open "C:\khcoder3" folder and run khcoder.exe. Also, in the "new project" window, you have to select "English" first. Then you can select "FreeLing". I attach the screen shot of this screen below.

     
    Last edit: HIGUCHI Koichi 2016-08-04
    •  Taylor-FL

      Taylor-FL - 2016-07-22

      Thanks Higuchi! I was able to download the new release, and choosing the FreeLing option allows the pre-processing to complete without issue. I am now successfully exporting my text analysis. This is a great tool!

       

