
On manual import of English Wikipedia - java.io "FileNotFoundException ... (Too many open files)"

Defects
Tim Weigel
2015-05-16
2015-05-18
  • Tim Weigel

    Tim Weigel - 2015-05-16

    Version 2.5.2.2 on Linux 2.6 platform.

    Repeatable - always.

    Workaround - yes: download file and import using the instructions at "Help:Import/Using data dump files" in the section "Import through Help:Import/List".

    Situation: On a build of the full English Wikipedia using a manual import, the script completes but consistently throws errors onto the command console indicating that Java has run out of available file handles. I'm not sure of the relevance, but this seems to occur consistently when csv file #202 is read (see the thrown errors below).

    Possible Causes: The governing script for the manual load may not be closing file handles as required (I have not reviewed or tried to debug the script), and may thus be leaking open handles until the limit is reached. This assumption derives from the fact that the wiki builds without errors when the downloaded bz2 file is run through the "Import through Help:Import/List" process rather than the manual process.

    Errors Thrown: The following error(s) appear on the command console (one or more times):

    1 class java.io.FileNotFoundException /media/Seagate Expansion Drive/Wiki Repository/xowa/wiki/en.wikipedia.org/tmp/text.cat.link/sort/0000000202.csv (Too many open files) <java.io.FileNotFoundException>


    2 file open failed:class java.io.FileNotFoundException /media/Seagate Expansion Drive/Wiki Repository/xowa/wiki/en.wikipedia.org/tmp/text.cat.link/sort/0000000202.csv (Too many open files) <gplx.ios.IoException>
    @url /media/Seagate Expansion Drive/Wiki Repository/xowa/wiki/en.wikipedia.org/tmp/text.cat.link/sort/0000000202.csv

    3 class java.io.FileNotFoundException /media/Seagate Expansion Drive/Wiki Repository/xowa/wiki/en.wikipedia.org/tmp/text.cat.link/sort/0000000202.csv (Too many open files) <java.io.FileNotFoundException>


       at gplx.Err.hdr_(Unknown Source)
       at gplx.Err.exc_(Unknown Source)
       at gplx.Err_.err_(Unknown Source)
       at gplx.xowa.Xoi_cmd_base.Process_async(Unknown Source)
       at gplx.xowa.Xoi_cmd_base.Invk(Unknown Source)
       at gplx.core.threads.Thread_adp.run(Unknown Source)
       at java.lang.Thread.run(Unknown Source)
    

    (SWT:32092): Gtk-CRITICAL **: IA__gtk_message_dialog_new: assertion `parent == NULL || GTK_IS_WINDOW (parent)' failed
    Exception in thread "main" org.eclipse.swt.SWTError: No more handles
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.SWT.error(Unknown Source)
    at org.eclipse.swt.widgets.Dialog.error(Unknown Source)
    at org.eclipse.swt.widgets.MessageBox.open(Unknown Source)
    at gplx.gfui.Swt_dlg_msg.Ask(Unknown Source)
    at gplx.gfui.Swt_dlg_msg.run(Unknown Source)
    at org.eclipse.swt.widgets.Synchronizer.syncExec(Unknown Source)
    at org.eclipse.swt.widgets.Display.syncExec(Unknown Source)
    at gplx.gfui.Swt_kit.Ask_ok(Unknown Source)
    at gplx.xowa.gui.Xoa_gui_mgr.Run(Unknown Source)
    at gplx.xowa.Xoa_app_boot_mgr.Run_app(Unknown Source)
    at gplx.xowa.Xoa_app_boot_mgr.Run(Unknown Source)
    at gplx.xowa.Xoa_app_.Run(Unknown Source)
    at gplx.xowa.Xowa_main.main(Unknown Source)

     
  • gnosygnu

    gnosygnu - 2015-05-17

    Hi! Thanks for the report.

    That's strange. I checked the code, and it looks like I do close file handles (RandomAccessFile.close)
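
    As an illustration of the close pattern mentioned above, here is a minimal Java sketch, not XOWA's actual code. The class name HandleSafeRead, the method firstLine, and the temporary file are hypothetical; only RandomAccessFile comes from the thread. It uses try-with-resources so the handle is released even when a read throws, and it opens the same file 5000 times in sequence to show that sequential open/close cycles stay under a 1024-handle limit.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HandleSafeRead {
    // Reads the first line of a file. try-with-resources closes the
    // RandomAccessFile on every exit path, including exceptions.
    static String firstLine(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            return raf.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for one of the numbered sort files.
        Path tmp = Files.createTempFile("sort-chunk", ".csv");
        try {
            Files.write(tmp, "a|b|c\n".getBytes(StandardCharsets.US_ASCII));
            // Many more open/close cycles than a 1024-handle limit would
            // allow if the handles were held open at the same time.
            for (int i = 0; i < 5000; i++) {
                firstLine(tmp);
            }
            System.out.println(firstLine(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

    Note that if a build keeps many files open at the same time rather than one after another, closing each one correctly would not by itself keep the process under the limit.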

    Out of curiosity, what distribution do you use? I haven't seen this on my Linux boxes when I do a build (I used to use KUbuntu 13.04 and currently use openSUSE 13.1).

    You can try raising the open-file limit through ulimit. See http://posidev.com/blog/2009/06/04/set-ulimit-parameters-on-ubuntu/

    Ultimately, I have to rewrite the category parsing code to not create text files but write directly to a SQLite database. I'm aiming to do this in the next few months, but I'm hoping the ulimit workaround works for now.

    Let me know how it goes. Thanks.

     
  • Tim Weigel

    Tim Weigel - 2015-05-17

    I'm using Mandriva 2011 (a little old I know, but it serves me well), ulimit -n shows 1024 file handles. The build works with the downloaded bz2 file if I use the Import/List function (so I've actually built the database completely without errors that way).

    Unfortunately, ulimit does not seem like it can be changed by a user, only root (I tried both), and then only for the superuser session (on exit, it reverts to the standard hard limit of 1024). It seems an edit to /etc/security/limits.conf would be required (likely along with logging the user out and back in so the new limits are read); there are production processes that would be interrupted by doing that. I suppose I could try the experiment in a sandbox (VirtualBox), but as I said, I did get the database to build correctly.
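
    For anyone who wants to check how close a Java process is to the limit without touching limits.conf, a small sketch follows. It assumes a HotSpot/OpenJDK JVM on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdUsage and the method openAndMax are hypothetical and not part of XOWA.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

class FdUsage {
    // Returns {open, max} file descriptor counts for this JVM,
    // or null when the JVM does not expose them (assumption: non-Unix
    // platforms or JVMs without the com.sun.management extension).
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = openAndMax();
        if (fd == null) {
            System.out.println("descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + fd[0] + " max=" + fd[1]);
        }
    }
}
```

    The counts describe only the JVM the code runs in, so it would have to be called from inside the importing process to say anything about the build itself.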

     
  • Tim Weigel

    Tim Weigel - 2015-05-17

    After some sleep, it occurs to me that it would be difficult to sandbox this (I'd have to replicate the host machine exactly, including all open processes), and I'm not really willing to take the machine down to mess with limits.conf, so I'd like to suggest that if others run into the same problem, the database will build completely and correctly by using the procedure under "Import through Help:Import/List" on the "Help:Import/Using data dump files" page in xowa.

     
  • gnosygnu

    gnosygnu - 2015-05-17

    the database will build completely and correctly by using the procedure under "Import through Help:Import/List"

    Sorry, I should have commented on this earlier. Help:Import/List doesn't build Categories. This is because I felt most users don't want a full-fledged Category system, since it adds to the time (1+ hour) and space (10 GB) of the import. The only way to build Categories is through Help:Import/Script or by using a command-line script (Help:Import/Command-line).

    I'm using Mandriva 2011 (a little old I know, but it serves me well), ulimit -n shows 1024 file handles.
    Unfortunately, ulimit does not seem like it can be changed by a user, only root

    Ok. Thanks for the info. Basically, I need to fix the "Too many open files" error in order for categories to build on these systems. I am planning to do this in an upcoming release, because I want to reduce the disk space from 10 GB and will take the opportunity to rewrite the category import.

    In the meantime, you'll need to build on a different system. Or you can also download the category files from here: https://archive.org/details/Xowa_enwiki_latest

    Hope this helps.

     
  • Tim Weigel

    Tim Weigel - 2015-05-18

    I've tried closing a number of nice to have but not necessary processes on the machine; will try a rebuild - will let you know how it goes. If that doesn't work, will build on Windows. I think all I need is the Windows compliant java engine; it looks like everything else is "platform agnostic".

     
  • gnosygnu

    gnosygnu - 2015-05-18

    If that doesn't work, will build on Windows. I think all I need is the Windows compliant java engine; it looks like everything else is "platform agnostic".

    Yup, all the data files are platform agnostic. The only things that depend on a specific platform are located in /xowa/bin/platform_name. None of these are data files.

    Let me know if you need anything else. Thanks!

     
