Library dies with System.exit(1) --> unusable

  • Peter Becker

    Peter Becker - 2003-07-03

    Hi all,

    at the moment we are evaluating Multivalent as a document indexing tool. Unfortunately it won't work in its current version, since the PDF extraction dies on us. We tried running it on a larger document collection and there is at least one document where an ArrayIndexOutOfBoundsException is thrown and then caught, which causes a stack trace print and a System.exit. And we were wondering why our exception management doesn't work :-(

    In my opinion there should be no calls to System.exit() in a library at all. Java handles Exceptions nicely and letting the program die doesn't help -- in our case we'd happily keep indexing all the other documents and just give some UI feedback that this particular document could not be indexed. We don't expect full coverage, but dying in the middle of the indexing process is not an option.
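
A minimal sketch of the behaviour asked for above: the indexing loop catches the failure of a single document, records it for later UI feedback, and carries on. `TextExtractor` and `BatchIndexer` are hypothetical names used only for illustration; they are not part of Multivalent.

```java
import java.util.ArrayList;
import java.util.List;

class BatchIndexer {
    /** Paths that could not be indexed, kept for later UI feedback. */
    final List<String> failed = new ArrayList<>();
    /** Number of documents whose text was extracted successfully. */
    int indexed = 0;

    void indexAll(List<String> paths, TextExtractor extractor) {
        for (String path : paths) {
            try {
                String text = extractor.extract(path);
                // A real indexer would hand 'text' to the index here.
                indexed++;
            } catch (Exception e) {
                // One bad document must not stop the run: record it and go on.
                failed.add(path + ": " + e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated extractor that fails on one document.
        TextExtractor flaky = path -> {
            if (path.contains("broken")) {
                throw new ArrayIndexOutOfBoundsException("simulated parser failure");
            }
            return "text of " + path;
        };
        BatchIndexer indexer = new BatchIndexer();
        indexer.indexAll(List.of("a.pdf", "broken.pdf", "c.pdf"), flaky);
        System.out.println(indexer.indexed + " indexed, " + indexer.failed.size() + " failed");
    }
}

/** Hypothetical stand-in for whatever extraction call the indexer makes. */
interface TextExtractor {
    String extract(String path) throws Exception;
}
```

This only works if the library lets the exception propagate; a System.exit inside `extract` would end the JVM before the catch block is reached.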

    We are definitely interested in trying your tool more since it offers interesting file formats and PDFBox doesn't work 100% either. I guess we'll fix that problem by hacking through the code ourselves but I hope you agree that this is something you should change in the official releases. I'd say it will be a PITA for many people -- including yourself -- otherwise.


    PS: since we did run into other problems I wonder if you want bug reports and where. The bug tracker seems to be rather empty.

    • Tom Phelps

      Tom Phelps - 2003-07-04

      > there should be no calls to System.exit() in a library

      Good point.  Those there now should never be triggered by any valid PDF, but nevertheless.

      You can post bugs in the bugs tracker or the message boards.  If there is a specific document that should work but does not, then the most effective way for me to diagnose the situation is for you to email me a copy.

      • Peter Becker

        Peter Becker - 2003-07-04

        My point is that doing a System.exit(..) is pointless unless you are in the outermost code level. Otherwise Exception handling is so much superior. I don't see a single advantage of doing a System.exit(..) instead of throwing an exception, but as the example shows there are some serious drawbacks.

        The only places I use System.exit(..) are:
        - window close on the main windows -- and I blame Swing for having to do that
        - command line tools that carry a message in the exit number to allow integration in a script. In this case the System.exit calls will be very close to the main method, not deep in the code somewhere

        Every other use of the call seems dodgy to me.
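
The command line case can be sketched as follows, assuming a made-up tool (`WordCountTool`) and illustrative exit code values: the working code only throws or returns, and the single System.exit call sits directly in main.

```java
class WordCountTool {
    // Exit codes a calling script can test; the values are illustrative.
    static final int OK = 0;
    static final int FAILED = 1;
    static final int USAGE = 2;
    static final String USAGE_TEXT = "usage: WordCountTool [-help] [word ...]";

    /** Does the work and reports problems by throwing; never exits the JVM. */
    static int countWords(String text) {
        if (text == null) {
            throw new IllegalArgumentException("no input");
        }
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Maps the outcome to an exit code, still without calling System.exit. */
    static int run(String[] args) {
        if (args.length == 1 && args[0].equals("-help")) {
            System.out.println(USAGE_TEXT);
            return OK;
        }
        if (args.length > 0 && args[0].startsWith("-")) {
            System.err.println(USAGE_TEXT);
            return USAGE;
        }
        try {
            System.out.println(countWords(String.join(" ", args)));
            return OK;
        } catch (RuntimeException e) {
            System.err.println("failed: " + e);
            return FAILED;
        }
    }

    public static void main(String[] args) {
        // The only System.exit in the program, right next to main.
        System.exit(run(args));
    }
}
```

Because `run` returns the code instead of exiting, the same logic can be called from other Java code or from a test without taking the JVM down.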

        Back to the original topic: the problem we have occurs in a PDF that Acrobat Reader opens happily. Unfortunately we can't just give you these documents since they appear when crawling our company's network drives and are usually not meant for outside usage. We will provide stack traces later on, at the moment we are still working on the framework for the indexing code. Once that is done we'll start doing major indexing runs again and we'll submit bug reports to all projects involved. No sample files, but hopefully the stack traces will help.

        BTW: a cool one was multivalent killing the 1.4.0_01-b03 JVM. Some native awt.font code crashed. Upgrading the JVM fixed that one, though -- are you still interested in a bug report? It won't be too helpful since a dead JVM doesn't produce stack traces and we won't have enough time to step through your code.


    • Tom Phelps

      Tom Phelps - 2003-07-05

      > Some native awt.font code crashed. Upgrading the JVM fixed that one, though

      The issue there was that Java's TrueType parser-renderer would crash if the font had certain bad data.  Multivalent merely shoveled the font as embedded in PDF over to Java.  A minimal test case that showed the bug would have been confined to Java itself and, had it not already been reported and fixed, would have been appropriate for Sun.

      > Unfortunately we can't just give you these documents since they appear when

      I appreciate that you can't distribute sensitive documents, and some bugs are probably obvious errors that a stack trace would point to.  And if you fix a bug I can probably verify it without a sample PDF.  However, for me to locate complicated bugs and for regression testing, some sample PDF that exhibits the issue is indispensable.

    • Ed Randall

      Ed Randall - 2003-10-21

      One reason I found for it dying in a servlet is that it needs to create a preferences directory in {user.home}/.Multivalent; in our servlet environment that was /home/www/.Multivalent, and /home/www doesn't exist.  A quick hack, adding this variable to the Tomcat startup script:
      JAVA_OPTS="-Duser.home=/tmp"; export JAVA_OPTS
      has enabled it to run in a servlet doing text extraction from uploaded .pdf files.
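
A small check along these lines can show up front whether `user.home` is usable in a given servlet environment. This is only a sketch: `UserHomeCheck` is a made-up helper, and it tests the directory named by the standard `user.home` system property rather than anything Multivalent-specific.

```java
import java.io.File;

class UserHomeCheck {
    /** Returns null if a per-user directory could be created under 'home', else a reason. */
    static String problemWith(File home) {
        if (!home.isDirectory()) {
            return home + " does not exist or is not a directory";
        }
        if (!home.canWrite()) {
            return home + " is not writable";
        }
        return null;
    }

    public static void main(String[] args) {
        File home = new File(System.getProperty("user.home"));
        String problem = problemWith(home);
        System.out.println(problem == null
                ? "user.home looks usable: " + home
                : "user.home problem: " + problem);
    }
}
```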

      There are also a lot of calls to System.exit and to Utility.error, which in turn does a System.exit. I'd very much like to see these removed - perhaps some separation of the "browser" functionality from the "library" code to make 2 distinct layers would be a nice goal.

    • Tom Phelps

      Tom Phelps - 2003-10-22

      Thanks for the note on servlets; I've added this information to the instructions on how to run the browser.

      I have tried to separate out PDF reading and writing as its own library independent of the browser.  All of the PDF manipulation tools use it that way.  I like to keep System.exit for my own use, so that when I'm browsing PDFs and one of these serious errors happens, it will fail for me in a way I'm sure not to overlook.  However, as far as I'm aware, all the System.exit's for PDF are now under control of the global debug flag multivalent.Multivalent.DEVEL, which is off in distributed source and compiled code -- in other words, you'd have to track it down and change it to see a System.exit in PDF.  If I have overlooked a System.exit in PDF, please let me know.

      • Peter Becker

        Peter Becker - 2003-10-22

        This is what exception handling is for. If you didn't constantly catch all exceptions, or at least rethrew them -- even if only as RuntimeExceptions -- the problem wouldn't occur. And not only could any user of your code handle issues by catching the Exceptions, it would also add a lot to the debugging process -- any reasonable IDE gives you a jump to code for each level in the stack trace. And I hope you use a good IDE since Java without code completion, templates, debugger, refactoring and so on is a pain. I've met too many people who compared C++ IDEs with vi and decided that IDEs have to be crap independent of the language. So wrong.
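
Rethrowing as a RuntimeException, as mentioned above, could look roughly like this. `PageParser` and its methods are invented for the example; the point is that the original exception stays attached as the cause, so the full stack trace remains available to the caller and to the debugger.

```java
class PageParser {
    /** Simulated low-level failure, standing in for a real parser bug. */
    static int readEntry(int[] table, int index) {
        return table[index]; // throws ArrayIndexOutOfBoundsException when out of range
    }

    /** Adds context and rethrows; the original exception is kept as the cause. */
    static int parsePage(int[] table, int page) {
        try {
            return readEntry(table, page);
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new RuntimeException("cannot parse page " + page, e);
        }
    }

    public static void main(String[] args) {
        try {
            parsePage(new int[] {10, 20}, 5);
        } catch (RuntimeException e) {
            // The caller decides what to do; nothing has been swallowed.
            System.out.println(e.getMessage()
                    + " (cause: " + e.getCause().getClass().getSimpleName() + ")");
        }
    }
}
```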

        • Ed Randall

          Ed Randall - 2003-10-23

          Personally I hate IDEs.  I find I am most productive using Unix with a particular text editor, nedit, to which I have added certain customisations such as implementing interfaces and creating templates for new files, but little else.  I guess I just like what I'm used to.  However I'm sure IDEs have moved on quite a bit in recent years, and I could be missing out - which is your favourite at the moment Peter?

          • Peter Becker

            Peter Becker - 2003-10-23

            Refactoring is the major feature modern IDEs offer, and that requires more than just a bit of syntax parsing. At the moment I use Eclipse since it is pretty good, free and since I plan to write my own little plugin one day. If you ask me for the best I'd still say IntelliJ IDEA -- that thing is just smooth. Not that much difference in terms of features, but IDEA beats Eclipse in neatness. It's $500 for a commercial single-user licence, though. Not too dear, but still $500 more. There are trial licenses and usually they have some early access programs.



    • Ed Randall

      Ed Randall - 2003-10-22

      $ find src/multivalent src/phelps -type f -name \*.java -print | xargs egrep -n '^[^/]+System.exit'


      src/multivalent/devel/            catch (FileNotFoundException fnfe) { System.err.println("can't write to "+argv[argi]+": "+fnfe); System.exit(1); }
      src/multivalent/devel/        else if (arg.startsWith("-help")) { System.out.println(USAGE); System.exit(0); }
      src/multivalent/devel/        else { System.err.println(USAGE); System.exit(1); }
      src/multivalent/devel/    System.exit(0);
      src/multivalent/devel/        System.exit(0); // just a quick, clean shutdown, which C-c from Cygwin doesn't do
      src/multivalent/devel/      public void actionPerformed(ActionEvent e) { System.exit(0); }
      src/multivalent/devel/    System.exit(1);
      src/multivalent/    try { newp=(INode)p.clone(); } catch (CloneNotSupportedException cant) { System.out.println("can't clone anymore guys! "+cant); System.exit(1); }
      src/multivalent/            System.exit(1);
      src/multivalent/    System shutdown, in this sequence: shuts down all browsers, writes preferences, <code>System.exit(0)</code>.
      src/multivalent/    System.exit(0);
      src/multivalent/        System.exit(0);
      src/multivalent/std/adaptor/    System.exit(0);
      src/multivalent/std/adaptor/                    ae.printStackTrace(); System.out.println(src+" vis-a-vis "+baseURI_+", len="+src.length()+", char 67="+(int)src.charAt(67)); System.exit(1);
      src/multivalent/std/adaptor/    System.exit(0); continue;
      src/multivalent/std/adaptor/    System.exit(1);
      src/multivalent/std/adaptor/pdf/        if (PDF.DEBUG) System.exit(1);
      src/multivalent/std/adaptor/pdf/ (K>0) { PDF.sampledata("Group 3 mixed");    System.exit(0); }
      src/multivalent/std/adaptor/pdf/            if (DEBUG) { pe.printStackTrace(); System.exit(1); }
      src/multivalent/std/adaptor/pdf/        if (DEBUG) { fail.printStackTrace(); System.exit(1); }
      src/multivalent/std/adaptor/            System.exit(0);
      src/multivalent/std/ (rootStyle_==null) { System.out.println("rootStyle_=null in semEvAf"); System.exit(1); }
      src/phelps/lang/reflect/            System.exit(0);
      src/phelps/net/        System.exit(1);
      src/phelps/    Print message to System.err, exit via <code>System.exit(1)</code>.
      src/phelps/  public static void error(String msg) { System.err.println("FATAL ERROR: "+msg); System.exit(1); }

      One also needs to intercept calls to Utility.error to be totally safe:

      src/multivalent/        Utility.error("couldn't instantiate "+bname+" -- is it abstract?");
      src/multivalent/        Utility.error(bname+": "+e+" -- perhaps class or constructor needs to be public");
      src/multivalent/        Utility.error("unanticipated error while restoring "+logicalname+"/"+bname+": "+e);
      src/multivalent/        if (!dir.mkdirs()) Utility.error("Couldn't create "+userdir_+" for permanent files");
      src/multivalent/    } else if (!dir.canRead()) Utility.error("Can't read "+userdir_);
      src/multivalent/        if (!dir.mkdirs()) Utility.error("Couldn't create cache at "+tmpdir_);
      src/multivalent/    } else if (!dir.canRead()) Utility.error("Can't read "+tmpdir_);

      • Tom Phelps

        Tom Phelps - 2003-10-22

        Well, I can do a grep too.  I'd just like to read into the record for future reference that some of those System.exit()s are exactly right.  Since the browser starts the Java/OS event loop, it requires one to stop, in Multivalent and Embed.  The one in Debug is in response to a request to do exactly that.  The ones in Check are part of command line option checking.  Some others are guarded by a DEBUG flag which is turned off on builds and source code distributions.

        However, the one in needs a DEBUG guard.  I'll look through the others and see about dropping System.exit.

        • Ed Randall

          Ed Randall - 2003-10-23

          Tom, sorry I didn't intend to intimate that you didn't know how to use grep, I was merely trying to help by showing exactly how I got those results.

          Personally I'm trying to use the library as exactly that, a library, and I don't want my program to exit unless I say so!  From my standpoint it would be much preferable to consistently throw some kind of  "MultivalentException" subclass which you can catch and handle how you want to for your application, and me likewise.
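
As a rough sketch of that idea, one of the Utility.error call sites from the grep output (the "couldn't instantiate" case) could throw instead of exiting. Everything here is hypothetical: the thread only proposes the name `MultivalentException`, and `BehaviorException` and `BehaviorLoader` are invented for illustration, not taken from the Multivalent source.

```java
class BehaviorLoader {
    /** Creates an instance by class name, throwing instead of calling System.exit. */
    static Object instantiate(String className) throws MultivalentException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (InstantiationException e) {
            throw new BehaviorException(
                    "couldn't instantiate " + className + " -- is it abstract?", e);
        } catch (ReflectiveOperationException e) {
            throw new BehaviorException(
                    className + ": " + e + " -- perhaps class or constructor needs to be public", e);
        }
    }

    public static void main(String[] args) {
        try {
            instantiate("no.such.Behavior");
        } catch (MultivalentException e) {
            // The application, not the library, decides whether this is fatal.
            System.out.println("skipping behavior: " + e.getMessage());
        }
    }
}

/** Hypothetical base class for all errors reported by the library. */
class MultivalentException extends Exception {
    MultivalentException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical subclass for failures while loading a behavior class. */
class BehaviorException extends MultivalentException {
    BehaviorException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

A browser front end could still catch `MultivalentException` at its outermost level and exit there, while a servlet or indexer could log the problem and continue.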



    • Tom Phelps

      Tom Phelps - 2003-10-22

      Peter, back in July you mentioned other bugs, but you didn't post details either on the message boards or in bugs.  Is Docco happy with Multivalent now?  Does it still crash on that PDF?  Can you post that PDF to the bugs area (or email it)?  If you're indexing PDFs with wild layouts, perhaps the -layout flag to the ExtractText class helps.  Besides System.exit and I'm guessing support for Type 0 and CID fonts, what deficiencies remain?

      • Peter Becker

        Peter Becker - 2003-10-22

        We got Docco working with Multivalent after a long time of hacking -- mostly the removal of all System.exit calls. Fixing the exception handling would have been better, but then we had other things to do.

        We never got around to much testing, and the PDFs we used were documents of other workgroups in our company at that time. We probably wouldn't have been able to give them out and by now I am not with that company anymore.

        You can get Docco and the Multivalent plugin here if you want to test it yourself:

        There is also a PDFBox plugin which might be handy for comparisons. On the small collection we used for quick testing both succeeded. But that was only about 90 PDFs.


