
how to parse a file from disk without net

Anonymous
2004-07-13
2004-07-16
  • Anonymous

    Anonymous - 2004-07-13

    I have found that the parser cannot read files, even from a local hard disk, when there is no live internet connection. Is there a workaround for this?

     
    • Rodney S. Foley

      Rodney S. Foley - 2004-07-13

      I couldn't reproduce this. I have an HTML page on my local hard drive; I loaded it and used HTMLParser to display all the links, and it worked fine without any network connected.

      Could you provide a simple test case (source code) that shows this happening?

       
    • Anonymous

      Anonymous - 2004-07-14

      Hi!

      My source code is:

      /* load the given HTML file to the parser */
      Parser parser = null;
      try {
            parser = new Parser(filename, Parser.noFeedback);
      } catch (ParserException ex) {
            NewsProcessorServer.logger.warning("Error creating the parser (file error?):\n" + ex.toString());
            return null;
      }

      The error message I get on my logger is:
      org.htmlparser.util.ParserException: setConnection() : Error in opening a connection to ftp://localhostC/temp/np/workingdir/elcoteq.htm

      In the meantime I ran a test and have to correct myself: it works on a computer that has a properly installed and configured network connection, even if the network is not actually up. (It still works if I unplug the network cable on my desktop machine.) The problem occurs on machines without TCP/IP installed (this may be the point!). That seems understandable, since the file access goes, as far as I can see, through FTP even for the local hard drive!

       
      • Derrick Oswald

        Derrick Oswald - 2004-07-14

        The ftp:// is of course bogus.

        Not many of us have machines without a TCP/IP stack, so your machine may be the only reproducible test case.
        Can you print out the embedded exception in the ParserException above? Maybe there is a specific IOException that can be trapped (i.e. no network installed), and alternate code developed to use the file contents without relying on opening a URL.
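
One generic way to print an embedded exception is to walk the standard cause chain. This is only a sketch: whether the 1.3 ParserException exposes its embedded exception through `getCause()` is an assumption, and the class name `CauseChain` and the stand-in exception in `main` are hypothetical, not part of htmlparser.

```java
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists an exception and every nested cause. */
final class CauseChain {

    /** Collects "ClassName: message" for the throwable and each cause below it. */
    static List<String> describe(Throwable t) {
        List<String> out = new ArrayList<>();
        // The depth cap guards against a cyclic cause chain.
        for (int depth = 0; t != null && depth < 16; t = t.getCause(), depth++) {
            out.add(t.getClass().getName() + ": " + t.getMessage());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped exception; this is NOT the real ParserException.
        Exception wrapped = new Exception(
                "setConnection() : Error in opening a connection",
                new UnknownHostException("localhostC"));
        for (String line : describe(wrapped)) {
            System.out.println(line);
        }
    }
}
```

If the wrapper does not chain through `getCause()`, calling `printStackTrace()` on it and posting the full output gives the same information.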

         
        • Rodney S. Foley

          Rodney S. Foley - 2004-07-15

          Derrick,

          Why is there an ftp:// protocol on the file at all? That is not a valid ftp:// URL for localhost, and it wouldn't work even on a machine with a TCP/IP stack that had that directory structure on the C drive. When I use the parser to load the file it uses file://localhost/C:/<file>, not ftp as shown above, and all I passed was C:\<file>.

          Arpi,

          Are you sure you're passing C:\<wherever the file is> into the parser and not ftp://localhostc<file>?

          Or did you doctor the file name for privacy and make a mistake in how it is shown?

          If you don't mind, try this test case and workaround. Download this zip package:

          http://www.adadenterprises.com/ParseFileTest.zip

          It contains ParseFileTest.java and test.html. The path to test.html is hard-coded to c:\ in the Java file. (Make sure . and htmlparser.jar are on your classpath.)

          It has TWO test methods:

          testParserLoadsFile()
          testILoadFile()

          The first one has the parser load the file.  The second one has the ParseFileTest load the file and pass the string to the Lexer and then the Lexer to the Parser.

          So you can use this file to double-check the problem you are having: just run the test from the root of your C: drive on the box without the TCP/IP stack. If testParserLoadsFile works, then you can use this as an example to see where there is an error in your code. If it fails, then we have a simple test case that fails on your box that has no TCP/IP stack. I can set up a box without a TCP/IP stack and use this to help Derrick debug the problem if he wants.

          Then try testILoadFile; it should work since it doesn't let the parser load the file. You can use it as an example for your workaround.

          BTW, what is your computer's environment (OS, etc.)?

          -Rodney
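
The first half of the workaround described above (load the file yourself so that no URL is ever opened) can be sketched with plain file I/O. This is not the code from ParseFileTest.java: the class name `LocalFileLoader`, the default path, and the ISO-8859-1 charset are assumptions for illustration, and `java.nio.file` needs a much newer JVM than the ones discussed in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reads a local HTML file without going through any URL. */
final class LocalFileLoader {

    /** Reads the whole file into a String using direct file I/O (no URLConnection). */
    static String readAll(Path file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file), charset);
        } catch (IOException e) {
            // Unchecked wrapper keeps call sites short; the original cause is preserved.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default path is an assumption; pass your own file as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "C:/test.html");
        // Charset is an assumption; use whatever encoding your pages really have.
        String html = readAll(file, StandardCharsets.ISO_8859_1);
        System.out.println("Read " + html.length() + " characters from " + file);
        // Next step (not shown): hand `html` to the Lexer and the Lexer to the
        // Parser, as the testILoadFile method in the zip does.
    }
}
```

Because the file is opened directly, no file:// or ftp:// URL is built, so the host name lookup that fails in the stack trace never happens.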

           
          • Anonymous

            Anonymous - 2004-07-15

            It seems that the problem occurs only with version 1.3 of htmlparser. Look at this:

            ======= source code for p.java:
            import org.htmlparser.*;

            public class p
            {
                private static final String HTML_FILE = "C:\\test.html";
                public static void main(String[] args) {
                    try {
                        Parser parser = new Parser("c:/test.html");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }

            ======= "screenshot" on the machine without net:
            C:\>java -cp .;htmlparser14.jar p
            INFO: file://localhost/C:/test.html

            C:\>java -cp .;htmlparser13.jar p
            INFO: file://localhostC:/test.html
            ERROR: setConnection() : Error in opening a connection to ftp://localhostC/test.html
            org.htmlparser.util.ParserException: setConnection() : Error in opening a connection to ftp://localhostC/test.html;
            java.net.UnknownHostException: localhostC
                    at java.net.PlainSocketImpl.connect(Unknown Source)
                    at java.net.Socket.connect(Unknown Source)
                    at java.net.Socket.connect(Unknown Source)
                    at sun.net.NetworkClient.doConnect(Unknown Source)
                    at sun.net.NetworkClient.openServer(Unknown Source)
                    at sun.net.ftp.FtpClient.openServer(Unknown Source)
                    at sun.net.ftp.FtpClient.<init>(Unknown Source)
                    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
                    at org.htmlparser.Parser.setConnection(Parser.java:469)
                    at org.htmlparser.Parser.<init>(Parser.java:340)
                    at org.htmlparser.Parser.<init>(Parser.java:355)
                    at org.htmlparser.Parser.<init>(Parser.java:365)
                    at p.main(p.java:16)

            C:\>

            The string in the parser's feedback (v1.3) is "file://localhostC:/test.html" (which is already bogus), while the string in the exception is "ftp://localhostC/test.html"! With version 1.4 there is no problem.

            But the really strange thing is that on my desktop machine both are working, even with the wrong string:

            ======= "screenshot" on my desktop machine:
            C:\aarpi\lab\x>java -cp .;htmlparser13.jar p
            INFO: file://localhostC:/test.html

            C:\aarpi\lab\x>java -cp .;htmlparser14.jar p
            INFO: file://localhost/C:/test.html

            C:\aarpi\lab\x>

            I use WinXP on my desktop machine and Win98 on the other machine. Feel free to ask for further details if you want to investigate this issue further.

            Greets,
            Arpi
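
The two feedback strings above differ only by the slash between "localhost" and "C:", and that slash decides what `java.net.URL` treats as the host. The sketch below shows the split; the class name `FileUrlHost` and the helper `toFileUrl` are hypothetical and are not htmlparser code. A host of "localhostC" matches the UnknownHostException in the trace, and it would be consistent with the JDK's file: handler trying FTP for a host it does not consider local, although that last part is an inference from the stack trace.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical demo: how java.net.URL splits the two strings from the thread. */
final class FileUrlHost {

    /** Returns the host that java.net.URL extracts from the given URL string. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs but still available
    static String hostOf(String spec) {
        try {
            return new URL(spec).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    /** Builds a file URL string with the slash between host and drive letter. */
    static String toFileUrl(String windowsPath) {
        String path = windowsPath.replace('\\', '/');
        if (!path.startsWith("/")) {
            path = "/" + path; // without this slash the drive letter fuses with the host
        }
        return "file://localhost" + path;
    }

    public static void main(String[] args) {
        System.out.println(hostOf("file://localhostC:/test.html"));  // string reported by v1.3
        System.out.println(hostOf("file://localhost/C:/test.html")); // string reported by v1.4
        System.out.println(toFileUrl("C:\\test.html"));
    }
}
```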

             
            • Rodney S. Foley

              Rodney S. Foley - 2004-07-15

              Arpi,

              If version 1.4 is working, I do not know if there is an issue to report, since it appears to have been fixed in 1.4. I am not sure whether Derrick is supporting bug fixes to previous versions.

              Unless some requirement forces you to use 1.3, you should use the latest 1.41 in your project.

              -Rodney

               
              • Anonymous

                Anonymous - 2004-07-16

                I'll do the upgrade as soon as I have time to adapt my code to v1.4.
                Thx and best regards,
                Arpi

                 
                • Rodney S. Foley

                  Rodney S. Foley - 2004-07-16

                  Don't forget that I provided a workaround in that Java file that should work with 1.3 until you have time to upgrade to 1.4.

                   
            • Derrick Oswald

              Derrick Oswald - 2004-07-16

              Is there a JVM version difference?

              Win98 is a bit long in the tooth. I seem to remember problems, a while ago, with early JVMs on Windows when the network failed to load, but I blamed it then on the Novell IPX stack (OK, really, really early versions of Windows).

               
              • Anonymous

                Anonymous - 2004-07-16

                There is a slight difference in the JVM version:
                WinXP machine: 1.4.1_02
                Win98 machine: 1.4.2_05

                 
    • Anonymous

      Anonymous - 2004-07-14

      The stack trace is:

      org.htmlparser.util.ParserException: setConnection() : Error in opening a connection to ftp://localhostC/temp/np/workingdir/elcoteq.htm;
      java.net.UnknownHostException: localhostC
              at java.net.PlainSocketImpl.connect(Unknown Source)
              at java.net.Socket.connect(Unknown Source)
              at java.net.Socket.connect(Unknown Source)
              at sun.net.NetworkClient.doConnect(Unknown Source)
              at sun.net.NetworkClient.openServer(Unknown Source)
              at sun.net.ftp.FtpClient.openServer(Unknown Source)
              at sun.net.ftp.FtpClient.<init>(Unknown Source)
              at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
              at org.htmlparser.Parser.setConnection(Parser.java:469)
              at org.htmlparser.Parser.<init>(Parser.java:340)
              at org.htmlparser.Parser.<init>(Parser.java:355)
              at hu.sztaki.rfo.newsprocessor.IOManager.importHTML(IOManager.java:268)
              at hu.sztaki.rfo.newsprocessor.IOManager.importRecords(IOManager.java:210)
              at hu.sztaki.rfo.newsprocessor.IOManager.run(IOManager.java:165)

      Thanks for your help!

       
