I have found that the parser cannot read files, even from a local hard disk, if there is no live internet connection. Is there any workaround for this situation?
I couldn't reproduce this. I have an HTML page on my local hard drive; I loaded it and used HTMLParser to display all the links, and it worked fine without any network connected.
Could you provide a simple test case (source code) that shows this happening?
Anonymous - 2004-07-14
Hi!
My source code is:
/* load the given HTML file to the parser */
Parser parser = null;
try {
parser = new Parser(filename, Parser.noFeedback);
} catch (ParserException ex) {
NewsProcessorServer.logger.warning("Error creating the parser (file error?):\n" + ex.toString());
return null;
}
The error message I get on my logger is:
org.htmlparser.util.ParserException: setConnection() : Error in opening a connection to ftp://localhostC/temp/np/workingdir/elcoteq.htm
In the meantime I ran a test and I have to correct myself: it seems to work on a computer that has a properly installed and configured network connection, even if the network is not actually live. (It still works if I unplug the network cable on my desktop machine.) The problem occurs on machines without (this may be the point!) TCP/IP installed. That seems understandable, since the file access goes through FTP, as far as I can see, even for the local hard drive!
The ftp:// is of course bogus.
Not many of us have machines that haven't got a tcp/ip stack, so your machine may be the only reproducible test case.
Can you print out the embedded exception in the ParserException above? Maybe there is a specific IOException that can be trapped (e.g. no network installed), and alternate code developed to use the file contents without relying on opening a URL.
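As a rough illustration of trapping a specific nested exception, here is a minimal sketch that walks an exception's cause chain looking for a particular type. It uses only the JDK: the outer `Exception` is a stand-in for the ParserException, and `findCause` and `CauseChainDemo` are hypothetical names. It assumes the nested exception is reachable through the standard `getCause()`; a ParserException from this library version may expose it through its own accessor instead.

```java
import java.net.UnknownHostException;

class CauseChainDemo {

    /** Returns the first throwable in the cause chain that is an instance of type, or null. */
    public static <T extends Throwable> T findCause(Throwable top, Class<T> type) {
        for (Throwable t = top; t != null; t = t.getCause()) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped failure reported in the thread (not the real ParserException).
        Exception wrapped = new Exception(
                "setConnection() : Error in opening a connection",
                new UnknownHostException("localhostC"));

        UnknownHostException root = findCause(wrapped, UnknownHostException.class);
        if (root != null) {
            // A caller could fall back to reading the file directly at this point.
            System.out.println("unresolvable host: " + root.getMessage());
        } else {
            System.out.println("no UnknownHostException in the chain");
        }
    }
}
```

Run as written, this prints `unresolvable host: localhostC`.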
Derrick,
Why is there even an ftp:// protocol listed on the file anyway? That is not a valid ftp:// URL for localhost; it wouldn't work even on a machine with a TCP/IP stack that had that directory structure on the C: drive. When I use the parser to load the file it uses file://localhost/C:/<file>, not ftp as shown above. And all I passed was C:\<file>.
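To make the expected file://localhost/C:/<file> form concrete, here is a minimal sketch of turning a local path into that shape. `toFileUrl` and `FileUrlSketch` are hypothetical names, not part of HTMLParser, and this is not a claim about how the parser builds its URL internally. The sketch only normalizes separators and adds the leading slash; it does not percent-encode spaces or change the case of the drive letter.

```java
class FileUrlSketch {

    /** Builds a file://localhost/... URL string from a local path (hypothetical helper). */
    public static String toFileUrl(String path) {
        // Windows separators become URL separators.
        String p = path.replace('\\', '/');
        // A drive path such as "C:/test.html" needs a slash between host and path;
        // without it the result would be "file://localhostC:/test.html".
        if (!p.startsWith("/")) {
            p = "/" + p;
        }
        return "file://localhost" + p;
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("C:\\test.html"));
        System.out.println(toFileUrl("/home/user/test.html"));
    }
}
```

For the two sample paths this prints `file://localhost/C:/test.html` and `file://localhost/home/user/test.html`.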
Arpi,
Are you sure you're not passing ftp://localhostc<file> into the parser rather than C:\<wherever the file is>?
Or did you doctor the file name for privacy and make a mistake in how it is shown?
If you don't mind, try this test case and workaround. Download this zip package:
http://www.adadenterprises.com/ParseFileTest.zip
It contains a ParseFileTest.java and a test.html. The path to test.html is hard-coded to c:\ in the Java file. (You need to make sure . and htmlparser.jar are in your classpath.)
It has TWO test methods:
testParserLoadsFile()
testILoadFile()
The first one has the parser load the file. The second one has the ParseFileTest load the file and pass the string to the Lexer and then the Lexer to the Parser.
So you can use this file to double-check the problem you are having: just run the test from the root of your C: drive on the box without the TCP/IP stack. If testParserLoadsFile works, then you can use this as an example to see where there is an error in your code. If it fails, then we have a simple test case that fails on your box that has no TCP/IP stack. I can set up a box without a TCP/IP stack and use this to help Derrick debug this problem if he wants.
Then try the testILoadFile and it should work since it doesn't let the parser load the file. You can use this as an example for your workaround for the problem.
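The load-it-yourself workaround can be sketched roughly as follows. This is not the code from the zip package; `LocalHtmlLoader` and `readHtml` are hypothetical names, and only the JDK is used so that no URL connection is involved. Handing the resulting string to the Lexer and then to the Parser is left as a comment, because the exact constructors depend on the HTMLParser version in use. The charset is also an assumption and should match the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LocalHtmlLoader {

    /** Reads the whole file into a String using plain file I/O, with no URL handling. */
    public static String readHtml(String fileName, Charset charset) {
        try {
            return new String(Files.readAllBytes(Paths.get(fileName)), charset);
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read " + fileName, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for c:\test.html from the thread.
        Path tmp = Files.createTempFile("test", ".html");
        try {
            Files.write(tmp, "<html><a href=\"x.html\">x</a></html>".getBytes(StandardCharsets.UTF_8));
            String html = readHtml(tmp.toString(), StandardCharsets.UTF_8);
            System.out.println(html.length() + " characters read");
            // Assumption: at this point the string would be passed to the Lexer,
            // and the Lexer to the Parser, as the second test method describes.
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```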
BTW, what is your computer's environment? (OS, etc.)
-Rodney
Anonymous - 2004-07-15
It seems that the problem occurs only with version 1.3 of htmlparser. Look at this:
======= source code for p.java:
import org.htmlparser.*;

public class p
{
    private static final String HTML_FILE = "C:\\test.html";

    public static void main(String[] args) {
        try {
            Parser parser = new Parser("c:/test.html");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
======= "screenshot" on the machine without net:
C:\>java -cp .;htmlparser14.jar p
INFO: file://localhost/C:/test.html
C:\>java -cp .;htmlparser13.jar p
INFO: file://localhostC:/test.html
ERROR: setConnection() : Error in opening a connection to ftp://localhostC/test.html
org.htmlparser.util.ParserException: setConnection() : Error in opening a connection to ftp://localhostC/test.html;
java.net.UnknownHostException: localhostC
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.<init>(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at org.htmlparser.Parser.setConnection(Parser.java:469)
    at org.htmlparser.Parser.<init>(Parser.java:340)
    at org.htmlparser.Parser.<init>(Parser.java:355)
    at org.htmlparser.Parser.<init>(Parser.java:365)
    at p.main(p.java:16)
C:\>
The string in the parser's feedback (v1.3) is "file://localhostC:/test.html" (which is already bogus), while the string in the exception is "ftp://localhostC/test.html"! With version 1.4 there is no problem.
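To see why the missing slash matters, here is a minimal sketch that asks the JDK's `java.net.URI` how it splits the two strings from the output above. `UrlHostDemo`, `hostOf` and `pathOf` are hypothetical names. The host it reports for the v1.3 string is consistent with the `UnknownHostException: localhostC` in the trace. The JDK's file URL handler is also known to attempt FTP for a non-local host, which would fit the ftp:// in the message, though that link is an inference rather than something confirmed in this thread.

```java
import java.net.URI;

class UrlHostDemo {

    /** Host component as parsed by java.net.URI. */
    public static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** Path component as parsed by java.net.URI. */
    public static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String v13 = "file://localhostC:/test.html";   // feedback string from v1.3
        String v14 = "file://localhost/C:/test.html";  // feedback string from v1.4

        // In the v1.3 string the drive letter is absorbed into the host name
        // and the colon is read as an (empty) port separator.
        System.out.println(hostOf(v13) + " | " + pathOf(v13));
        System.out.println(hostOf(v14) + " | " + pathOf(v14));
    }
}
```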
But the really strange thing is that on my desktop machine both are working, even with the wrong string:
======= "screenshot" on my desktop machine:
C:\aarpi\lab\x>java -cp .;htmlparser13.jar p
INFO: file://localhostC:/test.html
C:\aarpi\lab\x>java -cp .;htmlparser14.jar p
INFO: file://localhost/C:/test.html
C:\aarpi\lab\x>
I use WinXP on my desktop machine and Win98 on the other machine. Feel free to ask for further details if you want to investigate this issue further.
Greets,
Arpi
Arpi,
If version 1.4 is working, I do not know if there is an issue to report, since it appears to have been fixed in 1.4. I am not sure if Derrick is supporting bug fixes to previous versions.
Unless some requirement forces you to use 1.3, you should use the latest 1.41 in your project.
-Rodney
Anonymous - 2004-07-16
I'll do the upgrade as soon as I have time to adapt my code to v1.4.
Thx and best regards,
Arpi
Don't forget I provided a workaround in that Java file that should work with 1.3 until you have time to upgrade to 1.4.
Is there a JVM version difference?
Win98 is a bit long in the tooth. I seem to remember, a while ago, problems with early JVMs on Windows when the network failed to load, but I blamed it then on the Novell IPX stack (OK, really, really early versions of Windows).
Anonymous - 2004-07-16
There is a slight difference in the JVM version:
WinXP machine: 1.4.1_02
Win98 machine: 1.4.2_05
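For comparing machines like this, a small sketch that prints the JVM and OS details from standard system properties may be handy. `EnvReport` and `describe` are hypothetical names; the property keys are standard JDK ones.

```java
class EnvReport {

    /** Collects JVM and OS details from standard system properties. */
    public static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", os.name=" + System.getProperty("os.name")
                + ", os.version=" + System.getProperty("os.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```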
Anonymous - 2004-07-14
The stack trace is:
org.htmlparser.util.ParserException: setConnection() : Error in opening a connection to ftp://localhostC/temp/np/workingdir/elcoteq.htm;
java.net.UnknownHostException: localhostC
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.<init>(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at org.htmlparser.Parser.setConnection(Parser.java:469)
    at org.htmlparser.Parser.<init>(Parser.java:340)
    at org.htmlparser.Parser.<init>(Parser.java:355)
    at hu.sztaki.rfo.newsprocessor.IOManager.importHTML(IOManager.java:268)
    at hu.sztaki.rfo.newsprocessor.IOManager.importRecords(IOManager.java:210)
    at hu.sztaki.rfo.newsprocessor.IOManager.run(IOManager.java:165)
Thanks for your help!