Hi,
I'm trying to parse the following page: http://tijdnet.tijd.be/koersen/index.asp?page=fundamentals&view=kerncijfers&ID=60114918
I'm almost certain the problem is that this page sets cookies. (When I block cookies from this website, it doesn't load in Firefox or IE either.)
I read the docs about cookies, but the example only showed how to set cookies, not how to accept them.
Can anyone help me out, please?
I think you just need to use:
    parser.getConnectionManager().setCookieProcessingEnabled(true);
This should accept cookies and, on the next access, send them back to the host that set them. If this doesn't work, let me know.
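For anyone unsure what "accept and replay" involves, here is a minimal sketch of the idea. It uses the JDK's own java.net.CookieManager (Java 6 and later) rather than htmlparser's ConnectionManager, so it only illustrates the mechanism and says nothing about how the library implements it. The class name CookieReplaySketch and the host example.com are made up for illustration; the cookie value is the one from the packet capture further down in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class CookieReplaySketch {
    /**
     * Stores the cookie from a Set-Cookie response header, then returns the
     * Cookie request header value that would be sent on the next request
     * to the same URL (empty string if nothing would be sent).
     */
    static String replay(String pageUrl, String setCookieValue) {
        // Fresh in-memory store that accepts every cookie.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        URI uri = URI.create(pageUrl);
        try {
            // "Accept": hand the response header to the cookie store.
            Map<String, List<String>> response = Collections.singletonMap(
                    "Set-Cookie", Collections.singletonList(setCookieValue));
            manager.put(uri, response);

            // "Replay": ask which cookies belong on the next request.
            Map<String, List<String>> request =
                    manager.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> cookies = request.get("Cookie");
            return cookies == null ? "" : String.join("; ", cookies);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replay("http://example.com/index.asp",
                "ASPSESSIONIDSSBAQSAC=KKLNOLPCOFKHHJOBPPMGBJHH; path=/"));
    }
}
```

Note that the session cookie is replayed even though it carries no expiry information at all.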
Thanks for your reply!
However, I already tried that and it didn't seem to work.
This is my code:
    import java.net.HttpURLConnection;
    import org.htmlparser.Parser;
    import org.htmlparser.http.ConnectionManager;
    import org.htmlparser.http.ConnectionMonitor;

    ConnectionManager manager = Parser.getConnectionManager();
    manager.setCookieProcessingEnabled(true);
    // Print the request and response headers of every connection.
    ConnectionMonitor monitor = new ConnectionMonitor() {
        public void preConnect(final HttpURLConnection connection) {
            System.out.println(ConnectionManager.getRequestHeader(connection));
        }
        public void postConnect(HttpURLConnection connection) {
            System.out.println(ConnectionManager.getResponseHeader(connection));
        }
    };
    manager.setMonitor(monitor);
    Parser parser = new Parser("http://www.tijd.be/koersen/index.asp?page=fundamentals&view=kerncijfers&ID=60114918");
When I look at the output, it seems a cookie is actually set, but its expiry date is 2 hours earlier than the current time.
Help is greatly appreciated.
Hello-
I am having problems with the cookie processing as well. I am using version 1.6. The following is the code I am using, with defaulturl being a String containing the URL. When I execute the code, I get back a page from the site telling me that cookies must be enabled. Any pointers or ideas are greatly appreciated.
CODE:
    parser = new Parser();
    parser.getConnectionManager().setCookieProcessingEnabled(true);
    try {
        url = new URL(defaulturl);
        connection = url.openConnection();
        parser.setConnection(connection);
        for (NodeIterator iterator = parser.elements (); iterator.hasMoreNodes (); )
            System.out.println (iterator.nextNode ());
    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (ParserException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
Hi,
My code works fine when running on my machine (Windows XP), but on the Unix server I get this error:
parsing url: http://www.google.de/search?num=100&as_q=etf&start=0
java.lang.NullPointerException
    at org.htmlparser.http.ConnectionManager.addCookies(ConnectionManager.java:894)
    at org.htmlparser.http.ConnectionManager.addCookies(ConnectionManager.java:866)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:604)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:792)
    at org.htmlparser.Parser.setURL(Parser.java:341)
I checked the source code at ConnectionManager.java:894:
----------------------------------------------
893     cookie = (Cookie)cookies.elementAt (i);
894     if (cookie.getExpiryDate ().before (now))
895     {
            cookies.remove (i);
            i--; // dick with the loop variable
        }
----------------------------------------------
So why does this throw a NullPointerException?
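One possible explanation, offered as a guess rather than a confirmed reading of the library: if getExpiryDate() can return null for a cookie that carries no expiry (a session cookie), then line 894 would be calling before() on null. The sketch below uses a hypothetical stand-in class, SimpleCookie, not htmlparser's own Cookie, to show a null-safe version of the same pruning loop.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

final class ExpiryPruneSketch {
    /** Hypothetical stand-in for a cookie; expiry is null for a session cookie. */
    static final class SimpleCookie {
        final String name;
        final Date expiry;

        SimpleCookie(String name, Date expiry) {
            this.name = name;
            this.expiry = expiry;
        }
    }

    /** Returns a copy of the list without cookies whose expiry lies before 'now'. */
    static List<SimpleCookie> pruneExpired(List<SimpleCookie> cookies, Date now) {
        List<SimpleCookie> kept = new ArrayList<>(cookies);
        for (Iterator<SimpleCookie> it = kept.iterator(); it.hasNext(); ) {
            Date expiry = it.next().expiry;
            // Guard against null: a cookie without an expiry never expires here.
            if (expiry != null && expiry.before(now)) {
                it.remove(); // safe removal, no loop-index adjustment needed
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleCookie> cookies = new ArrayList<>();
        cookies.add(new SimpleCookie("session", null));
        cookies.add(new SimpleCookie("stale", new Date(0L)));
        System.out.println(pruneExpired(cookies, new Date()).size());
    }
}
```

Using an Iterator also avoids having to decrement the loop variable after a removal.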
Just wanted to follow up with a little more information as I try to figure this out. I am running Java 1.5b6 for the VM.
Also, it appears that the ConnectionManager thinks it is handling cookies: when I added the following two lines, I got "Cookies? true" in my output.
    Parser.getConnectionManager().setCookieProcessingEnabled(true);
    System.out.println("Cookies? "+ Parser.getConnectionManager().getCookieProcessingEnabled());
Am I correct in assuming that htmlparser automatically handles receiving and sending cookies in this mode, or is there some other piece that I have missed? Sorry, I am new to htmlparser. Thanks for any help.
Hmmm.
The cookie is probably being dropped because its expiry time is earlier than the current time. It looks like the time processing for cookies needs to account for the time zone or something like that.
Looks like you should file a bug.
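A small sketch of how a time zone slip could produce this kind of symptom. It only illustrates the hunch above and is not a confirmed diagnosis of htmlparser: it assumes the expiry text is meant as GMT but ends up interpreted in a local zone that is ahead of GMT. The class name ExpiryZoneSketch, the method skewSeconds, and the +02:00 offset are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class ExpiryZoneSketch {
    /**
     * Seconds by which an expiry shifts when a wall-clock time meant as GMT
     * is read as if it were in 'assumedLocal' instead. A negative result
     * means the cookie appears to expire earlier than the server intended.
     */
    static long skewSeconds(LocalDateTime wallClock, ZoneOffset assumedLocal) {
        Instant intended = wallClock.toInstant(ZoneOffset.UTC); // correct reading
        Instant misread = wallClock.toInstant(assumedLocal);    // zone ignored
        return Duration.between(intended, misread).getSeconds();
    }

    public static void main(String[] args) {
        LocalDateTime expiryText = LocalDateTime.of(2005, 12, 11, 18, 0, 0);
        System.out.println(skewSeconds(expiryText, ZoneOffset.ofHours(2)));
    }
}
```

With a +02:00 offset the misread expiry lands two hours early, so a cookie meant to live for less than two hours would already look expired when compared against the current time.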
Thanks Derrick-
I have looked at that a little more by capturing some network traffic, and you may be on to something. I will need to go back and research cookies a little more. From the packet capture, the server is an IIS 5.0 server, and this is what comes back from it:
http.server = Server: Microsoft-IIS/5.0\r\n
http.date = Date: Sun, 11 Dec 2005 16:08:14 GMT\r\n
http.location = Location: sbaweb/error.asp?err=nocookies\r\n
HTTP/1.1 302 Object moved
Server: Microsoft-IIS/5.0
Date: Sun, 11 Dec 2005 16:08:14 GMT
X-Powered-By: ASP.NET
Location: sbaweb/error.asp?err=nocookies
Content-Length: 121
Content-Type: text/html
Expires: Sun, 11 Dec 2005 16:07:14 GMT
Set-Cookie: ASPSESSIONIDSSBAQSAC=KKLNOLPCOFKHHJOBPPMGBJHH; path=/
Cache-control: no-cache
Does the Expires header being a minute earlier than the Date header fit with your hunch? I am going to research cookies in more detail and look at the ConnectionManager/Cookie Control as well.
Thanks for your help. I will submit a bug report when I can give a good record of what is happening.
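For what it is worth, the Expires response header controls caching of the response itself; a cookie's lifetime comes from an expires attribute inside the Set-Cookie value, and the captured Set-Cookie line has none. The sketch below (class and method names are made up for illustration) measures the gap between the two captured headers and checks the cookie for an expires attribute, using only the values from the capture above.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class CaptureCheckSketch {
    /** Seconds from the Date header to the Expires header (negative if Expires is earlier). */
    static long secondsFromDateToExpires(String date, String expires) {
        ZonedDateTime d = ZonedDateTime.parse(date, DateTimeFormatter.RFC_1123_DATE_TIME);
        ZonedDateTime e = ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME);
        return Duration.between(d, e).getSeconds();
    }

    /** True if the Set-Cookie value carries its own expires attribute. */
    static boolean hasExpiresAttribute(String setCookieValue) {
        String[] parts = setCookieValue.split(";");
        // parts[0] is name=value; the attributes follow it.
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].trim().toLowerCase(Locale.ROOT).startsWith("expires=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(secondsFromDateToExpires(
                "Sun, 11 Dec 2005 16:08:14 GMT", "Sun, 11 Dec 2005 16:07:14 GMT"));
        System.out.println(hasExpiresAttribute(
                "ASPSESSIONIDSSBAQSAC=KKLNOLPCOFKHHJOBPPMGBJHH; path=/"));
    }
}
```

If cookie handling code took its expiry from the Expires header instead of the Set-Cookie attributes, this cookie would look expired on arrival; whether htmlparser does that is something the bug report would need to establish.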