I'm stuck trying to retrieve some information automatically from a site that requires a login (username + password). I think I need to use the ConnectionManager class, but I can't figure out how to combine it with the parser. This is what I've got (it gives me the site without being logged in):
[code]
ConnectionManager cm = new ConnectionManager();
cm.setUser(aUsername);
cm.setPassword(aPassword);
Parser parser = new Parser (aSite);
NodeList list = parser.parse (null);
System.out.println(list);
[/code]
You don't need a new one...
[code]
Parser parser = new Parser (aSite);
ConnectionManager cm = parser.getConnectionManager ();
cm.setUser(aUsername);
cm.setPassword(aPassword);
NodeList list = parser.parse (null);
System.out.println(list);
[/code]
Makes a lot of sense. However, this doesn't seem to work. The code you posted gives this warning: "The static method getConnectionManager() from the type Parser should be accessed in a static way". When I change parser.getConnectionManager() to Parser.getConnectionManager(), I still don't get the page that should be displayed when you are logged in.
Thanks for the previous reply.
The message indicates it should be accessed as:
ConnectionManager cm = Parser.getConnectionManager ();
Yes, I know, but I still get the page without being logged in. So something else must be going wrong...
Often these sites do redirection, so you might want to turn redirect following on.
Okay, so I added this piece of code, but it's still not working:
cm.setRedirectionProcessingEnabled(true);
Try exposing the HTTP request and response with the ConnectionMonitor interface.
See if you can see something going wrong.
This is what I got.
"3-sep-2009 14:51:13 org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: fusion_visited=TRUE; expires=Fri, 03 Sep 2010 12:51:12 GMT; path=/". Unable to parse expires attribute: Fri, 03 Sep 2010 12:51:12 GMT"
I tried to add the cookie, but errors are popping up everywhere.
[code]
Cookie cookie = new Cookie ("USER", "FreddyBaby");
manager.setCookie (cookie, "www.freshmeat.net");
[/code]
That is from http://htmlparser.sourceforge.net/javadoc/org/htmlparser/http/package-summary.html
but the class Cookie can't be found... and so on. This is quite frustrating. Maybe even more for you.
Hmmm, I wonder why there is a reference to org.apache.http.client.protocol.ResponseProcessCookies. Surely the server isn't using a client side Java library.
The Cookie class is in with the other classes (org.htmlparser.http) so I can't see why it wouldn't be found. But adding one is only a kludge. You should see where the cookie is being set and why it isn't accepted by the server.
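One way to check whether the Set-Cookie header quoted in the warning is itself malformed is to parse it with plain JDK classes. The following is only a sketch under assumptions: the class name SetCookieCheck and its two helper methods are hypothetical, and it uses java.net.HttpCookie and java.time rather than anything from HtmlParser or Apache HttpClient.

```java
import java.net.HttpCookie;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper class; not part of HtmlParser or Apache HttpClient.
class SetCookieCheck {

    // Parses one Set-Cookie header value (the text after "Set-Cookie: ")
    // and returns the first cookie found.
    static HttpCookie parse(String headerValue) {
        List<HttpCookie> cookies = HttpCookie.parse(headerValue);
        return cookies.get(0);
    }

    // Parses an expires value as an RFC 1123 date. Throws
    // DateTimeParseException if the text is not in that format.
    static ZonedDateTime parseExpires(String expires) {
        return ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    public static void main(String[] args) {
        // Header value copied from the warning in the log.
        HttpCookie cookie = parse(
                "fusion_visited=TRUE; expires=Fri, 03 Sep 2010 12:51:12 GMT; path=/");
        System.out.println(cookie.getName() + " = " + cookie.getValue()
                + ", path " + cookie.getPath());
        System.out.println(parseExpires("Fri, 03 Sep 2010 12:51:12 GMT"));
    }
}
```

If both calls succeed, the header is probably well formed, and the complaint more likely comes from the cookie rules of whichever HTTP client wrote the log line. That is a hint about where to look, not a diagnosis.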
I think this conversation will never end unless I mention the specific site: http://www.hellboundhackers.org/challenges/basic.php (my goal is to complete a challenge, not to abuse this in any way). This site has these cookies: PHPSESSID, fusion_lastvisit, fusion_user, fusion_visited.
Is it normal to have this much trouble? Most people say it's easy to do on that site.
I see at least these cookies:
__utmb
__utmc
__utma
__utmz
PHPSESSID
_csuid
_csroot
fusion_visited
And there are more (from another site?) because the response is not the same when I erase these cookies and again hit the site. In other words, it no longer gives me the "you must log in" page now.
Clean your browser cookie cache and hit the site with a cookie policy of "ask":
PHPSESSID
fusion_visited
VISITOR_INFO1_LIVE - third party (youtube.com)
VISITOR_INFO1_LIVE - third party (youtube.com)
__utma
__utmb
__utmc
__utmz
__utma
__utmb
__utmb (again)
_csoot
_csuid
others:
uid - third party (ad.yieldmanager.com)
AK1 - third party (content.yieldmanager.com)
OAID - third party (acc.depascor.nl)
OAGEO - third party (acc.depascor.nl)
PHPSESSID - third party (beacons.hottraffic.nl)
hotbeacon - third party (beacons.hottraffic.nl)
There's more going on here than your simple example.
If you turn cookie processing on in HtmlParser it should collect all these and you can see them in a debugger.
And all this is without logging in.
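For anyone who wants to see the same idea outside HtmlParser, here is a rough sketch of collecting cookies and listing them with the JDK's own java.net.CookieManager. This is a swapped-in technique, not HtmlParser's cookie processing. The class name CookieCollector, the method collectNames and the cookie values are made up for illustration. No request is sent; the Set-Cookie values are fed in by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; uses only the JDK, not HtmlParser.
class CookieCollector {

    // Feeds the given Set-Cookie header values into a CookieManager as if
    // they had arrived in a response from uri, then returns the names of
    // the cookies that ended up in the store.
    static Set<String> collectNames(URI uri, List<String> setCookieValues) {
        // ACCEPT_ALL so third-party cookies are kept too.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("Set-Cookie", setCookieValues);
        try {
            manager.put(uri, headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> names = new TreeSet<>();
        for (HttpCookie cookie : manager.getCookieStore().getCookies()) {
            names.add(cookie.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        URI site = URI.create("http://www.hellboundhackers.org/challenges/basic.php");
        // Placeholder values; real ones come from the server.
        System.out.println(collectNames(site, Arrays.asList(
                "PHPSESSID=abc123; path=/",
                "fusion_visited=TRUE; path=/")));
    }
}
```

In a real run, the manager would be installed with CookieHandler.setDefault(manager) so that HttpURLConnection fills the store on its own, and the store could then be inspected in a debugger.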
Can you give me a link to a good manual? Because those I find confuse me even more.
I can only point to the relevant RFCs. I haven't read a manual on the internet... ever.
I managed to connect to the server using URLConnection and sending the cookie fusion_user. But I can't use the authentication information of my account to log in. Very strange... I tried setRequestProperty for the authentication.
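A minimal sketch of what that approach might look like, assuming plain java.net classes: a Cookie header and an HTTP Basic Authorization header, both set with setRequestProperty. The class name CookieRequestSketch, its methods and the placeholder values are hypothetical. The thread does not say how the site authenticates. If it uses a login form rather than HTTP authentication, the Authorization header would likely be ignored, and only the session cookies would matter.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class; uses only the JDK.
class CookieRequestSketch {

    // Joins cookies into a single Cookie header value: "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    // Builds an HTTP Basic Authorization header value from user and password.
    static String basicAuthHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    // Prepares a connection with both headers set. Nothing is sent until
    // the caller asks for the response, e.g. with getResponseCode().
    static HttpURLConnection prepare(String site, Map<String, String> cookies,
                                     String user, String password) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(site).toURL().openConnection();
        connection.setRequestProperty("Cookie", cookieHeader(cookies));
        connection.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return connection;
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("PHPSESSID", "placeholder-session-id");  // placeholder value
        cookies.put("fusion_user", "placeholder-user-token"); // placeholder value
        System.out.println("Cookie: " + cookieHeader(cookies));
        System.out.println("Authorization: " + basicAuthHeader("aUsername", "aPassword"));
    }
}
```

Printing the header values before sending makes it easier to compare them with what a browser sends for the same page.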