I want to use HTMLUnit to fetch information from
websites I log in to, build a personal page of that
information laid out the way I want it, and save it
to disk for offline viewing. When I hit some of these
sites, I get errors back saying the page is not
available. I believe they are detecting whether I am
connecting with a recognized browser or not.
I changed all the HTTP request headers sent by HTMLUnit
to match those of my browser but was still unable to get
through. What can I do to log in to the web site
with my information and get the data I need?
Does the Mozilla browser, for example, connect in
some way that lets these sites tell it apart? Am I going
to have to dig into the Mozilla code to do this,
or is there a way to do it with HTMLUnit?
The headers I sent matched those of Mozilla exactly
except for 2 things:
1. Order - I could not get HTMLUnit to send the headers
in the order they were added.
2. HTMLUnit adds "Content-Length" even when I requested
that it be removed.
Apart from these 2 items, the headers I sent were identical to
the Mozilla headers, and I am able to hit the site using Mozilla 1.0.
How do I specify the exact order and get "Content-Length"
removed? Something tells me this may not make a difference, but
I should at least be able to do these two things and try.
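
To make clear what I am after, here is a rough sketch in plain Java.
It does not use HTMLUnit at all. The class name OrderedRequest, the
build method, the host, the path and the header values are all made up
for illustration. It builds the text of a GET request by hand, so the
headers come out in the order they were added (a LinkedHashMap keeps
insertion order) and no "Content-Length" line is written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HTMLUnit: builds the raw text of an
// HTTP/1.1 GET request with headers in exactly the order supplied.
final class OrderedRequest {

    // The caller should pass a LinkedHashMap so iteration order
    // matches the order in which the headers were added.
    static String build(String host, String path, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        // A blank line ends the header block; no Content-Length is added.
        sb.append("\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Placeholder values; the real ones would be copied from the browser.
        headers.put("User-Agent", "Mozilla/5.0 (placeholder)");
        headers.put("Accept", "text/html");
        headers.put("Connection", "close");
        System.out.print(build("www.example.com", "/login", headers));
    }
}
```

The resulting string could then be written to a plain socket, which is
roughly the level of control I would like to have from within HTMLUnit.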
If a browser gets through, there has to be a way to get through
with HTMLUnit or some other program.
thanks
netbeans