I've only just begun to use Web-Harvest and it's really cool, but I have a big
issue with the proxy. I have written a script that harvests data from a web site.
If I run this script against the site running on my local web server,
everything is fine. But when I try to access the same site outside our
company (behind the firewall), it doesn't work. I have set up the proxy in the GUI
with all the details (username, password and port), but still nothing. It does not
work from my Java program either (yes, I did set up all the proxy details there
as well). I know that the proxy address and all the details are correct,
because I use the same details in another scraping tool and it works fine.
When I display everything between the HTML body tags, here is what I get
returned:
The page cannot be displayed
There is a problem with the page you are trying to reach and it cannot be displayed.
Please try the following:
Click the Refresh button, or try again later.
Open the
<!--
if (!((window.navigator.userAgent.indexOf("MSIE") > 0) &&
(window.navigator.appVersion.charAt(0) == "2")))
{
Homepage();
}
//-->
home page, and then look for links to the information you want.
If you typed the page address in the Address bar, make sure that it is spelled correctly.
Verify that the Internet access policy on your network allows you to view this this page.
If you believe you should be able to view this directory or page,
please contact the Web site administrator by using the e-mail address or
phone number listed on the
Homepage();
home page.
HTTP 407 Proxy Authentication Required - The ISA Server requires authorization
to fulfill the request. Access to the Web Proxy service is denied. (12209)
Internet Security and Acceleration Server
Technical Information (for support personnel)
Background:
The gateway could not retrieve the requested page.
ISA Server: blahblah.emea.ourcompanyname.net
Via:
Time: 25/11/2011 14:34:53 GMT
Any thoughts on what could be wrong?
Thanks for any input.
Dave
Unfortunately I cannot test it, as I don't have an HTTP proxy deployed in
any of the networks I have access to.
From the source code perspective everything looks good, and if you provided
all the proxy details correctly in the WH IDE it should work. I'll try to test
proxy support somehow.
Anonymous - 2011-11-27
I used the proxy parameters and it works fine (without authentication), but
I suspect that your ISA server is requesting authentication in a
mode different from the (I suppose) plain-text mode you are providing.
I don't have an ISA Server either, so I can't confirm this.
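One way to check that guess is to look at the Proxy-Authenticate headers that come back with the 407, since they name the schemes the proxy is willing to accept. Below is a minimal sketch using only the JDK, not Web-Harvest itself. The class name `ProxyAuthSchemes` and the command-line arguments are made up for illustration, and it assumes the proxy sends one challenge per header line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Web-Harvest.
public class ProxyAuthSchemes {

    // Returns the scheme name (first token) of each Proxy-Authenticate value.
    // Assumption: one challenge per header value; several comma-separated
    // challenges packed into a single value are not split here.
    public static List<String> schemes(List<String> headerValues) {
        List<String> result = new ArrayList<>();
        for (String value : headerValues) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed.split("\\s+", 2)[0]);
            }
        }
        return result;
    }

    // True if any of the challenges is for the Basic scheme.
    public static boolean offersBasic(List<String> headerValues) {
        for (String scheme : schemes(headerValues)) {
            if (scheme.equalsIgnoreCase("Basic")) {
                return true;
            }
        }
        return false;
    }

    // Usage: java ProxyAuthSchemes <proxyHost> <proxyPort> <http-url>
    // Use a plain http:// URL; with https the 407 surfaces as an exception.
    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: ProxyAuthSchemes <proxyHost> <proxyPort> <url>");
            return;
        }
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress(args[0], Integer.parseInt(args[1])));
        HttpURLConnection conn =
                (HttpURLConnection) new URL(args[2]).openConnection(proxy);
        int status = conn.getResponseCode();

        // Header names are matched case-insensitively; the status line has a null key.
        List<String> challenges = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : conn.getHeaderFields().entrySet()) {
            if (e.getKey() != null && e.getKey().equalsIgnoreCase("Proxy-Authenticate")) {
                challenges.addAll(e.getValue());
            }
        }
        System.out.println("HTTP " + status + ", schemes offered: " + schemes(challenges));
        System.out.println("Basic offered: " + offersBasic(challenges));
    }
}
```

If Basic is not in the list and only something like NTLM or Negotiate is offered, a plain username/password sent in text mode would probably keep getting the 407.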
Hmm, I didn't manage to get it to work, so basically I cheated. In my
Java program, I download the web site to my local hard drive first and then
web-scrape that file. It's probably not the best solution, but hey, it works
:)
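For anyone wanting to try the same workaround, here is a rough sketch of the download step using only the JDK. It is not the code from the post above: the class name `PageSaver`, the output file `page.html` and the argument order are all invented for illustration. Whether the JDK can actually authenticate against a given ISA server depends on the schemes the proxy offers and on the JVM and platform, so treat this as a starting point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: fetch a page through an authenticating proxy and
// store it locally so the scraper can read the file instead of the URL.
public class PageSaver {

    // Copies the stream to the target file, overwriting it; returns bytes written.
    public static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL through the given HTTP proxy. The credentials are handed
    // out only when the JDK asks on behalf of a proxy, not an origin server.
    public static InputStream openViaProxy(String url, String proxyHost, int proxyPort,
                                           final String user, final char[] password)
            throws IOException {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == Authenticator.RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password);
                }
                return null;
            }
        });
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        return conn.getInputStream();
    }

    // Usage: java PageSaver <url> <proxyHost> <proxyPort> <user> <password>
    public static void main(String[] args) throws IOException {
        if (args.length < 5) {
            System.out.println("usage: PageSaver <url> <proxyHost> <proxyPort> <user> <password>");
            return;
        }
        try (InputStream in = openViaProxy(args[0], args[1],
                Integer.parseInt(args[2]), args[3], args[4].toCharArray())) {
            long bytes = save(in, Paths.get("page.html")); // file name is an assumption
            System.out.println("Saved " + bytes + " bytes to page.html");
        }
    }
}
```

The scraping script can then be pointed at the saved file rather than the remote URL, which keeps the proxy handling entirely outside the scraper.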
Same issue with our ISA server....