From: Gredler, D. (Matrix) <Dan...@ic...> - 2004-10-28 19:06:50
|
Hello,

I am running the following Jython code, which is part of an online example of very simple web scraping:

    import urllib
    import re

    # always empty
    print urllib.getproxies_environment()

    URL = "http://www.amazon.com/exec/obidos/tg/detail/-/0140390847/qid=1041706275/sr=8-1/ref=sr_8_1/103-8463458-5564619?v=glance&s=books&n=507846"
    pattern = "Sales Rank: </b> *([0-9,]*)"
    doc = urllib.urlopen(URL).read()
    result = re.search(pattern, doc)
    print result.group(1)

The problem is that we have a firewall and have to use a proxy to get to the web. I have tried specifying the Java system properties http.proxyHost and http.proxyPort, as well as calling PythonInterpreter.initialize() with a property keyed on "http_proxy", which is apparently the convention for specifying proxies with urllib (see http://www.python.org/doc/current/lib/module-urllib.html).

I am not sure what to try next. As noted in the code, printing out the list of proxies always gives me an empty list, even when I specify the proxy using the aforementioned methods. Any pointers would be greatly appreciated.

Thanks in advance,
Daniel Gredler

Java Version:
    java version "1.4.2_03"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
    Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
Jython Version:
    Jython 2.1 on java (JIT:null)
Operating System:
    Microsoft Windows 2000 [Version 5.00.2195]
From: Diez B. R. <de...@we...> - 2004-10-29 13:44:17
|
> The problem is that we have a firewall and have to use a proxy to get to
> the web. I have tried specifying the java system properties http.proxyHost
> and http.proxyPort, as well as calling PythonInterpreter.initialize( ) with
> a property keyed on "http_proxy", which is apparently the convention for
> specifying proxies with urllib (see
> http://www.python.org/doc/current/lib/module-urllib.html). I am not sure
> what to try next. As noted in the code, printing out the list of proxies
> always gives me an empty list even when trying to specify the proxy using
> aforementioned methods. Any pointers would be greatly appreciated.

Set an environment variable named http_proxy. Then start Jython from the command line, import the os module, and print os.environ. If the http_proxy value shows up there, things should work; at least, the urllib.py in Jython's lib suggests that.

Diez
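
The check suggested above can be sketched roughly as follows. This is only an illustration, not something from the original thread: the proxy address proxy.example.com:8080 and the helper name proxies_with_http_proxy are made up, and the import is written so the snippet also runs on a current Python. Whether Jython 2.1 sees a variable set this way, or only one inherited from the shell that launched it, is an assumption that would need checking on the actual installation.

```python
import os

try:
    # Python 3 keeps the helper in urllib.request
    from urllib.request import getproxies_environment
except ImportError:
    # Python 2 / Jython keep it in urllib, as in the original post
    from urllib import getproxies_environment


def proxies_with_http_proxy(proxy_url):
    """Set http_proxy for this process and return what urllib detects.

    Hypothetical helper for illustration; not part of urllib.
    """
    os.environ["http_proxy"] = proxy_url
    return getproxies_environment()


# Hypothetical proxy address; substitute the real host and port.
detected = proxies_with_http_proxy("http://proxy.example.com:8080")
print(detected.get("http"))
```

If the printed value is the proxy URL, urllib has picked the setting up from the environment. If it prints None, the variable is not reaching the interpreter, which would match the empty result described in the question.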