From: Greg P. <Gre...@us...> - 2009-08-20 06:09:35
FYI, I've added r1315, which removes a single print line I'd left in bookcover.php by mistake while I was looking into a problem with bookcovers from Google.

I noticed yesterday that the Google bookcovers are returning 302 redirects and a captcha, then linking to here:
http://www.google.com/support/websearch/bin/answer.py?answer=86640

"The 'We're Sorry' message appears when Google detects that a computer on your network is sending automated traffic to Google. Automated queries are against our Terms of Service<http://www.google.com/accounts/TOS>."

Is anyone else seeing issues? I'm starting to see bans/captchas on general searching of Google... so I hope I haven't got us blacklisted :)

Starting to read through some old info to edumacate myself:
http://www.librarything.com/thingology/2008/06/covers-from-google-too-good-to-be-true.php
http://code.google.com/apis/books/terms.html (ewww)

Greg Pendlebury
Electronic Services Officer (Systems Team)
Division of Academic Information Services
University of Southern Queensland
Phone: +61 7 4631 1501
Fax: +61 7 4631 1841

________________________________
From: Greg Pendlebury [mailto:Gre...@us...]
Sent: Wednesday, 19 August 2009 4:47 PM
To: vuf...@li...
Subject: [VuFind-Tech] Proxy server for HTTP requests

I've finally got around to doing the work I've been wanting to do on the proxy server settings. It's r1290 and r1308 in the USQ branch at the moment. If anyone would like to vet the content before I move it to trunk, please let me know.

Web requests using fopen() and file_get_contents() were easy: there are simple proxy settings in index.php and bookcover.php that cover them, picking up the proxy settings from the config file (r1290). Use of HTTP_Request was more annoying because you have to put in the proxy settings every time it's used, in between the object creation and the sendRequest() call.
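
For readers unfamiliar with how fopen() and file_get_contents() can be pointed at a proxy, one common approach is a stream context. The sketch below is only an illustration under assumptions: the helper name buildProxyContext(), the host proxy.example.edu and the port 8080 are hypothetical, and nothing here is confirmed to match what r1290 actually does.

```php
<?php
// Hypothetical helper (not taken from r1290): build a stream context that
// sends HTTP requests through a proxy. $proxyHost and $proxyPort stand in
// for whatever values the config file provides.
function buildProxyContext($proxyHost, $proxyPort)
{
    $options = array(
        'http' => array(
            // PHP's HTTP stream wrapper takes the proxy as a tcp:// address.
            'proxy' => 'tcp://' . $proxyHost . ':' . $proxyPort,
            // Send the full URI in the request line, which proxies
            // generally expect.
            'request_fulluri' => true,
        ),
    );
    return stream_context_create($options);
}

// Example values only; a real setup would read these from configuration.
$context = buildProxyContext('proxy.example.edu', 8080);

// The context is passed as the third argument, for example:
// $cover = file_get_contents($url, false, $context);
```

The same context can also be given to fopen() as its fourth argument, so both kinds of call can share one piece of setup code.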
In r1290 I was finding places in the code where HTTP_Request is used and adding the settings manually, but stopped after finding several, thinking there must be a better way. So r1308 has a new 'sys/Proxy_Request.php' object that simply extends HTTP_Request, and I've pointed all the old calls at that. You can expressly disable/enable the proxy after creating the object, although I haven't found a need to.

When you do the final sendRequest() the request object will:
1) As a priority, obey any express instructions it's been given regarding proxy use.
2) In the absence of express instruction, default to no proxy for 'localhost' traffic, and default to using the proxy for everything else. This isn't hard to extend (e.g. I was thinking of including in the config file: dont_proxy = localhost, usq.edu.au)
3) Check the config file for proxy settings and use them if it's been instructed to (and they exist, of course).

Areas I found using HTTP_Request (and changed):
Drivers: Innovative, NCIP, Voyager
Bookcovers: Google (directly), Amazon (via sub-class)
Wikipedia: Author screen (classic skin)
OAI: Harvester
Record Screen: Ajax SFX, Excerpt, Export, Reviews
Sys: SRU (Worldcat?), Solr/Zebra indexes

Things like the Solr index call will probably never need to be proxied; that's why the turn on/off calls are there, along with the default of not proxying localhost traffic. Many of the areas above I don't even know how to test, but I also can't see them breaking from such a simple change.

Greg Pendlebury
Electronic Services Officer (Systems Team)
Division of Academic Information Services
University of Southern Queensland
Phone: +61 7 4631 1501
Fax: +61 7 4631 1841

________________________________
This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email.
The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt. The University of Southern Queensland is a registered provider of education with the Australian Government (CRICOS Institution Code No's. QLD 00244B / NSW 02225M)
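
The three-step proxy decision described in the quoted message (express instruction first, then the localhost default, then a check that proxy settings exist) could be sketched roughly as follows. This is not the code from sys/Proxy_Request.php: the function name shouldUseProxy(), the $explicit flag and the 'host' config key are all hypothetical names chosen for illustration.

```php
<?php
// Hypothetical sketch of the decision order described in the message;
// it does not reproduce sys/Proxy_Request.php.
//   $explicit    true/false if the caller expressly enabled/disabled the
//                proxy, null if no instruction was given.
//   $proxyConfig array with a 'host' key (assumed name), or null when the
//                config file has no proxy settings.
function shouldUseProxy($url, $explicit, $proxyConfig)
{
    if ($explicit !== null) {
        // 1) Express instructions take priority.
        $wanted = $explicit;
    } else {
        // 2) Default: no proxy for localhost, proxy for everything else.
        $host = parse_url($url, PHP_URL_HOST);
        $wanted = strtolower((string) $host) !== 'localhost';
    }
    // 3) Only use the proxy if settings actually exist in the config.
    return $wanted && !empty($proxyConfig['host']);
}

// Example values only.
$config = array('host' => 'proxy.example.edu', 'port' => 8080);
$useForSolr  = shouldUseProxy('http://localhost:8080/solr/select', null, $config);
$useForCover = shouldUseProxy('http://books.google.com/books', null, $config);
```

Extending step 2 to a dont_proxy list, as the message suggests, would mean comparing the host against each configured entry instead of the single 'localhost' string.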