From: Alistair G. <akg...@gm...> - 2015-01-13 09:13:42
On Mon, Jan 12, 2015 at 9:31 AM, <hon...@gm...> wrote:
> On Sun, 11 Jan 2015 20:39:28 +0100, Alistair Grant wrote:
>
>> > While I agree that some supposedly permanent errors sometimes turn out to be transient (e.g. 404, 408, 500), most errors (4xxx, 5xxx) *are* permanent and should be considered fatal.
>> >
>> > I don't think you should be doing a blanket retry for every possible 4xxx/5xxx code.
>> >
>> > What errors are you actually getting? (c.f. "$r->status_line" )
>>
>> The error I'm getting is:
>>
>> 500 Can't connect to www.port.cz:80 (Connection timed out)
>>
>> I'm happy to limit the retry code to the list of status values you
>> provided above. I'll also add a backoff timer.
>>
>> Thanks,
>> Alistair
>
>
> Yes, "connection timeout" is always a tricky one; you don't know if it's a genuine problem at the website end or for how long it will last (if it's transient).
>
> In most cases though, a 404 or a 500 *is* permanent and fatal; in these cases there is simply no point in trying again.
>
> I don't think we should be allowing retries for *all* grabbers to cater for the erroneous errors experienced by one or two.
>
> In version 0.005065 I exposed the LWP response object (as $Response) so the grabber could check the reply code and other headers if that grabber needed to. This allows the calling grabber to set
>
> $Get_nice::FailOnError = 0; # prevent failure on GET error
>
> and then check $Get_nice::Response->code and decide what to do with the error. So in effect the retry loop becomes part of tv_grab_huro rather than in get_nice. This way each grabber can make its own decisions as to what to do based on the specific issues with its particular website source.

Hi Geoff,

The way it's implemented allows each grabber to make its own decision. The grabber requests retries, and then decides what to do if there is still a problem after the maximum number of retries have been exceeded.

Just to reiterate, the default behaviour is unchanged, i.e. we're not forcing all grabbers to retry, and if retries are enabled, the behaviour after the final retry is also unchanged, i.e. it is defined by $FailOnError.

If the grabbers are required to implement the retry functionality it will need to be done for each of the entry points that the grabber uses, i.e. one or more of get_nice, get_nice_tree, get_nice_xml and get_nice_json, or a dispatch table will have to be maintained, instead of just implementing it once in get_nice_aux.

If you're really determined for this to be in the grabber I can move it, but I still think Get_nice.pm is the best place.

Thanks again for your feedback,
Alistair
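
For illustration only, here is a minimal sketch of the kind of bounded retry with a backoff timer discussed in this thread. It is not the code in Get_nice.pm. The names fetch_with_retry, $fetch, $max_tries and $base_delay are hypothetical, and the list of retryable codes is simply the 404, 408, 500 mentioned above. It assumes $fetch returns an object with code() and is_success() methods, as an LWP response does.

```perl
use strict;
use warnings;

# Status codes treated as possibly transient (taken from the thread above).
my %retryable = map { $_ => 1 } (404, 408, 500);

# Hypothetical helper, not part of Get_nice.pm.
# $fetch      : code ref that performs one GET and returns a response object
# $max_tries  : total number of attempts, including the first
# $base_delay : seconds to wait before the first retry; doubled each time
sub fetch_with_retry {
    my ($fetch, $max_tries, $base_delay) = @_;
    my $response;
    for my $attempt (1 .. $max_tries) {
        $response = $fetch->();
        return $response if $response->is_success;

        # Anything not in the list is treated as permanent: stop at once.
        last unless $retryable{ $response->code };

        # No point sleeping after the final attempt.
        last if $attempt == $max_tries;

        sleep $base_delay * 2 ** ($attempt - 1);    # exponential backoff
    }

    # Return the last response; the caller decides whether it is fatal.
    return $response;
}
```

With a helper like this the decision after the final attempt stays with the caller, which matches the point that the behaviour after the last retry is unchanged.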