From: Alistair G. <akg...@gm...> - 2015-01-13 13:18:24
On Tue, Jan 13, 2015 at 2:00 PM, <hon...@gm...> wrote:
> On Tue, 13 Jan 2015 10:13:35 +0100, Alistair Grant wrote:
>
>> Hi Geoff,
>>
>> The way it's implemented allows each grabber to make its own decision.
>> The grabber requests retries, and then decides what to do if there is
>> still a problem after the maximum number of retries have been
>> exceeded. Just to reiterate, the default behaviour is unchanged, i.e.
>> we're not forcing all grabbers to retry, and if retries are enabled,
>> the behaviour after the final retry is also unchanged, i.e. it is
>> defined by $FailOnError.
>>
>> If the grabbers are required to implement the retry functionality it
>> will need to be done for each of the entry points that the grabber
>> uses, i.e. one or more of get_nice, get_nice_tree, get_nice_xml and
>> get_nice_json, or a dispatch table will have to be maintained, instead
>> of just implementing it once in get_nice_aux.
>>
>> If you're really determined for this to be in the grabber I can move
>> it, but I still think Get_nice.pm is the best place.
>>
>> Thanks again for your feedback,
>> Alistair
>
>
> Hi Alistair,
>
> Shouldn't be too onerous; it's only a simple subroutine in the grabber
> which acts as a wrapper around get_nice.
>
> I agree get_nice_aux is the cleanest place to put it but my concern
> arises around the use cases for it. A 404 for one grabber may be worth
> retrying (the source is flakey) but for another may be pointless (the
> source can be trusted). So you then start to need to parameterise which
> codes are 'retry-able' (i.e. on a per-grabber basis).

This has already been made configurable - the grabber can set the list of
codes to retry in $XMLTV::Get_nice::RetryCodes.

> Plus it wouldn't fix all scenarios, so some situations will *still* need
> grabber-specific code.
>
> E.g. In one grabber I worked on, the web pages frequently returned code
> 200 but the content was empty! In another it returned a 304 (Not
> Modified) even though you'd specified no caching. Another required
> checking certain response headers before you could be sure the data
> were valid.

Thanks for the explanation, this helps me understand where you're coming
from. Some of these, e.g. checking headers, are grabber specific, I agree.
However I think retries are a much more common requirement, e.g. it is
something provided in wget, and it doesn't preclude the grabbers performing
additional checks.

> But I'm not judge and jury; let's see what others think. :-)

Thanks again,
Alistair
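
The split discussed in this thread can be sketched roughly as follows: a shared fetch helper retries only on a configurable set of status codes, and a grabber-side wrapper layers its own validity check on top (here, a 200 response with an empty body). This is a minimal Python sketch of the idea, not the Get_nice.pm code. `fetch_with_retries`, `RETRY_CODES`, `grabber_fetch` and `make_fake_fetch` are hypothetical names chosen for illustration, and the default code list is an assumption.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical default; in the thread this role is played by a
# per-grabber list of retry-able codes.
RETRY_CODES = {500, 502, 503, 504}

Response = Tuple[int, str]  # (HTTP status code, body)


def fetch_with_retries(
    fetch: Callable[[str], Response],
    url: str,
    max_retries: int = 3,
    retry_codes: Iterable[int] = RETRY_CODES,
) -> Response:
    """Fetch once, then retry up to max_retries times on listed codes."""
    codes = set(retry_codes)
    status, body = fetch(url)
    attempts = 0
    # A real client would normally pause between attempts; omitted here.
    while status in codes and attempts < max_retries:
        attempts += 1
        status, body = fetch(url)
    # After the final retry the caller decides how to treat the result.
    return status, body


def grabber_fetch(
    fetch: Callable[[str], Response], url: str, max_retries: int = 3
) -> Response:
    """Grabber-specific wrapper: treat '200 with empty body' as a failure."""

    def checked(u: str) -> Response:
        status, body = fetch(u)
        if status == 200 and not body.strip():
            # Assumption of this sketch: report it as a retry-able code.
            return 503, body
        return status, body

    return fetch_with_retries(checked, url, max_retries)


def make_fake_fetch(responses):
    """Build a fetch function replaying canned responses (last one repeats)."""
    calls = {"n": 0}

    def fetch(url: str) -> Response:
        index = min(calls["n"], len(responses) - 1)
        calls["n"] += 1
        return responses[index]

    fetch.calls = calls  # expose the call counter for inspection
    return fetch
```

With the default list a 404 is returned immediately, while passing `retry_codes={404}` makes the same helper retry it, which is the per-grabber choice described above. The empty-body check stays in the grabber wrapper, so the shared helper does not need to know about source-specific quirks.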