From: Peter R. <dr...@us...> - 2007-02-28 13:27:39
So the motivation for the recent accessor additions I've been making is that I've been playing with a 'Planet Pinky' application. I've checked in an implementation to /test/examples/planet/planet.idoc in the tests - it runs as the 3rd test in the example tests.

I wanted to have a declaratively configurable "planet engine" to aggregate an arbitrary mix of RSS and Atom feeds into a single uniform feed and visual site (cf planetapache.org etc). The implementation uses a declarative config that looks like this:

  <feeds>
    <!--The info format is as documented for SetFeedInfo-->
    <info>
      <title>Planet BBC Sports Feeds (Football, Cricket, Golf, Tennis)</title>
      <date>now</date>
    </info>
    <!--The feeds to be aggregated - name is not currently used-->
    <feed>
      <name>Football</name>
      <url>http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/football/rss.xml</url>
    </feed>
    <feed>
      <name>Cricket</name>
      <url>http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/cricket/rss.xml</url>
    </feed>
    <feed>
      <name>Golf</name>
      <url>http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/golf/rss.xml</url>
    </feed>
    <feed>
      <name>Tennis</name>
      <url>http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/tennis/rss.xml</url>
    </feed>
  </feeds>

The feed engine processes the config into a dynamically generated DPML script to perform the union. It then sorts, limits to 20 entries, and sets feed metadata according to the info section of the config. All in just 5 instructions!

The nice thing is that the HTTP TTL for each feed provides the cache lifetime for the aggregate feed. This automagically provides an optimal global feed polling time: as each dependent feed expires, the aggregate is automatically rebuilt and only the expired feed resource is refreshed. The upshot is that polling of the individual feeds is kept to a global minimum.

The only thing missing is feed failure handling - eg a feed goes down or its URL returns a 404 etc.
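For anyone who wants to see the shape of the pipeline outside NetKernel, here is a rough Python sketch of the union / sort / limit / set-info steps and of the "aggregate expires when the first source expires" effect. It is only an illustration: the names Entry, Feed and aggregate are made up for this example, sorting newest-first is an assumption, and in NetKernel the expiry falls out of the kernel's dependency caching rather than an explicit min().

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    title: str
    published: datetime


@dataclass
class Feed:
    name: str
    entries: list
    ttl: timedelta  # cache lifetime, assumed to come from the feed's HTTP response


def aggregate(feeds, info, limit=20):
    """Union all entries, sort newest first, keep `limit`, attach metadata."""
    merged = [entry for feed in feeds for entry in feed.entries]
    merged.sort(key=lambda entry: entry.published, reverse=True)
    return {
        "title": info["title"],
        "entries": merged[:limit],
        # The aggregate is only valid until the first source feed expires.
        "ttl": min(feed.ttl for feed in feeds),
    }


now = datetime(2007, 2, 28, 13, 0)
football = Feed("Football", [Entry("Cup draw", now - timedelta(hours=1))],
                timedelta(minutes=15))
cricket = Feed("Cricket", [Entry("Test result", now)], timedelta(minutes=5))
planet = aggregate([football, cricket], {"title": "Planet BBC Sports Feeds"})
```

Here the aggregate takes the cricket feed's 5 minute lifetime, so it is rebuilt only when the shortest-lived source actually expires.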
My preference for handling feed failures is to review the pinky accessors and have them default to tolerant behaviour - that is, we default to producing a resource that is processable whenever this makes sense, rather than throwing exceptions and breaking the whole pipeline. This is one of those 'philosophical' points of NetKernel - we tend to write accessors to default to be tolerant and then augment with strict variants (cf xrl, xrl-tolerant). This is in line with the Construct, Compose, Constrain development model.

So my suggestion is we have, for example...

  active:feedUnion (default tolerant: if any feed argument is missing it carries on doing the best it can).
  active:feedUnion-strict (throws an exception if a feed argument cannot be sourced).

Other accessors may not need this differentiation and may naturally fit with a strict or tolerant design. For example, I have implemented the Sort accessor to be tolerant of variants between RSS and Atom feeds - if the comparison cannot be made then it still sorts based on the fields that are there and effectively drops the missing sort criteria. Equally, if you request to sort on author and the feed doesn't have author, it just sorts on the other valid criteria.

P.

PS I've not changed the service URIs to active:feedXXX yet - still waiting to hear whether this meets with general approval.
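To make the tolerant / strict split above concrete, here is a small Python sketch of the two behaviours. It is not the accessor code: feed_union, tolerant_sort and the fetch callable are hypothetical names, entries are plain dicts, and "drop a sort criterion unless every entry has it" is just one possible reading of the tolerant sort described above.

```python
def feed_union(sources, fetch, strict=False):
    """Merge entries from every source that can be fetched.

    Tolerant (default): a feed that fails is skipped and the rest carry on.
    Strict: the first failure is re-raised to the caller.
    """
    entries = []
    for url in sources:
        try:
            entries.extend(fetch(url))
        except Exception:
            if strict:
                raise
            # Tolerant mode: ignore this feed and keep going.
    return entries


def tolerant_sort(entries, criteria):
    """Sort dict entries on the requested criteria that all entries have."""
    usable = [c for c in criteria if all(c in entry for entry in entries)]
    return sorted(entries, key=lambda entry: tuple(entry[c] for c in usable))


def fake_fetch(url):
    # Stand-in for a real HTTP fetch; "bad" simulates a feed that is down.
    if url == "bad":
        raise IOError("404")
    return [{"title": url, "date": 1}]


merged = feed_union(["football", "bad", "golf"], fake_fetch)
```

With the tolerant default, merged holds the football and golf entries and the failing feed is skipped; the same call with strict=True raises the IOError instead. Likewise, tolerant_sort(entries, ["author", "date"]) on entries with no author field just sorts on date.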