From: Anton v. S. <an...@ap...> - 2003-06-15 05:08:01
I've checked into CVS an initial version of the RSS channel monitor &
notification code I've been promising. It's rather alpha-quality at the
moment, but it works and should be easy to get going.

In its current configuration, it monitors a list of RSS channels at a
configurable refresh interval, as specified in the rss-config file. When
new items are detected on a channel, an email is sent for each item to
the email address specified in the notify-config file. I've been using
the email address of my SMS cellphone. I originally thought I would
support delivery mechanisms other than email, but email should be good
enough for many purposes.

For anyone who wants Lambda the Ultimate or Slashdot headlines to start
popping up on their cellphone or other email-capable device, all you
need to do is:

* install the schemerss collection from CVS
* edit the notify-config file with your email details, including smtp
  server
* edit the list of channels you want to monitor, via Bruce's web
  interface. Currently, this cannot be done by directly editing
  rss-config, because the cached files will get out of sync and my code
  can't handle that yet.
* run the following in PLT Scheme:

    (require (lib "rssmonitor.ss" "rss"))
    (monitor-rss-channels)

This spawns some monitoring threads and should silently return to the
REPL prompt. As long as you leave Scheme running, channel monitoring
will continue. This part is a bit "manual" - there's no other interface
to it yet. There's also currently zero feedback as to whether anything's
happening - if it's working, you'll start getting emails once new items
appear on the channels being monitored, but that's it. The currently
non-existent exception handling (on my side) also needs to be improved.

The above is the only documentation at present. I'll add proper docs to
doc.txt when it's a bit more mature.

* * *

Onto the development notes:

I added a new module, rssmonitor.
Minor changes were made to rssparser and rsscomm to support the
integration of rssmonitor. In particular, a synchronized wrapper for
get-pure-port was added to rsscomm, and two unused functions for reading
a channel from a url were temporarily commented out in rssparser.

In writing rssmonitor, some requirements came up which I've temporarily
dealt with using some stub and wrapper code in the rssmonitor module.
I'd like to integrate this better, but it'll require some changes to the
other modules which I thought I should discuss first:

1. A slightly higher-level interface to perform channel downloads and
   retrieve the channel list was desirable. To achieve this, I've added
   two wrapper functions in rssmonitor:

   * rss-channel-download, which downloads a channel specified by an
     rss-channel struct, and returns a new rss-channel struct
   * get-rss-channel-list, which returns the configured channel list as
     a list of rss-channel structs. It reads the channel data from the
     cache files; it does not download them.

   I think these functions should ultimately reside in rssparser and
   rsscomm, and could allow for some more refactoring. I'm following a
   model in which the rss-channel struct fully encapsulates channel
   handling, including channel lists, downloads and caching, so a client
   module like rssmonitor does not need to deal directly with the cache
   files, urls, etc. It might make sense to create an rsschannel module,
   which would depend on rssparser and rsscomm, to implement the full
   rss-channel behavior I've described. I haven't looked at this
   closely, though.

2. I'd like to add an optional per-channel refresh interval. If no
   interval is specified for a channel, the global value would be used.
   In rss-config, this might look something like this:

       (mychannels
        (("http://slashdot.org/slashdot.rss" 1800) ; /. mandates >= 30-min updates
         "http://rss.com.com/2547-1071-0.xml"))

   ...but I'm open to suggestions.
   I already have a minor code change to do this (appended below), but I
   haven't checked it in.

3. I'd like to add named channel lists. This would provide an easy way
   to support different notification behaviors at different times: for
   example, you might have "day", "night", and "weekend" channel lists,
   with different channels and refresh intervals. It would also support
   different channel lists for the web view vs. the monitoring module.
   This is actually an easy change, but I didn't want to make it without
   agreement. I thought the 'mychannels' entry in rss-config could be
   replaced with something like this (a little XML-ish):

       (channel-lists
        (channel-list default
         (("http://slashdot.org/slashdot.rss" 1800)
          "http://lambda.weblogs.com/xml/rss.xml"))
        (channel-list news-junkie
         (("http://slashdot.org/slashdot.rss" 1800)
          "http://www.infoworld.com/rss/news.rdf"
          "http://rss.com.com/2547-1071-0.xml")))

   If no channel list name is specified by a client program, the
   'default' channel list would be used. To start with, the web
   interface could simply use the default channel list, minimizing
   changes there.

Here's some barely-tested & un-integrated code to implement named
channel lists and optional per-channel timeouts in the config file:

    ;; Returns (name . channels). Each entry in channel-list is either
    ;; (url timeout) or a bare url, which gets the global timeout.
    (define (parse-channel-list name channel-list)
      (cons name
            (map (match-lambda
                   [(url timeout) (make-channel url #f timeout)]
                   [url (make-channel url #f *global-timeout*)])
                 channel-list)))

    ;; Destructures the whole config s-expression into an rss-config,
    ;; parsing each named channel list along the way.
    (define (parse-config sexp)
      ((match-lambda
         [`((paths (datadir ,data) (channeldir ,channeldir))
            (channel-lists (channel-list ,list-names (,channel-lists ...)) ...)
            (refresh ,refresh)
            (timeout ,timeout))
          (make-rss-config data
                           channeldir
                           (map parse-channel-list list-names channel-lists)
                           refresh
                           timeout)])
       sexp))

I'm totally open to suggestions on this. I'm happy to go ahead and
integrate these changes, but I'm not in any hurry, either. I've got it
working well enough for the moment. Bruce, please let me know what you
think, whenever you get a chance.

Anton
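
To make the named-list fallback concrete, here is a minimal sketch of
how a client might pick a channel list by name and fall back to
'default' when no name is given. It assumes the parsed channel lists
form an association list of (name . channels) pairs, which is the shape
parse-channel-list returns. The name lookup-channel-list and the sample
data are hypothetical and not part of the code in CVS; plain URL strings
stand in for channel structs.

```scheme
;; Hypothetical helper, not part of the checked-in code.
;; channel-lists: association list of (name . channels) pairs.
;; maybe-name: optional list name; 'default is used when omitted.
;; Returns the channels for that name, or #f if no such list exists.
(define (lookup-channel-list channel-lists . maybe-name)
  (let* ((name (if (null? maybe-name) 'default (car maybe-name)))
         (entry (assq name channel-lists)))
    (and entry (cdr entry))))

;; Made-up sample data; URL strings stand in for channel structs.
(define sample-lists
  (list (cons 'default
              (list "http://lambda.weblogs.com/xml/rss.xml"))
        (cons 'news-junkie
              (list "http://www.infoworld.com/rss/news.rdf"
                    "http://rss.com.com/2547-1071-0.xml"))))

(lookup-channel-list sample-lists)              ; the 'default list
(lookup-channel-list sample-lists 'news-junkie) ; the two news-junkie URLs
(lookup-channel-list sample-lists 'weekend)     ; #f, no such list
```

Returning #f for an unknown name is only one option; falling back to
'default or raising an error would work just as well, depending on what
the web interface and the monitor expect.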