Home
Name Modified Size InfoDownloads / Week
page-watcher 2014-10-30
README.txt 2014-10-30 2.8 kB
Totals: 2 Items   2.8 kB 0
Description

The page watcher utility is a simple perl script that will, when used in 
conjunction with a scheduling program such as cron, notify when a page 
that is being watch has been changed.

The script is unique in a couple respects. The first is that it uses a 
configuration file to identify the pages to be watched. This configuration 
file allows for default values to be established and to permit overrides of 
the defaults on a page by page basis. For example. If you want to get notified 
at work for most updates you can set the default to notify your work email 
address. But if there is a page that is of interest to someone else, you can 
override the email notification address for that particular page.

The program is also unique in that it can watch only the links on a page for 
change instead of the whole page. This is very useful for sites that contain 
both dynamic content and download links. If you are only interested in the 
fact that there is a new release of your favorite software and not that the 
author of the page as updated some text on the page, this feature is very handy.

Another problem with comparing web pages is that often time javascript and the
like will change and you probably do not care about that. The "htmlonly" watch 
type will ignore all of the non html only components of a web page during the 
comparison. 

Some pages use hit counters or will have hidden input tags that are populated
on the back end and have a different value on each evocation of the page. This
causes the script to register and report a change every time it runs. You can 
get past this by using the ignore-regex option, specifying a pattern that should
be ignored when the compare is done. 

Installation

There is no fancy install script here. Just untar the archive and move the 
page-watch.pl to wherever you keep your local binaries; This is usually either 
~/bin or /usr/local/bin. Next move the page-watch.cf file to ~/.page-watch.cf. 
It is now installed. Not very useful but it is installed.

Now you need to edit the ~/.page-watch.cf file. The file is annotated internally 
on format and configuration options. It is very straight forward. You are close 
to being done. The last thing to do is to test it manually by simply running the 
command. Once it works like you want schedule it in your favorite scheduling 
program like cron or any of its cousins.

Dependencies

The script makes use of two perl modules that are not installed by default in 
any distribution that I know of. 

They are:

   WWW::Mechanize;
   HTML::Scrubber;
   Mail::Sendmail;

You can install them using whatever your favorite way of installing perl modules is. 

I use:

# perl -MCPAN -e shell
cpan> install WWW::Mechanize
cpan> install HTML::Scrubber
cpan> install Mail::Sendmail 
Source: README.txt, updated 2014-10-30