Latest spider improvements

The latest WebLech release, 0.0.3, contains incremental improvements to the spider, plus bug fixes. The major new feature is checkpointing: the spider periodically saves its state, so it can be killed and resumed later. This is useful if you're spidering a big site and don't want to re-check and re-queue all of your URLs. Another new feature is the classification of URLs as "interesting" or "boring"; interesting URLs are downloaded sooner than boring ones. Other fixes include better handling of URLs with fragments in them.
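
To give a rough idea of how these three pieces can fit together, here is a minimal sketch in Java. It is not WebLech's actual code: the class name `SpiderStateSketch`, its methods, the substring-based "interesting" test, and the use of Java serialization for the checkpoint file are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; names and file format are NOT taken from WebLech.
class SpiderStateSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Deque<String> interesting = new ArrayDeque<>();
    private final Deque<String> boring = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    // Assumption: a URL is "interesting" if it contains any of these substrings.
    private final List<String> interestingMarkers;

    SpiderStateSketch(List<String> interestingMarkers) {
        this.interestingMarkers = new ArrayList<>(interestingMarkers);
    }

    // Drop everything from the first '#', so page.html#top and page.html
    // are treated as the same resource.
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Queue a URL unless it has been seen before; returns true if it was queued.
    boolean enqueue(String url) {
        String key = stripFragment(url);
        if (!seen.add(key)) {
            return false;
        }
        boolean isInteresting = false;
        for (String marker : interestingMarkers) {
            if (key.contains(marker)) {
                isInteresting = true;
                break;
            }
        }
        (isInteresting ? interesting : boring).add(key);
        return true;
    }

    // Interesting URLs are handed out first; returns null when both queues are empty.
    String next() {
        String url = interesting.poll();
        return url != null ? url : boring.poll();
    }

    // Save the queues and the seen-set so a later run can pick up from here.
    void checkpoint(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Load a previously saved state.
    static SpiderStateSketch resume(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (SpiderStateSketch) in.readObject();
        }
    }
}
```

In a sketch like this, the spider would call `checkpoint` every so often while it works through `next()`, and call `resume` at startup if a checkpoint file exists. Because the seen-set is saved along with the queues, a resumed run does not have to re-check or re-queue URLs it already knows about.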

Posted by Brian Pitcher 2002-06-12
