
WebLech URL Spider / News: Recent posts

Latest spider improvements

The latest WebLech release, 0.0.3, contains some incremental improvements to the spider, plus bug fixes. A major new feature is checkpointing: the spider saves its state every so often, so it can be killed and resumed later. This is useful if you're spidering a big site and don't want to re-check and re-queue all of your URLs. Another new feature is the classification of URLs as "interesting" or "boring"; interesting URLs are downloaded sooner than boring ones. Other fixes include better handling of URLs that contain fragments.
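To make the three ideas in this post concrete, here is a minimal sketch of a spider state that can be checkpointed. It is not WebLech's actual code: the class name SpiderStateSketch, its methods, and the use of Java serialization for the checkpoint file are all assumptions made for illustration. It keeps an "interesting" queue that is always drained before the "boring" one, strips fragments so the same page is not queued twice, and can be saved to disk and reloaded.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.HashSet;

/** Hypothetical sketch of checkpointable spider state; not WebLech's real classes. */
final class SpiderStateSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final ArrayDeque<String> interesting = new ArrayDeque<>();
    private final ArrayDeque<String> boring = new ArrayDeque<>();
    private final HashSet<String> seen = new HashSet<>();

    /** Drops any "#fragment" so page.html#a and page.html#b count as one URL. */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    /** Queues a URL once; the caller decides whether it is interesting. */
    boolean enqueue(String url, boolean isInteresting) {
        String key = stripFragment(url);
        if (!seen.add(key)) {
            return false; // already queued or downloaded
        }
        (isInteresting ? interesting : boring).addLast(key);
        return true;
    }

    /** Returns the next URL to download, or null when both queues are empty. */
    String next() {
        String url = interesting.pollFirst();
        return url != null ? url : boring.pollFirst();
    }

    /** Writes the whole state (both queues and the seen set) to a checkpoint file. */
    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    /** Restores state from a checkpoint written by save(). */
    static SpiderStateSketch load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (SpiderStateSketch) in.readObject();
        }
    }
}
```

A spider built along these lines would call save() every so often while it runs, and load() at startup if a checkpoint file exists, so a killed run resumes with its queues intact rather than starting over.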

Posted by Brian Pitcher 2002-06-12

Multi-threading code completed

It's been a while since the last release, but I've now got round to writing the multi-threaded download code. You can configure the number of download threads in the config file and retrieve multiple URLs simultaneously.
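As a rough illustration of downloading with a configurable number of threads, here is a sketch built on a fixed-size thread pool from java.util.concurrent. That package did not exist when this release was written, so this is certainly not how WebLech does it; the class ParallelFetchSketch, the fetchAll method, and the fetcher parameter are hypothetical names. The fetcher stands in for the real download step so the example needs no network access.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of multi-threaded downloading; not WebLech's real code. */
final class ParallelFetchSketch {

    /**
     * Fetches every URL using at most `threads` worker threads and returns
     * url -> content in the order the URLs were given. Duplicate URLs in the
     * input collapse to a single map entry.
     */
    static Map<String, String> fetchAll(List<String> urls, int threads,
                                        Function<String, String> fetcher)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all downloads first so they can run concurrently.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                pending.put(url, pool.submit(task));
            }
            // Then wait for each result in turn.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count would come from the config file; passing 1 gives the old single-threaded behaviour.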

There's also a bugfix for mailto hrefs, thanks to gxd5 for the initial fix.
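The post does not say what the mailto bug was. Assuming the aim is simply to keep mailto links out of the download queue, a filter could look like the sketch below; HrefFilterSketch and its methods are hypothetical names, not part of WebLech.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of skipping mailto links; not WebLech's real code. */
final class HrefFilterSketch {
    private static final String MAILTO = "mailto:";

    /** True if the href is a mailto link, ignoring case and surrounding spaces. */
    static boolean isMailto(String href) {
        String trimmed = href.trim();
        return trimmed.regionMatches(true, 0, MAILTO, 0, MAILTO.length());
    }

    /** Returns the hrefs that a spider could try to download, in their original order. */
    static List<String> withoutMailto(List<String> hrefs) {
        List<String> kept = new ArrayList<>();
        for (String href : hrefs) {
            if (!isMailto(href)) {
                kept.add(href);
            }
        }
        return kept;
    }
}
```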

Posted by Brian Pitcher 2001-11-21

First download available!

I've put together the first cut of the spider code and packaged it as a download. This version of the spider is quite functional and will spider a website pretty well. It's only single-threaded, and you have to edit a properties file to configure it, but it works!
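For readers who have not used Java properties files, here is a small sketch of reading spider settings with java.util.Properties. The key names example.startUrl and example.maxDepth, and the class SpiderConfigSketch, are made up for illustration; check the properties file shipped in the download for the real keys.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical sketch of properties-based configuration; the keys are invented. */
final class SpiderConfigSketch {
    final String startUrl;
    final int maxDepth;

    SpiderConfigSketch(String startUrl, int maxDepth) {
        this.startUrl = startUrl;
        this.maxDepth = maxDepth;
    }

    /** Parses key=value lines; example.maxDepth falls back to 5 if it is absent. */
    static SpiderConfigSketch load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);

        String startUrl = props.getProperty("example.startUrl");
        if (startUrl == null) {
            throw new IllegalArgumentException("example.startUrl is required");
        }
        int maxDepth = Integer.parseInt(props.getProperty("example.maxDepth", "5").trim());
        return new SpiderConfigSketch(startUrl.trim(), maxDepth);
    }
}
```

Passing a FileReader opened on the edited properties file to load() would read the settings at startup.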

Have fun -- all comments (especially things to improve on) welcomed.

Posted by Brian Pitcher 2001-10-21