Methabot is a speed-optimized, highly configurable web, FTP, and local file system crawler.
After a long period of hardcore programming, Methabot/1.4.0 is finally ready for release. To compile Methabot, you will need libcurl and SpiderMonkey installed on your system.
New features:
* Completely new architectural design
* Filetype parser scripting through JavaScript/E4X
* Multithreading is now a primary concept
* HTTP HEAD requests are now done asynchronously in a separate thread using curl and libev
* Support for "peeking" at external URLs
* The Methabot Project has been split into several subprojects. The primary ones are the command-line tool and the web crawling library libmetha, which the tool uses as its backend.
* Initial work on the distributed web crawling system Methanol.
So what now? Pick up a tutorial on JavaScript with E4X and start writing your own functions for extracting data from the web.
Check out the brand-new wiki at http://bithack.se/projects/methabot/ for more information!
Copyright © 2009 SourceForge, Inc. All rights reserved.