Methabot Web Crawler

Download methanol-1.7.0.tar.gz
Platforms: Windows, BSD, Linux

Description

Methanol is a scriptable multi-purpose web crawling system with an extensible configuration system and speed-optimized architectural design. Methabot is the web crawler of Methanol.

User Reviews

  • dreamafox
    Methabot is great! Thanks.

    Posted 06/06/2013
  • mgshightech
    I looked hard for a web crawler that was as flexible as I wanted, and Methabot delivered. It is not full-featured yet, but it allows features to be added in a very intelligent and flexible fashion that far outstripped other crawlers and made me very happy.

    You configure Methabot's behavior by writing configuration files and parsers. Parsers can be written in C or in JavaScript, and JavaScript parsers can be written in a snap. Mix and match the parsers into a chain where each parser can operate on, or even change, the text as it goes down the line.

    The behavior of the crawler is far more configurable than that of other crawlers. You can fire off different scripts for different file types and switch crawling and parsing behavior on the fly, use your own quickly written JavaScript E4X parser to decide intelligently which links to follow, and extract exactly the information you want in a way that's easy to understand.

    Methabot rocks. I can't tell you how much better a crawler this is than the others I have looked at; I could not have used them for my project without drastically altering them. Methabot changed all that. The learning curve for making it do what you want is not steep (assuming you can already deal with SQL scripts).

    Version 1.6 supports MySQL with JavaScript parsers, but 1.7 does not (yet). Version 1.7 supports a framework for distributed crawling farms. You can redirect the output of JavaScript parsers into files to write CSV files or SQL scripts for indexing. I wrote a stopwords parser that subtracts common words from the text; message me if you want it (I'm mgshightech, and I may post it here on SF). I may write an indexing parser for this in C down the road.

    Enjoy both the speed and power of C and the quick development times of JavaScript, as this system gives you both. My compliments to the original software engineers on a fine architecture. This thing will go far.

    mgshightech, 9-21-11

    Posted 09/21/2011
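
The parser chain and stopwords filter described in the review above can be illustrated with a small, generic sketch. This is plain JavaScript and does not use Methabot's actual parser API or configuration syntax; every name in it (STOPWORDS, stripTags, normalizeWhitespace, removeStopwords, runChain, csvField, and the example URL) is hypothetical and chosen only for illustration. It shows the general idea of passing text through a sequence of functions, each of which may change it, and writing the result as a CSV row.

```javascript
// Generic sketch of a parser chain; none of these names come from Methabot's API.

// Hypothetical stopword list; a real one would be much longer.
const STOPWORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "is", "it"]);

// Each "parser" is a function that takes text and returns (possibly modified) text.
function stripTags(text) {
  // Crude tag removal for illustration only; this is not a real HTML parser.
  return text.replace(/<[^>]*>/g, " ");
}

function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

function removeStopwords(text) {
  return text
    .split(" ")
    .filter((word) => word !== "" && !STOPWORDS.has(word.toLowerCase()))
    .join(" ");
}

// Run the parsers in order, feeding each one the previous parser's output.
function runChain(parsers, text) {
  return parsers.reduce((current, parser) => parser(current), text);
}

// Quote a field for CSV output, doubling any embedded double quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

const chain = [stripTags, normalizeWhitespace, removeStopwords];
const cleaned = runChain(chain, "<p>The speed of C and the   ease of JavaScript</p>");

// Prints: "http://example.com/","speed C ease JavaScript"
console.log([csvField("http://example.com/"), csvField(cleaned)].join(","));
```

Because each step is an ordinary function with the same text-in, text-out shape, steps can be reordered, dropped, or swapped per file type without touching the others, which is the flexibility the review credits to chained parsers.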

Additional Project Details

Intended Audience

Advanced End Users, End Users/Desktop, Information Technology

User Interface

Command-line, Non-interactive (Daemon)

Programming Language

C, JavaScript

Registered

2007-04-09