From: <mni...@mo...> - 2004-03-31 13:24:53
My additions below. For testing, I think we should schedule a time frame and try to get everyone on the IRC channel to discuss what we see. I guess from 9:00pm EST to 12:00 pm EST is good for me.

Here is the updated to-do list (as promised). Please feel free to pick something you like off the list, send an email to this list (-devel), and let us know what you're working on.

Harvester: (indexer.pl) my additions, although I've been a little out of the loop lately.
-----------------------
o The indexer doesn't completely store data in the local db files. For instance, the urls are stored, but not the text linking to those urls (the linked text should be stored with each url). (open)
o A lot of index types are missing (header text, small text, strong text, etc.). (open)
o Fix pick_lanquage method. (Eric)
o Test and select an html parser (HTML::Parser, XML::Parser, TokeParser, Pull Parser) based on efficiency. (Ilya)
o Methods for determining font clashes. (open)
o Rename all classes and methods to reflect the current naming convention. (mojo)

Controller: (master.pl)
-----------------------
o The Controller needs to check for "nastigrams": characters and such that could cause the Controller to execute commands on behalf of the user it is running as. (open) (mojo)
o Methods for toggling states, re-indexing, etc. (open)
o Patch needed to make the Controller allow checkout of only a max number of urls per Harvester. We need to check how many urls the Harvester currently has checked out and hand out no more than the difference. (open)

Queue Processor: (queue.pl)
-------------------------
o Add a queue processing agent that goes through the db files sent by the harvester to the controller, parses the data out, and puts it in the index tree. (open)

Queue Maintainer: (maintainer.pl)
----------------------------
o Agent that runs independently of the other programs, goes through the state db's, finds urls that need reindexing, and re-injects them into the queue by changing their state (and moving them to the corresponding state db). (open)

General
-----------------
o TESTERS! We need your bandwidth! This is an easy way to get involved! (EVERYONE)
o Should we pick a certain time and have everyone join #sprawler?
o Design a good user interface for the web front end. (open)
o General error checking and code robustness. (open)
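
To make the Controller checkout item concrete, here is a rough sketch of the "get the difference" arithmetic. It is written in Python purely for illustration, not in the project's Perl, and it is not taken from master.pl. The function name and all three parameters are hypothetical, and the cap itself is assumed to be a simple per-Harvester integer.

```python
def urls_to_hand_out(max_per_harvester, checked_out, requested):
    """Return how many urls a Harvester may check out right now.

    All names here are hypothetical:
      max_per_harvester: assumed cap on urls one Harvester may hold
      checked_out:       urls this Harvester currently has checked out
      requested:         urls the Harvester is asking for
    """
    if max_per_harvester < 0 or checked_out < 0 or requested < 0:
        raise ValueError("counts must be non-negative")
    # The difference between the cap and what is already checked out,
    # clamped at zero in case the Harvester is already over the cap.
    remaining = max(0, max_per_harvester - checked_out)
    # Never hand out more than was asked for.
    return min(requested, remaining)
```

With a cap of 100, a Harvester holding 60 urls and asking for 50 would get 40, and one already at or over the cap would get none.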