Re: [Aperture-devel] CrawlerHandler thread-safety - antoni, please give feedback


Antoni Myłka wrote:
> As far as websites are concerned, nothing prevents you from firing 
> multiple web crawlers simultaneously.

This only works when you also crawl multiple websites, i.e. when you 
have multiple web data sources whose sets of URLs don't overlap. By 
design, the WebCrawler assumes it is the only crawler working on a 
given website; for example, there is no shared queue of URLs to crawl 
yet.
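
To illustrate what such a shared queue could look like, here is a 
minimal sketch. SharedUrlQueue and its methods are hypothetical names, 
not part of Aperture; the sketch only shows the idea that several 
crawler threads draw from one queue and that each URL is handed out at 
most once.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper, NOT an Aperture class: a URL queue that several
// crawler threads could share without fetching the same URL twice.
class SharedUrlQueue {
    // Every URL ever offered, so that duplicates can be rejected.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    // URLs that were accepted but not yet handed to a crawler.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    /** Queues the URL unless it was offered before; returns true if queued. */
    boolean offer(String url) {
        // Set.add is atomic on this set, so only one caller wins per URL.
        return seen.add(url) && pending.offer(url);
    }

    /** Returns the next URL to fetch, or null if nothing is pending. */
    String poll() {
        return pending.poll();
    }
}
```

Each crawler thread would call poll() for its next URL and offer() for 
every link it extracts. The sketch compares URLs as plain strings, so 
URL normalization and restricting links to the data source's domain are 
left out.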

> of filesystem crawling optimization for filesystems spread between 
> multiple spindles, is it possible at all in Java?

Not in Java, but you could build an application in which an admin 
enters how the information (data sources) is distributed over the 
physical disks. With AutoFocus Server we actually have the inverse use 
case: we have a scheduler that schedules and launches crawls, but we 
want to prevent *multiple* crawlers from hitting the *same* disk at the 
same time, for performance reasons.
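
A rough sketch of how such a guard could work, assuming the admin 
supplies a map from data source ID to disk label. DiskCrawlGate and its 
method names are made up for illustration; this is not the AutoFocus 
Server scheduler code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical helper, NOT taken from AutoFocus Server or Aperture:
// allows at most one running crawl per physical disk.
class DiskCrawlGate {
    // Admin-entered knowledge: data source ID -> label of the disk it lives on.
    private final Map<String, String> diskOfSource;
    // One single-permit semaphore per disk, created on first use.
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    DiskCrawlGate(Map<String, String> diskOfSource) {
        this.diskOfSource = new HashMap<>(diskOfSource); // defensive copy
    }

    /** Returns true if the crawl may start now, false if its disk is busy. */
    boolean tryStart(String sourceId) {
        return permitFor(sourceId).tryAcquire();
    }

    /** Must be called exactly once after a crawl that tryStart() allowed. */
    void finish(String sourceId) {
        permitFor(sourceId).release();
    }

    private Semaphore permitFor(String sourceId) {
        String disk = diskOfSource.get(sourceId);
        if (disk == null) {
            throw new IllegalArgumentException("unknown data source: " + sourceId);
        }
        return permits.computeIfAbsent(disk, d -> new Semaphore(1));
    }
}
```

A scheduler would call tryStart() before launching a crawl, postpone 
the crawl when it returns false, and call finish() when the crawl ends. 
Data sources on different disks never block each other.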

Regards,

Chris
--