From: Christiaan F. <chr...@ad...> - 2007-09-25 08:22:16
Antoni Myłka wrote:
> As far as websites are concerned, nothing prevents you from firing
> multiple web crawlers simultaneously.

This only works when you crawl multiple web sites as well, i.e. you have multiple web data sources that don't overlap in terms of the collections of URLs they represent. The WebCrawler by design assumes it is the only one crawling a given website, e.g. there is no such thing yet as a shared queue of URLs to crawl.

> of filesystem crawling optimization for filesystems spread between
> multiple spindles, is it possible at all in Java?

Not in Java but you could build an application where an admin can enter knowledge of how information (data sources) is distributed over physical disks. With AutoFocus Server we actually have the inverse use case for this: we have a scheduler for scheduling and launching crawls but we want to prevent *multiple* crawlers from hitting the *same* disk at the same time, for performance reasons.

Regards,

Chris
--
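
P.S. To make the scheduling idea above concrete, here is a minimal sketch, assuming the admin has entered which physical disk each data source lives on. It is not taken from Aperture or AutoFocus Server; the class name `DiskAwareCrawlScheduler`, its methods, and the disk labels are all hypothetical. Crawls mapped to the same disk run one after another, while crawls on different disks may run in parallel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not an Aperture or AutoFocus Server class. */
class DiskAwareCrawlScheduler {
    // Admin-supplied knowledge: data source name -> physical disk label.
    private final Map<String, String> diskOfSource = new ConcurrentHashMap<>();
    // One single-threaded executor per disk serializes the crawls on that disk.
    private final Map<String, ExecutorService> executorOfDisk = new ConcurrentHashMap<>();

    /** Records on which physical disk a data source is stored. */
    void registerSource(String source, String disk) {
        diskOfSource.put(source, disk);
    }

    /** Queues a crawl; it waits behind any other crawl on the same disk. */
    Future<?> schedule(String source, Runnable crawl) {
        String disk = diskOfSource.get(source);
        if (disk == null) {
            throw new IllegalArgumentException("No disk registered for source: " + source);
        }
        ExecutorService executor =
                executorOfDisk.computeIfAbsent(disk, d -> Executors.newSingleThreadExecutor());
        return executor.submit(crawl);
    }

    /** Stops accepting new crawls and waits for the queued ones to finish. */
    void shutdown() {
        for (ExecutorService executor : executorOfDisk.values()) {
            executor.shutdown();
        }
        try {
            for (ExecutorService executor : executorOfDisk.values()) {
                executor.awaitTermination(1, TimeUnit.MINUTES);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        DiskAwareCrawlScheduler scheduler = new DiskAwareCrawlScheduler();
        scheduler.registerSource("home-dirs", "disk0");   // illustrative names
        scheduler.registerSource("projects", "disk0");
        scheduler.registerSource("archive", "disk1");
        for (String source : new String[] {"home-dirs", "projects", "archive"}) {
            scheduler.schedule(source, () -> System.out.println("crawling " + source));
        }
        scheduler.shutdown();
    }
}
```

A single-threaded executor per disk is just the simplest way to get mutual exclusion per disk; a semaphore per disk would work as well if you wanted to allow more than one crawler on fast storage.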