From: Nadareishvili, Irakli <inadareishvili@dg...> - 2005-09-06 19:10
Peng,
It's neither sequential nor purely random. JCrawler uses
a FIFO (First In First Out) pool.
When you specify msn.com, it puts msn.com in the FIFO. Next,
it puts yahoo.com in the same FIFO. Since msn.com was put in first,
it will process msn.com first and find some additional links there
to process. But it does not wait for msn.com to finish processing.
After the interval you indicated in your config file, it will fetch
the next URL from the FIFO and begin processing it in a separate
thread, regardless of whether the previous one has finished processing.
As pages get processed, URLs found on them are put in the FIFO and,
at each "interval", retrieved from there and processed to find even
more URLs.
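
To make the interval-driven behaviour concrete, here is a minimal Java
sketch of the idea. It is not JCrawler's actual source: the class name,
the crawl() method, the LINKS map (which stands in for fetching a page
and extracting its links) and the 5-second safety timeout are all
assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FifoCrawlSketch {
    // Hypothetical stand-in for "fetch the page and extract its links".
    static final Map<String, List<String>> LINKS = Map.of(
            "msn.com", List.of("msn.com/news", "msn.com/mail"),
            "yahoo.com", List.of("yahoo.com/finance"));

    /** Takes one URL from the FIFO per interval; returns URLs in dispatch order. */
    static List<String> crawl(List<String> seeds, long intervalMs, int maxDispatches) {
        BlockingQueue<String> fifo = new LinkedBlockingQueue<>(seeds);
        List<String> dispatched = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(maxDispatches);
        ExecutorService workers = Executors.newCachedThreadPool();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        timer.scheduleAtFixedRate(() -> {
            if (done.getCount() == 0) return;   // dispatch budget used up
            String url = fifo.poll();           // oldest URL, or null if empty
            if (url == null) return;            // nothing queued on this tick
            dispatched.add(url);
            done.countDown();
            // Process in a separate thread; the timer never waits for it.
            workers.submit(() -> {
                fifo.addAll(LINKS.getOrDefault(url, List.of()));
            });
        }, 0, intervalMs, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);    // assumed safety timeout
            timer.shutdownNow();
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            timer.shutdownNow();
            workers.shutdownNow();
        }
        return new ArrayList<>(dispatched);
    }

    public static void main(String[] args) {
        // The two seeds always come out first, in insertion order; the
        // order of the discovered links depends on worker timing.
        System.out.println(crawl(List.of("msn.com", "yahoo.com"), 20, 5));
    }
}
```

In this sketch the timer thread only dequeues and hands off, so a slow
page never delays the next fetch. It has no duplicate check; that part
is left out here to keep the example short.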
The same URL will not be processed twice, to avoid closed loops,
unless the crawler "runs out of" URLs, at which point it will "restart".
This ensures that you are not hitting the same URLs all the time and
that you cover the widest range of your system in the shortest time.
However, you often want to run the crawler for a long time (e.g.
24 hours) to test the stability of the system, and the system does not
necessarily have enough unique URLs to keep hitting for that long.
To allow this, the crawler will restart once it "runs out" of unique
URLs and begin crawling from the start.
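
The "no URL twice, restart when exhausted" rule can be sketched as
follows. Again, this is only an illustration under assumptions, not
JCrawler's code: the class name, the crawl() signature, the links map
and the maxFetches cap (standing in for "run for 24 hours") are all
hypothetical, and the loop is single-threaded to keep it deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RestartingFifoSketch {
    /** Returns the URLs in the order they were fetched, up to maxFetches. */
    static List<String> crawl(List<String> seeds,
                              Map<String, List<String>> links,
                              int maxFetches) {
        Deque<String> fifo = new ArrayDeque<>(seeds);
        Set<String> seen = new HashSet<>(seeds);
        List<String> fetched = new ArrayList<>();

        while (fetched.size() < maxFetches) {
            if (fifo.isEmpty()) {
                if (seeds.isEmpty()) break;     // nothing to restart from
                // Ran out of unique URLs: forget history, start over.
                seen.clear();
                seen.addAll(seeds);
                fifo.addAll(seeds);
            }
            String url = fifo.poll();
            fetched.add(url);
            for (String link : links.getOrDefault(url, List.of())) {
                // Set.add returns false for a URL already seen this pass.
                if (seen.add(link)) fifo.add(link);
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "msn.com", List.of("msn.com/news", "yahoo.com"),
                "yahoo.com", List.of("msn.com"),
                "msn.com/news", List.of("msn.com"));
        // Three unique URLs, seven fetches: the crawl restarts twice.
        System.out.println(crawl(List.of("msn.com", "yahoo.com"), links, 7));
    }
}
```

Within one pass each URL is fetched once even though the pages link
back to each other; only when the FIFO drains does the sketch clear its
"seen" set and begin again from the seed URLs.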
The sequence of URLs is not random, but it depends on how quickly
the crawler reaches a URL and puts it in the FIFO, so it is not
necessarily predetermined either.
I hope I answered your question,
Irakli
-----Original Message-----
From: jcrawler-main-admin@li... on behalf of Peng Lim
Sent: Tue 9/6/2005 2:44 PM
To: jcrawler-main@li...
Subject: [Jcrawler-main] How does it crawl

Does it crawl sequentially or randomly? For example, if I had two links
from top to bottom:

Yahoo.com
msn.com

Does it always pick yahoo.com first to crawl, and then after it is
finished with all of the yahoo.com links, move on to msn.com?

Thanks,

Peng Lim
Software Quality Assurance Engineer
650.230.6639