From: Kord C. <ko...@gr...> - 2002-11-20 19:15:31
|
Otis,
Good questions. We expect you, our clients, to hold us accountable
for what we do with the data, since it is either your data that
we are collecting or your machines that we are using to collect
it.
One thing that has been holding us up is the Windows client.
Now that we have it done (and hopefully stable today), we
should be able to retain more clients for crawling. This
also allows us to start marketing the client without having
to worry about it crashing on new users. Strangely enough,
crashing programs tend to turn people off. ;)
Right now we are crawling about 3M URLs a day, with about 30-40
clients running per day. That works out to roughly 75,000 to
100,000 URLs per day, per client. We currently have about 30M
URLs in the database, which puts our re-crawl rate at about once
every 10 days.
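The figures above can be checked with a few lines of Python. This is a minimal back-of-the-envelope sketch, assuming the approximate numbers quoted in this message; the function names are hypothetical and not part of any Grub code.

```python
# Approximate figures quoted in the message (not measured values).
URLS_PER_DAY = 3_000_000            # total URLs crawled per day, all clients
CLIENTS_LOW, CLIENTS_HIGH = 30, 40  # clients running per day
URLS_IN_DB = 30_000_000             # URLs currently in the database


def per_client_rate(urls_per_day: int, clients: int) -> float:
    """Hypothetical helper: average URLs fetched per client per day."""
    return urls_per_day / clients


def recrawl_interval_days(urls_in_db: int, urls_per_day: int) -> float:
    """Hypothetical helper: days needed to visit every URL in the database once."""
    return urls_in_db / urls_per_day


print(per_client_rate(URLS_PER_DAY, CLIENTS_LOW))       # 100000.0
print(per_client_rate(URLS_PER_DAY, CLIENTS_HIGH))      # 75000.0
print(recrawl_interval_days(URLS_IN_DB, URLS_PER_DAY))  # 10.0
```

The per-client average depends on how many clients are counted, which is why the rate is given as a range rather than a single number.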
We think that a good goal for re-crawl is about once every 7
days. The plan is to scale the number of URLs in the database
to the number of crawlers currently running. As the number of
crawlers running goes up, so does the number of URLs that we
can re-crawl each week.
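Under this plan, database size is tied to crawler count. A rough sketch of that sizing rule, assuming each client keeps fetching about 100,000 URLs per day (the upper end of the current average) and a 7-day target; both helper names are hypothetical, not Grub's actual scheduling logic.

```python
import math

TARGET_RECRAWL_DAYS = 7              # re-crawl goal stated in the message
URLS_PER_CLIENT_PER_DAY = 100_000    # assumption: per-client throughput stays constant


def max_urls_for_target(active_clients: int,
                        urls_per_client_per_day: int = URLS_PER_CLIENT_PER_DAY,
                        target_days: int = TARGET_RECRAWL_DAYS) -> int:
    """Hypothetical helper: largest URL database that `active_clients`
    crawlers can fully re-crawl within `target_days`."""
    return active_clients * urls_per_client_per_day * target_days


def clients_needed(urls_in_db: int,
                   urls_per_client_per_day: int = URLS_PER_CLIENT_PER_DAY,
                   target_days: int = TARGET_RECRAWL_DAYS) -> int:
    """Hypothetical helper: minimum number of clients needed to re-crawl
    `urls_in_db` URLs within `target_days` (rounded up to a whole client)."""
    return math.ceil(urls_in_db / (urls_per_client_per_day * target_days))


for clients in (30, 40, 60):
    print(clients, max_urls_for_target(clients))
print(clients_needed(30_000_000))
```

With 30 clients this allows about 21M URLs; keeping the current 30M-URL database on a weekly cycle would take roughly 43 clients under the same assumption.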
Expect an announcement from us next week about our plans for
making the returned data more accessible. I think you guys are
going to like what we plan to make available to you.
Later,
Kord
--
--------------------------------------------------------------
Kord Campbell Grub, Inc.
President 5500 North Western Avenue #101C
Oklahoma City, OK 73118
ko...@gr... Voice: (405) 848-7000
http://www.grub.org Fax: (405) 848-5477
--------------------------------------------------------------
Today's Topics:
1. Grub goals, ETA, etc. (otisg)
--__--__--
Message: 1
From: "otisg" <ot...@iV...>
To: <gru...@li...>
Cc:
Date: Mon, 18 Nov 2002 22:34:35 -0800
Subject: [Grub-general] Grub goals, ETA, etc.
Hello,
I've been running the Grub client for a while, and I am curious when
some of the things mentioned at http://www.grub.org/investors.php will
start happening.
I am also curious how many URLs Grub has crawled so far, and whether
Grub is capable of re-fetching every page it knows about at least once
a month. I'm asking because that's how often Google, Alltheweb, etc.
do it, so I assume Grub has to do better if it wants to appear
attractive to the big search engines, no?
Thanks,
Otis
--__--__--
_______________________________________________
Grub-general mailing list
Gru...@li...
https://lists.sourceforge.net/lists/listinfo/grub-general
End of Grub-general Digest