From: Geoff H. <ghu...@ws...> - 2001-12-14 20:43:52
I usually pipe the results of htdig (actually rundig.sh) to a log file. So
I'm not concerned when something "scrolls off the screen." Additionally,
the script outputs the date stamps, so it's pretty easy to generate speed
ratings.
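
As a rough illustration of turning two date stamps from such a log into a speed rating, here is a minimal Python sketch. The function name `docs_per_hour`, the timestamp format, and the example stamps are all assumptions made for illustration; the actual format depends on what your script's date command prints.

```python
from datetime import datetime


def docs_per_hour(start: str, end: str, documents: int,
                  fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Average indexing rate between two log date stamps.

    The default fmt is an assumed format, not what rundig.sh emits;
    adjust it to match the stamps in your own log.
    """
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    hours = elapsed.total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be later than start")
    return documents / hours


# Hypothetical stamps 20 hours apart, with a document count of 110,000.
rate = docs_per_hour("2001-12-13 08:00:00", "2001-12-14 04:00:00", 110_000)
print(f"{rate:.0f} documents/hour")
```

Because this works on the log after the fact, it adds no overhead to the dig itself.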
Personally, I'd prefer to stay away from too many statistics about the
local file access--keep in mind that to collect these, we must add code to
collect the statistics, calculate the read rate, check the time,
etc. While this may or may not help in profiling, it's certainly going to
kill the performance.
(Personally, I'm not so keen on the HTTP statistics either, but the
latency is a bit longer there, so the overhead matters less.)
You ask for the number from HTTP access and the number from local
access. This is probably easy enough, but you're clearly not getting any
from HTTP at the moment, right?
> My most recent dig (still ongoing) has been indexing for 20 hours and
> has reached 110,000 documents over local file access. 3.1.3 indexed
> 330,000 documents in around 6 hours. I have three weeks before my
> server needs to be up and I am very willing to help locate any possible
> slowdowns (NFS tuning, etc). I'm not a Linux guru but I have plenty of
> time to spend on this.
Please understand this is an extremely unfair comparison. For one, there
are bugs in 3.1.3, and features have been added since then that slow down
indexing. Even 3.1.6 is reported to be slower than 3.1.3 for these
reasons.
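
For reference, the raw rates implied by the two figures quoted above can be worked out as below. This is only a sketch of the arithmetic: it ignores the differences between versions, and the first figure comes from a dig that is still running. The helper name `rate` is made up for illustration.

```python
def rate(documents: int, hours: float) -> float:
    """Documents indexed per hour."""
    return documents / hours


current = rate(110_000, 20)  # ongoing dig over local file access
older = rate(330_000, 6)     # the 3.1.3 run, "around 6 hours"

print(f"current: {current:.0f}/hour")
print(f"3.1.3:   {older:.0f}/hour")
print(f"ratio:   {older / current:.1f}x")
```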
You also mention using NFS but don't elaborate. It's probably fine to
index over "local" NFS disks, but I don't know that it's necessarily better
than using the HTTP/1.1 code depending on the speed and latency of your
network.
-Geoff
---------- Forwarded message ----------
Date: Fri, 14 Dec 2001 07:53:09 -0500
From: Greg Lepore <gr...@md...>
To: Geoff Hutchison <ghu...@ws...>
Subject: Re: [htdig-dev] profiling
Geoff,
Well, I guess I wasn't too clear.
1. At the end of a dig with the -s flag htdig displays http
statistics:
( Persistent connections : Yes
HEAD call before GET : No
Connections opened : 0
Connections closed : 0
Changes of server : 0
HTTP Requests : 0
HTTP KBytes requested : 0
HTTP Average request time : 0 secs
HTTP Average speed : 0 KBytes/secs)
When you are using local file access, it displays the above, which is
accurate but not useful. It would be useful to see the number of documents
indexed at this point, which is after the dig is finished and after htdig
displays a list of problem urls. I know -v displays a running count and a
final count at the end, but this is before it prints the problem files and
the statistics, which usually push the document count off the screen. If
the -s flag prints statistics about the dig, surely one of the most
important is how many documents were indexed. Is it possible to add a
section on Local File Access Statistics that gives relevant information
about the dig? Total time, average transfer speed (which should help
diagnose disc read slowdowns), total documents, how many .pdfs, number of
links not found, etc. If htdig is using a combination of HTTP and local
file access, the number of each would also be nice to have at this
point, for instance when a site is a mixture of dynamic and static pages,
with the static pages indexed over local file access and the dynamic
ones via HTTP.
My most recent dig (still ongoing) has been indexing for 20 hours and has
reached 110,000 documents over local file access. 3.1.3 indexed 330,000
documents in around 6 hours. I have three weeks before my server needs to
be up and I am very willing to help locate any possible slowdowns (NFS
tuning, etc). I'm not a Linux guru but I have plenty of time to spend on this.
At 11:43 PM 12/13/01, you wrote:
>At 8:09 AM -0500 12/13/01, Greg Lepore wrote:
>> 1. When running htdig with the -s flag, it would be nice to see
>> the number of documents indexed (I know it appears at the end of the
>> dig, but it would be nice to have it here as well).
>
>I'm not quite sure what you mean. Do you want the number of documents
>indexed so far? If so, it's probably easier to get this from htdig -v
>rather than anything from -s, which is only called at the end.
>
>> 2. The statistics give no results when using local file access,
>> at least the speed could be displayed.
>
>I'm not sure I understand what you mean by "no results." Do you mean the
>statistics that come up about HTTP access?
>
>> 3. When using local file access, there should be a message when
>> htdig has to use http instead, and then the summaries should be displayed.
>
>Again, I'm not quite sure I follow. This certainly comes up when you're
>running with htdig -v and this seems fine to me. Do you want something
>additional?
>
>-Geoff
Gregory Lepore
Webmaster, State of Maryland
Supervisor, Archives of Maryland Online
410-260-6425