From: Geoff H. <ghu...@ws...> - 2002-07-02 19:12:55
On Tue, 2 Jul 2002, Rylan W. Hazelton wrote:

> I let it run for about 8hrs and it only dug about 20% of them. I need
> to find a way to make the indexing more palatable to the server and was
> hoping someone can help me here.

I'm curious why you're using 3.2. Indexing at the moment is certainly
slower than in 3.1--it's indexing and storing a significantly larger
amount of information. Plus, it's assembling the databases on the fly
rather than requiring the separate htmerge step.

> 1) Run a big dig (all 1M posts) then, run nightly digs of the posts in
> the last 24-36 hours, then merge the dbs.

You should also take a look at the -m flag to htdig. It indexes only a
given set of URLs and does nothing else. (Valid for 3.1.6 and the 3.2 betas.)
<http://www.htdig.org/htdig.html>

> 2) break the posts up into ~50-100k page block and index them all
> separately, then merge the dbs.

This depends on how much load your server and CGIs can handle. If you
think the server can handle indexing two sets at once, this will be
faster. If you have to index one set, then another, and so on, it will
definitely be slower.

> Also how can I search multiple dbs at once in 3.2? Are there any docs
> for 3.2?

The installation you have should include full documentation. In a source
.tar.gz, it is in htdoc/. If you installed from a binary package, it
should include the docs as well. Beyond that, see:
<http://www.htdig.org/dev/htdig-3.2/>

To search multiple databases at the same time, you'll need to set up
"collections." You can specify multiple config names to htsearch,
separated by "|" characters. You can also specify one "master" config
with a collection_names attribute.
<http://www.htdig.org/dev/htdig-3.2/attrs.html#collection_names>

> If anyone knows where I can find the correct format for the headers it
> would be much appreciated.

These are standard Last-Modified: headers:
http://www.w3.org/Protocols/HTTP/Object_Headers.html#last-modified

But in order for the stored date to be useful for speeding up indexing,
the server/CGI would need to recognize the If-Modified-Since: headers:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
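
P.S. To make the If-Modified-Since point above concrete, here is a rough
sketch in Python of the check a CGI could make before regenerating a
post. This is only an illustration, not ht://Dig code: the function name
conditional_response and the way the post's modification time is passed
in are assumptions, and a real CGI would read the request header from
the HTTP_IF_MODIFIED_SINCE environment variable.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(post_modified, if_modified_since):
    """Hypothetical helper: decide between 200 and 304 for one post.

    post_modified      -- timezone-aware UTC datetime of the post's last change
    if_modified_since  -- raw If-Modified-Since header value, or None
    Returns (status_code, headers).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    post_modified = post_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(post_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: send the full page
        if since is not None and since.tzinfo is None:
            # Dates without a usable zone are treated as UTC here.
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and post_modified <= since:
            # Unchanged since the indexer's last visit: headers only, no body.
            return 304, headers

    return 200, headers
```

With something like this in place, a re-dig can skip posts the server
reports as unchanged (status 304) instead of fetching and parsing every
page again.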