From: Geoff H. <ghu...@ws...> - 2001-12-29 15:13:24
On Fri, 28 Dec 2001, Daniel Escobar wrote:

> I'm building a db on two of my domains, each containing app. 2000
> files.

The size obviously depends not just on the number of files (4000 total),
but on the size of each file. If each file is several megabytes, then
it's certainly possible to take up a lot of space.

> According to what the FAQs say, it should be approximately 48 MB,
> but not over a gig!

The FAQ and the requirements page use rough estimates of typical page
sizes. Here are some things that could be going wrong:

1) Your indexing is covering more than just those two servers.
   (Check your limit_urls_to directive.)

2) You started indexing with databases that already contained other
   URLs, rather than indexing "from scratch". (If htdig is updating a
   db, it will start by checking all the URLs currently in the db and
   then ADD the new URLs to this.)

3) You have a variety of "duplicate" URLs:
   <http://www.htdig.org/FAQ.html#q4.24>

4) You have some sort of URL "loop" (this is actually one cause of #3)
   that makes it appear there are many, many more documents than there
   really are.

But let's take a look at what you sent:

  703M  ./db.docdb
  1.2M  ./db.docdb.work
   84k  ./db.docs.index
  1.1G  ./db.wordlist
  2.0M  ./db.wordlist.work
  8.8M  ./db.words.db

I don't know how you indexed this, but your .work files and your
regular files are quite different in size. This implies that you have
two different indexing runs here (normally, after indexing, the script
would move the .work files to replace the others). So which files
correspond to the "huge" ones? And how did the other set come about?

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
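[Editor's note: points 1 and 2 above translate to a config attribute and a
command-line flag. A minimal sketch follows; limit_urls_to and htdig's -i
and -c options are real ht://Dig features, but the URLs and the config path
are placeholders, not Daniel's actual setup.]

```shell
# In htdig.conf, restrict the crawl to exactly the two servers:
#
#   limit_urls_to:  http://www.example1.com/ http://www.example2.com/
#
# Then index "from scratch" so URLs already in an old db are not
# re-checked and carried over (-i = initial index, erase old db):
htdig -i -c /path/to/htdig.conf
```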
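[Editor's note: a quick back-of-the-envelope check of the numbers above.
With the 4000 documents Daniel mentions, the 703 MB db.docdb works out to
roughly 180 KB of stored content per document, far above a typical page
size, which supports the "extra URLs or a loop" diagnoses:]

```shell
# Rough per-document size implied by Daniel's listing.
docs=4000               # app. 2000 files on each of two domains
docdb_kb=$((703 * 1024))  # 703M db.docdb, expressed in KB

# Integer KB per document:
echo $((docdb_kb / docs))   # prints 179 (i.e. ~180 KB per document)
```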