From: Jonathan S. <sch...@co...> - 2007-04-13 17:11:39
|
Hello,

Recently we upgraded our server OS to Fedora Core 4 (I know, not supported and needs further upgrading, but that is to come). Since then, dig times when indexing our site have increased sharply. There are around 20k URLs to index; prior to the upgrade a dig would complete in about 14 hours. Post upgrade, the estimated build time is around 4 days :(

After timing the processes, it appears the lag occurs during read operations on the db.docs.index file. The file is only 1.3MB, so its size shouldn't be an issue. I know Berkeley DB v3 is built into htdig 3.2.0b5 (our version), but I am wondering if some other library in Fedora Core 4 is causing the issue.

Please help - I need to get a new database built and am at a loss as to what steps to take next :( Any ideas?

Cheers,
Jonathan Schlackl
The Communication Initiative |
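A rough sketch of the arithmetic behind those figures, in Python. It uses only the numbers quoted in the message (20k URLs, about 14 hours before, about 4 days after). The function names are hypothetical, and the projection assumes the per-URL cost stays roughly constant over a dig, which may not hold in practice.

```python
# Figures quoted in the message above.
TOTAL_URLS = 20_000
HOURS_BEFORE = 14        # full dig before the OS upgrade
HOURS_AFTER = 4 * 24     # estimated full dig after the upgrade


def seconds_per_url(total_hours: float, urls: int) -> float:
    """Average wall-clock seconds spent per URL over a full dig."""
    return total_hours * 3600 / urls


def projected_hours(urls_done: int, elapsed_seconds: float, total_urls: int) -> float:
    """Linearly extrapolate a partial dig to the full URL count (in hours)."""
    if urls_done <= 0:
        raise ValueError("need at least one completed URL to extrapolate")
    return elapsed_seconds / urls_done * total_urls / 3600


before = seconds_per_url(HOURS_BEFORE, TOTAL_URLS)   # about 2.52 s per URL
after = seconds_per_url(HOURS_AFTER, TOTAL_URLS)     # about 17.28 s per URL
slowdown = HOURS_AFTER / HOURS_BEFORE                # roughly 6.9x

if __name__ == "__main__":
    print(f"before: {before:.2f} s/URL, after: {after:.2f} s/URL, "
          f"slowdown: {slowdown:.1f}x")
```

With these helpers, a fresh dig that has processed 500 URLs in its first hour would project to `projected_hours(500, 3600, TOTAL_URLS)`, i.e. about 40 hours for the whole site.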
From: Jim C. <li...@yg...> - 2007-04-13 20:33:16
|
On Apr 13, 2007, at 11:11 AM, Jonathan Schlackl wrote:

> Recently we upgraded our server OS to Fedora Core 4 (I know, not
> supported and needs further upgrading, but that is to come). That said,
> we've experienced a large increase in dig times when indexing our site.
> There are around 20k urls to index and prior to the upgrade would
> complete in about 14 hours. Post upgrade estimated build time is around
> 4 days :(

Did you also move from a 3.1.x version of ht://Dig to a 3.2.x version? Indexing with 3.2.x generally requires significantly more time. This is due to the extra work required to support that version's enhanced search functionality.

> Please help - I need to get a new database built and am at a loss as to
> what steps to take next :( Any ideas?

If it is not an ht://Dig version issue, you might take a look at memory use. If the new OS setup is using more memory, it is possible that htdig is hitting swap, which would slow it down considerably.

Jim |
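One way to check the swap suggestion on a Linux host is to read the `SwapTotal` and `SwapFree` fields from `/proc/meminfo`. The Python below is a minimal sketch under that assumption; the function names are hypothetical and nothing here is part of ht://Dig itself.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; lines without a numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kb(meminfo: dict) -> int:
    """Swap currently in use, in kB, from the parsed SwapTotal/SwapFree fields."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_swap_used_kb(path: str = "/proc/meminfo") -> int:
    """Read the given meminfo file (Linux only) and return swap in use, in kB."""
    with open(path) as fh:
        return swap_used_kb(parse_meminfo(fh.read()))
```

A single reading only shows how much swap is occupied. Sampling `read_swap_used_kb()` a few times while htdig runs, and seeing the value climb, would be a stronger hint that the dig is being pushed into swap.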
From: Jonathan S. <sch...@co...> - 2007-04-13 21:47:28
|
Hi Jim,

Thanks for the response. No, we have been using 3.2 for a while, and yes, we noticed a significant performance hit when we moved to it a few years ago. That said, we need phrase support, so we have no choice there ;)

As for memory, we have a dual Xeon 3.2 w/ 4GB RAM, Ultra 320 SCSI + RAID 5 - tons of horsepower really - and under RH 7.3 (old OS) on this same machine we had no issues.

I did some analysis of the digging process using strace and found that the bulk of the time is spent in "read(4,..." operations. Looking at the process with lsof, I find that db.docs.index is the file open on descriptor 4, which tells me much of the time is spent reading that file.

That said, I've installed compat-libstdc++-296-2.96-132.fc4.i386 since it wasn't installed, so my thinking is that I was using the binary from a previous build. I have since successfully recompiled htdig, and from what I can see the read calls are no longer showing. It may have resolved the issue, although I'm not sure how to confirm that without starting a fresh dig, so I'm doing that now. I should be able to roughly calculate the build time after a short while.

My sense is the C++ compat library was a major issue. It was likely built into the old binary I had, but I'm not sure how that works to be honest. Kinda surprised it ran at all without having that library installed...

Cheers,
Jonathan. |
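The strace and lsof step described in this message, matching the `4` in `read(4,...` to an open file, can also be done on Linux by reading the symlink `/proc/<pid>/fd/<fd>`. The Python below is a minimal sketch under that assumption; `fd_target` and its `proc_root` parameter are hypothetical names chosen for illustration.

```python
import os


def fd_target(pid: int, fd: int, proc_root: str = "/proc") -> str:
    """Return the path that file descriptor `fd` of process `pid` refers to.

    On Linux, /proc/<pid>/fd/<fd> is a symbolic link to the open file.
    `proc_root` is a parameter only so the lookup can be pointed elsewhere.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "fd", str(fd)))
```

Calling `fd_target(pid, 4)` with the PID of the running htdig process should return the file behind descriptor 4, which in the case described above was db.docs.index. Reading another user's process usually requires the same user or root.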