|
From: Geoff H. <ghu...@ws...> - 2003-01-17 15:38:21
|
To quote Mark Twain... "Reports of my demise have been greatly exaggerated."
Of course that's not to say that more development isn't welcome (as always).
But you clearly haven't looked at any of the ht://Dig mailing lists.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
On Fri, 17 Jan 2003, Frank Piwarski wrote:
> Hello,
>
> It looks to me that the ht/dig project is dead ... or fatally wounded.
>
> Is this the case?
>
> Thanks.
> Frank
|
|
From: Lachlan A. <lh...@us...> - 2003-01-20 12:07:56
|
IMHO, this question highlights the need to get 3.2.0b[45] out ASAP.
Am I correct in believing that the hold-up is basically database
errors? If so, could we set a target "acceptable bug level" just to
get the other bug-fixes/features out there? After all, people who
are savvy enough to run betas are able to choose to keep using
3.2.0b3 if they want.
I don't know anything about the database code, but if someone points
the way, I'm happy to spend some time working on it. Do we need a
test dataset which reproducibly tickles a bug?
Cheers,
Lachlan
On Saturday 18 January 2003 02:32, Geoff Hutchison wrote:
> To quote Mark Twain... "Reports of my demise have been greatly
> exaggerated."
|
|
From: Neal R. <ne...@ri...> - 2003-01-27 22:42:56
|
Lachlan,
Great job forward porting 3.1.6 features! I'm porting your changes to
libhtdig and will make a new snapshot available soon.
What DB errors are you speaking of? Turning on wordlist_compress_zlib
should be a workaround for the DB errors I know about.
Thanks!
Neal Richter
On Mon, 20 Jan 2003, Lachlan Andrew wrote:
> IMHO, this question highlights the need to get 3.2.0b[45] out ASAP.
> Am I correct in believing that the hold-up is basically database
> errors? If so, could we set a target "acceptable bug level" just to
> get the other bug-fixes/features out there? After all, people who
> are savvy enough to run betas are able to choose to keep using
> 3.2.0b3 if they want.
>
> I don't know anything about the database code, but if someone points
> the way, I'm happy to spend some time working on it. Do we need a
> test dataset which reproducibly tickles a bug?
>
> Cheers,
> Lachlan
>
> On Saturday 18 January 2003 02:32, Geoff Hutchison wrote:
> > To quote Mark Twain... "Reports of my demise have been greatly
> > exaggerated."
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
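For anyone following along, the workaround Neal describes is a single attribute in the configuration file used for the dig. A minimal sketch, assuming the usual attribute/colon syntax of htdig.conf and an example path:
    # enable the zlib-based word-database page compression discussed above
    echo 'wordlist_compress_zlib: true' >> /path/to/htdig.conf
    # re-index from scratch so every page is rewritten under the new scheme
    htdig -i -c /path/to/htdig.conf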
|
From: Lachlan A. <lh...@us...> - 2003-01-30 01:48:29
|
Greetings Neal,
Isn't wordlist_compress_zlib turned on by default? I've had a few
minor problems recently without changing it. The one I've got log
files for is that htpurge displays about six lots of diagnostics of
the form
pg->type: 0
************************************
************************************
************************************
page size:8192
00-07: Log sequence number. file : 0
00-07: Log sequence number. offset: 0
08-11: Current page number. : 143319
12-15: Previous page number. : 0
16-19: Next page number. : 132296
20-21: Number of item pairs on the page. : 0
22-23: High free byte page offset. : 8192
24: Btree tree level. : 0
25: Page type. : 0
entry offsets:
0: 0 0 0 0 0 0 0 0 d7 2f 2 0 0 0 0 0 c8 4 2 0
20: 0 0 0 20 0 0 fc 1f fc 1f ec 1f e8 1f d8 1f d4 1f c4 1f
40: c0 1f b0 1f ac 1f 9c 1f 98 1f 88 1f 84 1f 74 1f 70 1f 60 1f
60: 5c 1f 4c 1f 48 1f 38 1f 34 1f 24 1f 20 1f 10 1f c 1f fc 1e
80: f8 1e e8 1e e4 1e d4 1e d0 1e c0 1e bc 1e ac 1e a8 1e 98 1e
100: 94 1e 84 1e 80 1e 70 1e 6c 1e 5c 1e 58 1e 48 1e 44 1e 34 1e
...
...
...
while it is discarding words. I assumed this is caused by a
recoverable error. The only differences between the entries are the
current/next pages, and bytes 8, 9, 16, 17, 18 of the "entry
offsets". The first "next page number" is 0, and subsequent ones are
the previous values of "current page number" (as if it is reading
backwards through a chain). If you like, I can send the whole 6MB
log file, and/or any configuration files you want.
A while ago (but I *think* with your fix in place), I also had a
problem with htdig crashing at one point, but I've lost the
details. When I've had problems, it has normally been on 10+ hour
digs. Any tips for isolating them? I've been thinking of doing an
integrity check every 100 database writes or so, but I don't know the
code well enough yet...
Thanks for your feedback, and very much for writing the _zlib fix!
Cheers,
Lachlan
On Tuesday 28 January 2003 09:38, Neal Richter wrote:
> What DB errors are you speaking of? Turning on
> wordlist_compress_zlib should be a workaround for the DB errors I
> know about.
> > Am I correct in believing that the hold-up is basically
> > database errors?
|
|
From: Neal R. <ne...@ri...> - 2003-02-01 00:31:00
|
Yep, it is enabled by default.. If your error is repeatable, can you
test it with wordlist_compress_zlib & wordlist_compress disabled and
re-run htpurge? I'd like to see if the error still appears. I have my
doubts about whether this is caused by zlib or is a logical error
independent of the page-compression scheme. Keep me posted!
A BerkeleyDB integrity check might be a nice feature.. I'll see what
the SleepyCat book on BDB says. (New Riders ISBN 0735710643) I'd be
happy to send you a copy of the book if you really want to dig into
BDB. I could use some help there in the future increasing index
efficiency ;-)
Thanks!
On Thu, 30 Jan 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> Isn't wordlist_compress_zlib turned on by default? I've had a few
> minor problems recently without changing it. The one I've got log
> files for is that htpurge displays about six lots of diagnostics of
> the form
>
> pg->type: 0
> ************************************
> ************************************
> ************************************
> page size:8192
> 00-07: Log sequence number. file : 0
> 00-07: Log sequence number. offset: 0
> 08-11: Current page number. : 143319
> 12-15: Previous page number. : 0
> 16-19: Next page number. : 132296
> 20-21: Number of item pairs on the page. : 0
> 22-23: High free byte page offset. : 8192
> 24: Btree tree level. : 0
> 25: Page type. : 0
> entry offsets:
> 0: 0 0 0 0 0 0 0 0 d7 2f 2 0 0 0 0 0 c8 4 2 0
> 20: 0 0 0 20 0 0 fc 1f fc 1f ec 1f e8 1f d8 1f d4 1f c4 1f
> 40: c0 1f b0 1f ac 1f 9c 1f 98 1f 88 1f 84 1f 74 1f 70 1f 60 1f
> 60: 5c 1f 4c 1f 48 1f 38 1f 34 1f 24 1f 20 1f 10 1f c 1f fc 1e
> 80: f8 1e e8 1e e4 1e d4 1e d0 1e c0 1e bc 1e ac 1e a8 1e 98 1e
> 100: 94 1e 84 1e 80 1e 70 1e 6c 1e 5c 1e 58 1e 48 1e 44 1e 34 1e
> ...
> ...
> ...
>
> while it is discarding words. I assumed this is caused by a
> recoverable error. The only differences between the entries are the
> current/next pages, and bytes 8, 9, 16, 17, 18 of the "entry
> offsets". The first "next page number" is 0, and subsequent ones are
> the previous values of "current page number" (as if it is reading
> backwards through a chain). If you like, I can send the whole 6MB
> log file, and/or any configuration files you want.
>
> A while ago (but I *think* with your fix in place), I also had a
> problem with htdig crashing at one point, but I've lost the
> details. When I've had problems, it has normally been on 10+ hour
> digs. Any tips for isolating them? I've been thinking of doing an
> integrity check every 100 database writes or so, but I don't know the
> code well enough yet...
>
> Thanks for your feedback, and very much for writing the _zlib fix!
>
> Cheers,
> Lachlan
>
> On Tuesday 28 January 2003 09:38, Neal Richter wrote:
> > What DB errors are you speaking of? Turning on
> > wordlist_compress_zlib should be a workaround for the DB errors I
> > know about.
> > > Am I correct in believing that the hold-up is basically
> > > database errors?
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
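A sketch of the test being requested here, using the same attribute/colon config syntax; the config file names are placeholders for whatever the real dig uses:
    cp site.conf nocompress.conf                     # copy of the normal config
    echo 'wordlist_compress_zlib: false' >> nocompress.conf
    echo 'wordlist_compress: false'      >> nocompress.conf
    htdig -i -c nocompress.conf                      # dig with both compressors off
    htpurge  -c nocompress.conf                      # then see whether the diagnostics reappear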
|
From: Lachlan A. <lh...@us...> - 2003-02-02 08:34:46
|
On Saturday 01 February 2003 11:25, Neal Richter wrote:
> test it with
> wordlist_compress_zlib & wordlist_compress disabled and re-run
> htpurge. I have my
> doubts about whether this is caused by zlib or is a logical error
> independent of page-compression scheme.
The test dig I ran didn't give the error, but the document set was
marginally different. I'll get back to you once I have concrete
results.
> the SleepyCat book on BDB says. (New Riders ISBN 0735710643)
> I'd be happy to send you a copy of the book if you really want to
> dig into BDB. I could use some help there in the future increasing
> index efficiency ;-)
Thanks for the offer, but that sounds like more of a time commitment
than I can make at the moment :( If I can find some more time to
work on the database side of things, I'll let you know.
Thanks again,
Lachlan
|
|
From: Lachlan A. <lh...@us...> - 2003-02-03 22:15:37
|
On Saturday 01 February 2003 11:25, Neal Richter wrote:
> If your error is repeatable, can you test it with
> wordlist_compress_zlib & wordlist_compress disabled and re-run
> htpurge? I'd like to see if the error still appears.
The diagnostics only appear if compression is enabled.
It takes over 24 hours (which becomes two days...) for the whole dig,
and I'll try to find a smaller data set that causes the problem. In
the meantime, are there any other tests I can do to try to track it
down? For example, is it possible to decompress the file offline to
compare it against the uncompressed one?
Cheers,
Lachlan
|
|
From: Neal R. <ne...@ri...> - 2003-02-03 22:38:57
|
On Tue, 4 Feb 2003, Lachlan Andrew wrote:
> On Saturday 01 February 2003 11:25, Neal Richter wrote:
>
> > If your error is repeatable, can you test it with
> > wordlist_compress_zlib & wordlist_compress disabled and re-run
> > htpurge? I'd like to see if the error still appears.
>
> The diagnostics only appear if compression is enabled.
>
> It takes over 24 hours (which becomes two days...) for the whole dig,
> and I'll try to find a smaller data set that causes the problem. In
> the meantime, are there any other tests I can do to try to track it
> down? For example, is it possible to decompress the file offline to
> compare it against the uncompressed one?
Other than using htdump on both the zlib-compressed WordDB and the
uncompressed WordDB, I can't think of one. If there are differences in
the dumps we could then see what effect it has on searching. If there
are no differences I would lean towards a bug in htpurge. You may be
able to hack htdump later to only uncompress the BDB pages in question.
Have you stepped through the htpurge code to see when/how this happens?
I am skeptical that it's caused by the zlib compression code because:
1) The changes to mp_cmpr are very minor to enable zlib compression and
involve no changes to the input data that would otherwise go to the
mifluz page compressor.
2) If the compressed data was getting corrupted, zlib should report an
error on the uncompress of that data-page.
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/htdig/htdig/db/mp_cmpr.c.diff?r1=1.2&r2=1.3
Thanks.. I appreciate your effort on this. Once you get the htdumps
tested for differences please report back!
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
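A sketch of that comparison, assuming two configs whose database directories point at separate places; the dump file names and locations come from each config, so the paths below are placeholders only:
    htdig -i -c zlib.conf       && htdump -c zlib.conf         # compressed run
    htdig -i -c nocompress.conf && htdump -c nocompress.conf   # uncompressed run
    # diff the two word dumps produced above (locations set by each config)
    diff /path/to/zlib-db/word.dump /path/to/plain-db/word.dump | head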
|
From: Lachlan A. <lh...@us...> - 2003-02-10 13:07:27
|
Greetings Neal,
I've found a repeatable crash in the database code (stack dump below).
The bug seems to be very sensitive to the code, so I can't easily
print out debugging information. If you can suggest things to try in
ddd then I'll happily give them a go. The crash is about 30 mins
into execution.
Thanks :)
Lachlan
Program received signal SIGABRT, Aborted.
0x4027e621 in kill () from /lib/libc.so.6
Current language: auto; currently c
(gdb) where
#0 0x4027e621 in kill () from /lib/libc.so.6
#1 0x4027e425 in raise () from /lib/libc.so.6
#2 0x4027fa53 in abort () from /lib/libc.so.6
#3 0x401ff45c in std::terminate() () from /usr/lib/libstdc++.so.5
#4 0x401ff616 in __cxa_throw () from /usr/lib/libstdc++.so.5
#5 0x401ff862 in operator new(unsigned) () from /usr/lib/libstdc++.so.5
#6 0x401ff94f in operator new[](unsigned) () from /usr/lib/libstdc++.so.5
#7 0x4007f24e in Compressor::get_vals(unsigned**, char const*)
(this=0xbfffe4a0, pres=0x85c3058, tag=0x2 <Address 0x2 out of
bounds>) at WordBitCompress.cc:815
#8 0x40085268 in WordDBPage::Uncompress_main(Compressor*)
(this=0xbfffe500, pin=0xbfffe4a0) at WordDBPage.cc:213
#9 0x40084fef in WordDBPage::Uncompress(Compressor*, int,
__db_cmpr_info*) (this=0xbfffe500, pin=0xbfffe4a0, ndebug=2) at
WordDBPage.cc:155
#10 0x40083547 in WordDBCompress::Uncompress(unsigned char const*,
int, unsigned char*, int) (this=0x8245c88, inbuff=0x87d2678 "\004",
inbuff_length=2032, outbuff=0x408309b0 "\001", outbuff_length=8192)
at WordDBCompress.cc:156
#11 0x40082e11 in WordDBCompress_uncompress_c (inbuff=0x87d2678
"\004", inbuff_length=2032, outbuff=0x408309b0 "\001",
outbuff_length=8192, user_data=0x2) at WordDBCompress.cc:48
#12 0x400fea93 in CDB___memp_cmpr_read (dbmfp=0x82db790,
bhp=0x40830978, db_io=0xbfffe680, niop=0xbfffe67c) at mp_cmpr.c:306
#13 0x400fe832 in CDB___memp_cmpr (dbmfp=0x82db790, bhp=0x40830978,
db_io=0xbfffe680, flag=1, niop=0xbfffe67c) at mp_cmpr.c:153
#14 0x400fdcdf in CDB___memp_pgread (dbmfp=0x82db790, bhp=0x40830978,
can_create=0) at mp_bh.c:212
#15 0x400ffdfb in CDB_memp_fget (dbmfp=0x82db790, pgnoaddr=0xbfffe758,
flags=0, addrp=0xbfffe75c) at mp_fget.c:353
#16 0x400d1747 in CDB___bam_search (dbc=0x82dbab0, key=0xbfffe970,
flags=12802, stop=1, recnop=0x0, exactp=0xbfffe844) at
bt_search.c:251
#17 0x400c95a9 in CDB___bam_c_search (dbc=0x82dbab0, key=0xbfffe970,
flags=15, exactp=0xbfffe844) at bt_cursor.c:1594
#18 0x400c8740 in CDB___bam_c_put (dbc_orig=0x82dbd50, key=0xbfffe970,
data=0xbfffe990, flags=15) at bt_cursor.c:982
#19 0x400da9ea in CDB___db_put (dbp=0x80e5f88, txn=0x0,
key=0xbfffe970, data=0xbfffe990, flags=0) at db_am.c:508
#20 0x40092796 in WordList::Put(WordReference const&, int)
(this=0xbffff190, arg=@0xbfffe970, flags=0) at WordDB.h:126
#21 0x4003ab7e in HtWordList::Flush() (this=0xbffff190) at
../htword/WordList.h:118
#22 0x08056012 in Retriever::parse_url(URLRef&) (this=0xbffff0e0,
urlRef=@0x837c560) at Retriever.cc:667
#23 0x08055612 in Retriever::Start() (this=0xbffff0e0) at
Retriever.cc:432
#24 0x0805daa5 in main (ac=5, av=0xbffff704) at htdig.cc:338
#25 0x4026c280 in __libc_start_main () from /lib/libc.so.6
(gdb)
On Tuesday 28 January 2003 09:38, Neal Richter wrote:
> What DB errors are you speaking of? Turning on
> wordlist_compress_zlib should be a workaround for the DB errors I
> know about.
|
|
From: Neal R. <ne...@ri...> - 2003-02-10 18:27:51
|
This dump is happening using the old compression scheme. Notice that
the code starts out in the HtDig code (Retriever/WordList), then
travels to the BDB code (CDB_*), then pops back into HtDig code
(WordDBCompress & WordDBPage). WordDBCompress & WordDBPage is the
mifluz page compression scheme.
Stick a printf in mp_cmpr.c:CDB___memp_cmpr_deflate. This function
does the zlib page compression for the wordlist_compress_zlib=true
option.
The current mifluz code has issues and may be replaced by a newer
mifluz snapshot in the future.
Thanks.
On Tue, 11 Feb 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> I've found a repeatable crash in the database code (stack dump below).
> The bug seems to be very sensitive to the code, so I can't easily
> print out debugging information. If you can suggest things to try in
> ddd then I'll happily give them a go. The crash is about 30 mins
> into execution.
>
> Thanks :)
> Lachlan
>
> Program received signal SIGABRT, Aborted.
> 0x4027e621 in kill () from /lib/libc.so.6
> Current language: auto; currently c
> (gdb) where
> #0 0x4027e621 in kill () from /lib/libc.so.6
> #1 0x4027e425 in raise () from /lib/libc.so.6
> #2 0x4027fa53 in abort () from /lib/libc.so.6
> #3 0x401ff45c in std::terminate() () from /usr/lib/libstdc++.so.5
> #4 0x401ff616 in __cxa_throw () from /usr/lib/libstdc++.so.5
> #5 0x401ff862 in operator new(unsigned) () from
> /usr/lib/libstdc++.so.5
> #6 0x401ff94f in operator new[](unsigned) () from
> /usr/lib/libstdc++.so.5
> #7 0x4007f24e in Compressor::get_vals(unsigned**, char const*)
> (this=0xbfffe4a0, pres=0x85c3058, tag=0x2 <Address 0x2 out of
> bounds>) at WordBitCompress.cc:815
> #8 0x40085268 in WordDBPage::Uncompress_main(Compressor*)
> (this=0xbfffe500, pin=0xbfffe4a0) at WordDBPage.cc:213
> #9 0x40084fef in WordDBPage::Uncompress(Compressor*, int,
> __db_cmpr_info*) (this=0xbfffe500, pin=0xbfffe4a0, ndebug=2) at
> WordDBPage.cc:155
> #10 0x40083547 in WordDBCompress::Uncompress(unsigned char const*,
> int, unsigned char*, int) (this=0x8245c88, inbuff=0x87d2678 "\004",
> inbuff_length=2032, outbuff=0x408309b0 "\001", outbuff_length=8192)
> at WordDBCompress.cc:156
> #11 0x40082e11 in WordDBCompress_uncompress_c (inbuff=0x87d2678
> "\004", inbuff_length=2032, outbuff=0x408309b0 "\001",
> outbuff_length=8192, user_data=0x2) at WordDBCompress.cc:48
> #12 0x400fea93 in CDB___memp_cmpr_read (dbmfp=0x82db790,
> bhp=0x40830978, db_io=0xbfffe680, niop=0xbfffe67c) at mp_cmpr.c:306
> #13 0x400fe832 in CDB___memp_cmpr (dbmfp=0x82db790, bhp=0x40830978,
> db_io=0xbfffe680, flag=1, niop=0xbfffe67c) at mp_cmpr.c:153
> #14 0x400fdcdf in CDB___memp_pgread (dbmfp=0x82db790, bhp=0x40830978,
> can_create=0) at mp_bh.c:212
> #15 0x400ffdfb in CDB_memp_fget (dbmfp=0x82db790, pgnoaddr=0xbfffe758,
> flags=0, addrp=0xbfffe75c) at mp_fget.c:353
> #16 0x400d1747 in CDB___bam_search (dbc=0x82dbab0, key=0xbfffe970,
> flags=12802, stop=1, recnop=0x0, exactp=0xbfffe844) at
> bt_search.c:251
> #17 0x400c95a9 in CDB___bam_c_search (dbc=0x82dbab0, key=0xbfffe970,
> flags=15, exactp=0xbfffe844) at bt_cursor.c:1594
> #18 0x400c8740 in CDB___bam_c_put (dbc_orig=0x82dbd50, key=0xbfffe970,
> data=0xbfffe990, flags=15) at bt_cursor.c:982
> #19 0x400da9ea in CDB___db_put (dbp=0x80e5f88, txn=0x0,
> key=0xbfffe970, data=0xbfffe990, flags=0) at db_am.c:508
> #20 0x40092796 in WordList::Put(WordReference const&, int)
> (this=0xbffff190, arg=@0xbfffe970, flags=0) at WordDB.h:126
> #21 0x4003ab7e in HtWordList::Flush() (this=0xbffff190) at
> ../htword/WordList.h:118
> #22 0x08056012 in Retriever::parse_url(URLRef&) (this=0xbffff0e0,
> urlRef=@0x837c560) at Retriever.cc:667
> #23 0x08055612 in Retriever::Start() (this=0xbffff0e0) at
> Retriever.cc:432
> #24 0x0805daa5 in main (ac=5, av=0xbffff704) at htdig.cc:338
> #25 0x4026c280 in __libc_start_main () from /lib/libc.so.6
> (gdb)
>
> On Tuesday 28 January 2003 09:38, Neal Richter wrote:
> > What DB errors are you speaking of? Turning on
> > wordlist_compress_zlib should be a workaround for the DB errors I
> > know about.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-02-10 22:07:51
|
Greetings Neal,
I have found (what I think is) the problem -- the default
compression_level is 0, which causes mifluz to be used even when
the wordlist_compress_zlib flag is true. Is that correct?
Changing it to 8 causes CDB___memp_cmpr_deflate to be called, and
seems to fix the problem.
Thanks for your help, and for writing the patch!
Lachlan
On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> This dump is happening using the old compression scheme.
|
|
From: Neal R. <ne...@ri...> - 2003-02-10 22:27:12
|
Ahh, yes. compression_level's new default should be '6' (the default
compression level of gzip).
This does make the excerpts database compressed also, but I don't see
a disadvantage there at all. It's pretty well tested at this point.
Good work. I have the value set in all my xxx.conf files.
Thanks
On Tue, 11 Feb 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> I have found (what I think is) the problem -- the default
> compression_level is 0, which causes mifluz to be used even when
> the wordlist_compress_zlib flag is true. Is that correct?
> Changing it to 8 causes CDB___memp_cmpr_deflate to be called, and
> seems to fix the problem.
>
> Thanks for your help, and for writing the patch!
> Lachlan
>
> On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> > This dump is happening using the old compression scheme.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
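Until that default lands in a release, the same effect can be had per site with one attribute; a minimal sketch, with an example config path:
    # 6 = gzip's default zlib level; 0 falls back to the old mifluz page compressor
    echo 'compression_level: 6' >> /path/to/htdig.conf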
|
From: Lachlan A. <lh...@us...> - 2003-02-11 11:05:15
|
Greetings Neal,
I agree that compressing the excerpts is a good side effect and have
just updated CVS to use '6' as the default.
Cheers,
Lachlan
On Tuesday 11 February 2003 09:28, Neal Richter wrote:
> compression_level's new default should be '6' (the
> default compression level of gzip).
> This does make the excerpts database compressed also, but I don't
> see a disadvantage there at all. It's pretty well tested at this
> point.
|
|
From: Lachlan A. <lh...@us...> - 2003-02-12 23:37:24
|
Greetings Neal,
I've just run a dig of about 50000 documents and got:
...
Deleted, not found: ID: 38018 URL:
file:///usr/share/doc/HTML/en/kdevelop/reference/C/CONTRIB/OR_PRACTICAL_C/12_8.c
htpurge: 37130
WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 33
WordDB: PANIC: Input/output error
WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database
recovery
Any ideas?
Thanks,
Lachlan
On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> This dump is happening using the old compression scheme.
|
|
From: Neal R. <ne...@ri...> - 2003-02-12 23:51:24
|
What does 'htpurge' mean here? Did you start with an empty index via
'htdig -i'?
Do you get this error during searching?
What exactly are you using htpurge to do?
If the page is corrupted you will be able to find the error using the
correct search string with htsearch.
Another idea is to do the dig and use htdump to dump the WordDB; that
should also duplicate the error if the page is truly corrupted.
If you can't duplicate the error with htsearch or htdump then htpurge
is doing something funky.
Need more info!!!! ;-)
Thanks.
On Thu, 13 Feb 2003, Lachlan Andrew wrote:
> Greetings Neal,
>
> I've just run a dig of about 50000 documents and got:
> ...
> Deleted, not found: ID: 38018 URL:
> file:///usr/share/doc/HTML/en/kdevelop/reference/C/CONTRIB/OR_PRACTICAL_C/12_8.c
> htpurge: 37130
> WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 33
> WordDB: PANIC: Input/output error
> WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database
> recovery
>
> Any ideas?
>
> Thanks,
> Lachlan
>
> On Tuesday 11 February 2003 05:29, Neal Richter wrote:
> > This dump is happening using the old compression scheme.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-02-13 12:06:06
|
Sorry for the brief report, but I was just heading out to work...
I was running rundig, so yes, I started with htdig -i and then ran
htpurge.
Running htdump was a good suggestion. It gives the same error after
getting up to 01examplesopengloverlay. htsearch also crashes when
searching for that string, but seems OK otherwise.
From ddd, it seems that the call to inflate() inside
CDB___memp_cmpr_inflate() is failing.
Later I'll see if I can reproduce the problem on a smaller data set.
Cheers,
Lachlan
On Thursday 13 February 2003 10:52, Neal Richter wrote:
> What does 'htpurge' mean here? Did you start with an empty index
> via 'htdig -i'?
>
> Do you get this error during searching?
>
> What exactly are you using htpurge to do?
>
> If the page is corrupted you will be able to find the error using
> the correct search string with htsearch.
>
> Another idea is to do the dig and use htdump to dump the WordDB;
> that should also duplicate the error if the page is truly
> corrupted.
>
> If you can't duplicate the error with htsearch or htdump then
> htpurge is doing something funky.
>
> Need more info!!!! ;-)
|
|
From: Neal R. <ne...@ri...> - 2003-02-13 18:04:47
|
Please attempt to reproduce the error using ONLY htdig next.
If the error is still present, then the error is in htdig. If the
error is not present then the bug is happening during htpurge.
On Thu, 13 Feb 2003, Lachlan Andrew wrote:
> Sorry for the brief report, but I was just heading out to work...
>
> I was running rundig, so yes, I started with htdig -i and then ran
> htpurge.
>
> Running htdump was a good suggestion. It gives the same error after
> getting up to 01examplesopengloverlay. htsearch also crashes when
> searching for that string, but seems OK otherwise.
>
> From ddd, it seems that the call to inflate() inside
> CDB___memp_cmpr_inflate() is failing.
>
> Later I'll see if I can reproduce the problem on a smaller data set.
>
> Cheers,
> Lachlan
>
> On Thursday 13 February 2003 10:52, Neal Richter wrote:
> > What does 'htpurge' mean here? Did you start with an empty index
> > via 'htdig -i'?
> >
> > Do you get this error during searching?
> >
> > What exactly are you using htpurge to do?
> >
> > If the page is corrupted you will be able to find the error using
> > the correct search string with htsearch.
> >
> > Another idea is to do the dig and use htdump to dump the WordDB;
> > that should also duplicate the error if the page is truly
> > corrupted.
> >
> > If you can't duplicate the error with htsearch or htdump then
> > htpurge is doing something funky.
> >
> > Need more info!!!! ;-)
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
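A sketch of that isolation test (flags and the example config path as in the earlier sketches): dump the word database right after the dig and again after the purge, and note which dump fails first:
    htdig -i -c site.conf     # fresh index
    htdump   -c site.conf     # dump #1: failure here means htdig wrote the bad page
    htpurge  -c site.conf
    htdump   -c site.conf     # dump #2: failure only here points at htpurge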
|
From: Lachlan A. <lh...@us...> - 2003-02-13 23:49:47
|
An error occurs during an htdump straight after htdig. However, I
haven't yet got it to occur *within* htdig.
Interestingly, the error first reported by htdump is similar to the
one I last reported,
WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 23
WordDB: PANIC: Input/output error
WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
database recovery
but the one by htpurge (and subsequent htdumps) is
WordDB: CDB___memp_cmpr_read: unexpected compression flag value 0x8
at pgno = 26613
WordDB: PANIC: Successful return: 0
WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
database recovery
I'll keep looking...
On Friday 14 February 2003 05:05, Neal Richter wrote:
> Please attempt to reproduce the error using ONLY htdig next.
>
> If the error is still present, then the error is in htdig. If the
> error is not present then the bug is happening during htpurge.
|
|
From: Neal R. <ne...@ri...> - 2003-02-14 00:15:32
|
Interesting. Thanks for the clarification about your process..
Question: Your message below points to an error on page 26613. Your
previous message pointed to an error on page 33.
> WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 33
Is the error a moving target? ;-)
I checked out the code in mp_cmpr.c around the error output (line 289).
Basically I'm trying to figure out if you are getting page overflow or
not..
Is there something you can tell us about the type of data you are
indexing? Are they big pages with lots of repetitive information..
giving htdig many similar keys which hash/sort to the same pages?
Please recompile just mp_cmpr.c with "gcc -DDEBUG_CMPR [etc]" and
rerun htdig & htdump. You could do this quickly by hand via cut-paste
and then link everything with make. If you could post the output to a
webserver somewhere I'd like to look at it.
At that point I'll check it out and get you a replacement mp_cmpr.c to
try to get more information about the page in question...
Thanks!
On Fri, 14 Feb 2003, Lachlan Andrew wrote:
> An error occurs during an htdump straight after htdig. However, I
> haven't yet got it to occur *within* htdig.
>
> Interestingly, the error first reported by htdump is similar to the
> one I last reported,
>
> WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 23
> WordDB: PANIC: Input/output error
> WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
> database recovery
>
> but the one by htpurge (and subsequent htdumps) is
>
> WordDB: CDB___memp_cmpr_read: unexpected compression flag value 0x8
> at pgno = 26613
> WordDB: PANIC: Successful return: 0
> WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run
> database recovery
>
> I'll keep looking...
>
> On Friday 14 February 2003 05:05, Neal Richter wrote:
> > Please attempt to reproduce the error using ONLY htdig next.
> >
> > If the error is still present, then the error is in htdig. If the
> > error is not present then the bug is happening during htpurge.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
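A sketch of that quick rebuild; the exact compile line is whatever make normally prints for mp_cmpr.c in your tree, so the -I paths below are only an illustration:
    cd db/                     # the BDB sources live under db/ in the htdig tree
    # repeat make's usual compile command for this file, adding the extra define
    gcc -DDEBUG_CMPR -I. -I../include -c mp_cmpr.c -o mp_cmpr.o
    cd .. && make              # relink htdig/htdump/htpurge against the rebuilt object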
|
From: Lachlan A. <lh...@us...> - 2003-02-15 15:12:07
Attachments:
dump-v.trace.log
|
On Friday 14 February 2003 11:16, Neal Richter wrote:
> Question: Your message below points to an error on page 26613.
> Your previous message pointed to an error on page 33.
> Is the error a moving target? ;-)
I think I reduced the data set slightly, but yes, the target does seem
to move :( As the attached file shows, it is now complaining about
page 46, and I don't think I've changed anything since the last time,
except the runtime flags (-v level).
> Is there something you can tell us about the type of data you are
> indexing? Are they big pages with lots of repetitive information..
> giving htdig many similar keys which hash/sort to the same pages?
It is just miscellaneous Linux documentation -- Qt, KDE, xemacs, LaTeX,
lots of READMEs. I can't think off hand of any particularly
repetitive pages...
> Please recompile just mp_cmpr.c with "gcc -DDEBUG_CMPR [etc]" and
> post the output to a webserver
> At that point I'll check it out and get you a replacement mp_cmpr.c
> to try to get more information about the page in question...
When I compile htdig with that flag, the error does not occur.
However, I'm attaching a trace for htdump from a dig performed
without the flag, in case that helps. I hope it isn't too big...
Cheers,
Lachlan
|
|
From: Neal R. <ne...@ri...> - 2003-02-16 19:41:02
|
First question: What were the results with the pagesize @ 32K??
> On Friday 14 February 2003 11:16, Neal Richter wrote:
>
> > Question: Your message below points to an error on page 26613.
> > Your previous message pointed to an error on page 33.
> > Is the error a moving target? ;-)
>
> I think I reduced the data set slightly, but yes, the target does seem
> to move :( As the attached file shows, it is now complaining about
> page 46, and I don't think I've changed anything since the last time,
> except the runtime flags (-v level).
If you are indexing the same data in the same order every time and the
target moves with trivial changes of code or flags.. that usually means
MEMORY CORRUPTION!!!
Unfortunately these types of errors are difficult to duplicate on other
machines/platforms.
Valgrind is a nice open source memory debugger.. have you used it
before?
%valgrind htdig xxx yyy
This will definitely help find a memory error, but will also drown you
in output since htdig is pretty bad with memory leaks.. but you can
disable leak detection and look only for memory corruption.
> When I compile htdig with that flag, the error does not occur.
> However, I'm attaching a trace for htdump from a dig performed
> without the flag, in case that helps. I hope it isn't too big...
Wow, very interesting, since the debug output doesn't do much.
FYI: the other way to activate the DEBUG_CMPR flag is to make line 64
of mp_cmpr.c active. This way you can compile htdig normally and reduce
the number of variables changing in the experiments.
There are 12 spots in mp_cmpr.c where additional code is executed with
the flag. Three of them do nothing but a printf with no variable
access.. they can be ignored. It would be interesting to comment out
each of them except for one and see what the results are. Let us know
what you find.
I'd also like a link to your test data if you can put it up on a high
bandwidth server.. and a copy of your conf file and command-line htdig
options. I'd like to run Insure++ on it.
Thanks.
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
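A sketch of that valgrind run with leak reporting turned off, so only genuine memory errors (invalid reads/writes) surface; the htdig arguments are simply whatever the failing dig normally uses:
    valgrind --leak-check=no htdig -i -c site.conf 2> valgrind.log
    grep -nE 'Invalid (read|write)' valgrind.log | head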
|
From: Lachlan A. <lh...@us...> - 2003-02-18 12:04:49
|
Greetings Neal,
On Monday 17 February 2003 06:41, Neal Richter wrote:
> First question: What were the results with the pagesize @ 32K??
The run I did didn't produce any errors, but that doesn't give me
much confidence. The problem is elusive and I haven't yet played
around with the data set (using exclude_urls) to try to reproduce
it, as I have had to with 8k.
> If you are indexing the same data in the same order every time
> and the target moves with trivial changes of code or flags.. that
> usually means MEMORY CORRUPTION!!!
> Unfortunately these types of errors are difficult to duplicate on
> other machines/platforms.
True. But is the order actually the same? I haven't read the code
thoroughly, but I thought ht://Dig had a timing mechanism in order
not to flood a particular server. Couldn't that reorder the search
depending on how much time is spent on debugging output?
> Valgrind is a nice open source memory debugger.
> This will definitely help find a memory error, but will also drown
> you in output since htdig is pretty bad with memory leaks.. but you
> can disable leak detection and look only for memory corruption.
Thanks for the tip. I had used mpatrol previously, which seems more
powerful, with the corresponding increase in hassle. Interestingly,
it didn't report any leaks or reproduce the error... I'm rerunning
it without valgrind now to check that the error is actually
repeatable.
> I'd also like a link to your test data if you can put it up on a
> high bandwidth server.. and a copy of your conf file and
> command-line htdig options. I'd like to run Insure++ on it.
I'll see what I can do. However, since all the URLs are file:///s,
the actual data set will differ for you, unless you overwrite your
current filesystem. Remember, this is a *very* pernickety bug!
Now that you have suggested some tools, I'll keep playing around until
I have something more concrete to report.
Thanks, as always,
Lachlan
|
|
From: Gilles D. <gr...@sc...> - 2003-02-18 16:01:17
|
According to Neal Richter:
> > > Question: Your message below points to an error on page 26613.
> > > Your previous message pointed to an error on page 33.
> > > Is the error a moving target? ;-)
> >
> > I think I reduced the data set slightly, but yes, the target does seem
> > to move :( As the attached file shows, it is now complaining about
> > page 46, and I don't think I've changed anything since the last time,
> > except the runtime flags (-v level).
>
> If you are indexing the same data in the same order every time and the
> target moves with trivial changes of code or flags.. that usually means
> MEMORY CORRUPTION!!!
>
> Unfortunately these types of errors are difficult to duplicate on other
> machines/platforms.
>
> Valgrind is a nice open source memory debugger.. have you used it
> before?
>
> %valgrind htdig xxx yyy
>
> This will definitely help find a memory error, but will also drown you in
> output since htdig is pretty bad with memory leaks.. but you can disable
> leak detection and look only for memory corruption.
Just a little something from out in left field, as I've only been very
loosely following this thread... Are you running the very latest
release of the libz code? All this talk of memory corruption and
memory leaks, seemingly related to libz (at least as suggested by
Lachlan), reminds me of the recent security alerts regarding libz.
The alerts referred to a double-free bug, which was presented as a
security vulnerability, but if libz wasn't (or isn't) managing allocs
and frees properly, couldn't that lead to all sorts of weirdness like
this?
--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre     WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba     Winnipeg, MB R3E 3J7 (Canada)
|
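For anyone wanting to rule that out quickly: zlib declares its release as the ZLIB_VERSION macro in its header, so a one-line check (assuming the usual header location) is:
    grep ZLIB_VERSION /usr/include/zlib.h    # 1.1.4 is the release with the double-free fix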
|
From: Lachlan A. <lh...@us...> - 2003-02-19 12:17:34
|
On Wednesday 19 February 2003 03:00, Gilles Detillieux wrote:
> something from out in left field... Are you running the very
> latest release of the libz code?
Very good point, Gilles. I was running the old code. I've installed
zlib1-1.1.4 and will keep testing. (KDE needs the zlib1 package
rather than zlib, but I assume the libraries are the same.)
Cheers,
Lachlan
|
|
From: Lachlan A. <lh...@us...> - 2003-02-19 12:49:20
Attachments:
themes-example.html
|
On Friday 14 February 2003 11:16, Neal Richter wrote:
> Is there something you can tell us about the type of data you are
> indexing? Are they big pages with lots of repetitive information..
> giving htdig many similar keys which hash/sort to the same pages?
Greetings Neal,
I've found one page in the Qt documentation which may be causing
those problems (attached). I hadn't realised it, but the
valid_punctuation attribute seems to be treated as an *optional*
word break. (The docs say it is *not* a word break, and that seems
the intention of WordType::WordToken...) The page has long strings
with many valid_punctuation symbols, and gives output like
elliptical  1060  0  1113  34
elp  1363  0  131  0
elphick  1516  0  750  0
elsbs  1372  0  968  4
elsbsw  1372  0  968  4
elsbswp  1372  0  968  4
elsbswpe  1372  0  968  4
elsbswpew  1372  0  968  4
elsbswpewg  1372  0  968  4
elsbswpewgr  1372  0  968  4
elsbswpewgrr  1372  0  968  4
elsbswpewgrr1  1372  0  968  4
elsbswpewgrr1t  1372  0  968  4
elsbswpewgrr1twa7  1372  0  968  4
elsbswpewgrr1twa7z  1372  0  968  4
elsbswpewgrr1twa7z1bea0  1372  0  968  4
elsbswpewgrr1twa7z1bea0f  1372  0  968  4
elsbswpewgrr1twa7z1bea0fk  1372  0  968  4
elsbswpewgrr1twa7z1bea0fkd  1372  0  968  4
elsbswpewgrr1twa7z1bea0fkdrbk  1372  0  968  4
elsbswpewgrr1twa7z1bea0fkdrbke  1372  0  968  4
elsbswpewgrr1twa7z1bea0fkdrbkezb  1372  0  968  4
else  225  0  1285  0
Might that be the trouble?
(BTW, zlib 1.1.4 is still giving errors, albeit for a slightly
different data set.)
Cheers,
Lachlan
|
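If it helps to confirm that pages like the attached one are the trigger, one quick check is to scan a word dump for abnormally long keys; a sketch, assuming the word is the first whitespace-separated field (as in the excerpt above) and using a placeholder dump file name:
    awk 'length($1) > 30 { print length($1), $1 }' word.dump | sort -rn | head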