From: Lachlan A. <lh...@us...> - 2003-06-08 05:20:32
|
Greetings all,

Thank you to everybody for your work towards getting 3.2.0b5 out.

Could people doing specific tasks give an update on how they're going, and what others of us can do to help?

Geoff: Did you manage the copyright update? If you have written the sed script, but haven't had time to apply/commit, could you please post it? I should have some time this week to do that.

Gilles: How are the Latin encodings going? If it is simply a forward port, I can try it. If you want new features, I'll leave it to you.

Neal: It's good to hear that you're coming along with the Win32 port. From where I'm standing, the highest priority is applying the memory-leak patch, so that the rest of us can test it thoroughly. Could you please do that some time this week?

Jess: Any luck with the HP-UX configuration? If you post the config.log then one of the rest of us might have an idea why it is failing.

Does the following timeline sound feasible? This time, I think the showstoppers are gone, so we can afford to set a firm timeline (although not necessarily this one :)

Sun 8 - Fri 13:  Finalise Unix code / configuration
                 Update copyright
                 Other documentation updates?
Weekend 14-15:   Unix code freeze, and re-test all available platforms
                 Ask "non-team" volunteers to "install/make check" the snapshot
Mon 16 - Fri 20: Finish Win32, or decide to postpone until 3.2.0b6
                 Further documentation updates?
                 Testing by non-team volunteers
Weekend 21-22:   Code freeze
                 Test Win32 (if finished)
                 Rename 3.2.0b4->3.2.0b5
Mon 23:          BETA RELEASE

What docs need to be updated? The www.htdig.org FAQ is much bigger than the distribution FAQ. Should it be copied over (even though a lot of it relates to 3.1)?
|
From: J. op d. B. <ht...@op...> - 2003-06-08 08:10:57
|
I'm tracking down the problem. The original configure script (built by autoconf 2.57) creates a C program that HP-UX gcc chokes on, complaining that there is more than one prototype definition for select. Running autoconf 2.13 on the configure.in script produces a different C program and .. tada(.wav)! The select problem is gone. Actually, I took the select part from the 2.13 version and placed it over the 2.57 part. But now it breaks on the compile part, saying something about ambiguous declarations. More on that later.

I will make a file containing the original 2.57 select test and the 2.13 select test. Maybe someone with knowledge of autoconf can change it.

----- Original Message -----
From: "Lachlan Andrew" <lh...@us...>
To: <htd...@li...>
Sent: Sunday, June 08, 2003 7:20 AM
Subject: [htdig-dev] 3.2.0b4 Progress check :)

> Jess: Any luck with the HP-UX configuration? If you post the
> config.log then one of the rest of us might have an idea why it is
> failing.
|
From: Lachlan A. <lh...@us...> - 2003-06-08 10:27:02
|
On Sun, 8 Jun 2003 18:10, J. op den Brouw wrote:
> I'm tracking down the problem.

Excellent :) It sounds like very clever detective work!

> I will make a file containing the original 2.57 select test
> and the 2.13 select test. Maybe someone with knowledge
> on autoconf can change it.

Could you also send config.log and the error messages produced by the compiler?

Thanks,
Lachlan
|
From: Gilles D. <gr...@sc...> - 2003-06-08 13:24:37
|
According to Lachlan Andrew: > Gilles: How is the Latin encodings going? If it is simply a forward > port, I can try it. If you want new features, I'll leave it to you. It's not a forward port, as the SGML decoding is quite different in 3.2. I was busy the past couple weeks with a server upgrade, and couldn't spare a few hours to code these changes and test them. I really will make the effort to get that done early this week, as I'm leaving next week for a much-needed two week vacation. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Gilles D. <gr...@sc...> - 2003-06-08 13:40:38
|
According to Lachlan Andrew: > What docs need to be updated? The www.htdig.org FAQ is much bigger > than the distribution FAQ. Should it be copied over (even though a > lot of it relates to 3.1)? Geoff has a pre-release check list he uses to determine which docs are forward ported or back ported from the maindocs tree. As this is still a beta release, I don't expect a lot of forward porting to maindocs, but there will be a need to upgrade some maindocs files like where.html, and the FAQ. If the FAQ is lacking or outdated in any 3.2 items, it should be updated, but the FAQ in the 3.2.0b5 release should be a snapshot of the FAQ in maindocs at release time. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada) |
From: Lachlan A. <lh...@us...> - 2003-06-16 14:19:42
|
Greetings all,

Are people happy with the tentative timetable I mailed a week ago? If so, the Unix code should be essentially complete now, and 3.2.0b5 will be released in one week's time. Neal and Jess, do you still plan to make any changes?

Geoff, could you please post a to-do list for the release docs? All I can think of is
 - Replace all references to 3.2.0b4 as current with 3.2.0b5
   (Can this be automated at all?)
 - Remove checklist and solved issues from STATUS
 - Update copyright (if possible -- I don't see this as critical...)

Please let me know what I can do to help the release.

Thanks,
Lachlan

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Neal R. <ne...@ri...> - 2003-06-16 16:06:44
|
On Tue, 17 Jun 2003, Lachlan Andrew wrote:

> Greetings all,
>
> Are people happy with the tentative timetable I mailed a week ago? If
> so, the Unix code should be essentially complete now, and 3.2.0b5
> will be released in one week's time. Neal and Jess, do you still
> plan to make any changes?

Yes. I'll do some more with memory-leak fixes and the Win32 port.

I'm in the process of comparing my tree with the CVS tree and moving changes over to CVS. I plan to commit code frequently this week and get my tree synched.

I will also take care of updating the GPL notifications to LGPL.

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Lachlan A. <lh...@us...> - 2003-06-17 13:12:00
|
That sounds great, Neal.

Were you also planning to update the copyright dates to 2003 as well as the COPYING file itself? (I'm not asking you to do it, just asking if you're planning to :)

It will be great having your Win32 patch in. That will help a lot of people.

Thanks for all your work!
Lachlan

On Tue, 17 Jun 2003 02:06, Neal Richter wrote:
> On Tue, 17 Jun 2003, Lachlan Andrew wrote:
> > Neal and Jess, do you still plan to make any changes?
>
> Yes. I'll do some more with memory-leak fixes and the Win32
> port.
>
> I'm in the process of comparing my tree with the CVS tree and
> moving changes over to CVS. I plan to commit code frequently this
> week and get my tree synched.
>
> I will also take care of updating the GPL notifications to LGPL.

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Neal R. <ne...@ri...> - 2003-06-17 15:54:33
|
On Tue, 17 Jun 2003, Lachlan Andrew wrote:

> That sounds great, Neal.
>
> Were you also planning to update the copyright dates to 2003 as well
> as the COPYING file itself? (I'm not asking you to do it, just
> asking if you're planning to :)

Sure! I'll update the copyright notices as well.

> It will be great having your Win32 patch in. That will help a lot of
> people.
>
> Thanks for all your work!
> Lachlan

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: J. op d. B. <ht...@op...> - 2003-06-17 19:33:52
|
Hi,

----- Original Message -----
From: "Lachlan Andrew" <lh...@us...>
To: <htd...@li...>
Sent: Monday, June 16, 2003 4:19 PM
Subject: [htdig-dev] 3.2.0b5 Next progress check :)

> Greetings all,
>
> Are people happy with the tentative timetable I mailed a week ago? If
> so, the Unix code should be essentially complete now, and 3.2.0b5
> will be released in one week's time. Neal and Jess, do you still
> plan to make any changes?

I'm still having trouble getting the Berkeley database ./configure to find my select() function. I'm compiling a big post with lots of output from various programs/files, but I may not be able to produce a fix before the planned release.

> Geoff, could you please post a to-do list for the release docs? All I
> can think of is
> - Replace all references to 3.2.0b4 as current with 3.2.0b5
>   (Can this be automated at all?)
> - Remove checklist and solved issues from STATUS
> - Update copyright (if possible -- I don't see this as critical...)

Add a (new) document listing the Unix/Windows flavors htdig runs on, plus more information and hints on how to compile and install htdig on those various flavors. (HP-UX needs a huge list of parameters to get it compiled anyway.)

> Please let me know what I can do to help the release.
>
> Thanks,
> Lachlan

--Jesse
|
From: Lachlan A. <lh...@us...> - 2003-06-18 12:37:34
|
Greetings all,

I've just come across a database bug :( It was reporting

    WordKey::Compare: key length for a or b < info.num_length

repeatedly when I ran a large dig without -i.

I haven't tried repeating it yet, because the dig that produced it takes three days!! (It uses a rather inefficient external_transport.) I'll try to replicate it using a more manageable data set.

Regards,
Lachlan

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Neal R. <ne...@ri...> - 2003-06-18 18:00:12
|
Hey,

I think I have found something here.. no idea of importance yet.

In db/db.c line 1049 we have an #ifdef:

    /*
     * If compression is on, the minimum page size must be multiplied
     * by the compression factor.
     */
    #ifdef HAVE_ZLIB
        if(F_ISSET(dbp, DB_AM_CMPR)) {
            if(iopsize < DB_CMPR_MULTIPLY(dbenv, DB_MIN_PGSIZE))
                iopsize = DB_CMPR_MULTIPLY(dbenv, DB_MIN_PGSIZE);
        }
    #endif /* HAVE_ZLIB */

This was added to db.c 3 years ago when Geoff merged Loic's code from milfuz.

The problem is that it needs to be HAVE_LIBZ rather than HAVE_ZLIB! Our db_config.h contains HAVE_LIBZ & HAVE_ZLIB_H .. no HAVE_ZLIB.

I am not sure yet how this affects us... it looks like a range check and may be part of why db-compression doesn't work when I try and set the wordlist_page_size to 64K.

I also checked and this is the only occurrence of this screwup.

Obviously it must not be too important since we do have functional libz-based DB compression.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
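The effect of a misspelled feature macro can be reproduced in isolation. The sketch below is not the db.c code: `MIN_PGSIZE`, `CMPR_MULTIPLY`, both function names and the factor of 8 are made-up stand-ins chosen for illustration. It only shows that when the config header defines HAVE_LIBZ but the source tests HAVE_ZLIB, the guarded range check is silently compiled out.

```c
#include <stddef.h>

/* What a configure-generated header might define (assumption for this demo). */
#define HAVE_LIBZ 1
#define HAVE_ZLIB_H 1
/* HAVE_ZLIB is deliberately NOT defined, matching the report above. */

/* Hypothetical stand-ins for DB_MIN_PGSIZE and DB_CMPR_MULTIPLY. */
#define MIN_PGSIZE 512u
#define CMPR_MULTIPLY(n) ((n) * 8u)

/* Guard spelled as in the quoted snippet: the body is never compiled here. */
size_t min_io_pagesize_as_written(size_t iopsize, int compressed)
{
#ifdef HAVE_ZLIB
    if (compressed && iopsize < CMPR_MULTIPLY(MIN_PGSIZE))
        iopsize = CMPR_MULTIPLY(MIN_PGSIZE);
#else
    (void)compressed; /* avoid an unused-parameter warning */
#endif
    return iopsize;
}

/* Guard spelled to match the macro that is actually defined. */
size_t min_io_pagesize_fixed(size_t iopsize, int compressed)
{
#ifdef HAVE_LIBZ
    if (compressed && iopsize < CMPR_MULTIPLY(MIN_PGSIZE))
        iopsize = CMPR_MULTIPLY(MIN_PGSIZE);
#else
    (void)compressed; /* avoid an unused-parameter warning */
#endif
    return iopsize;
}
```

With these assumed values, the first function leaves a 1024-byte page size untouched even when compression is on, while the second raises it to the scaled minimum. No compiler warning is produced in either case, which is why this kind of typo can survive for years.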
From: Lachlan A. <lh...@us...> - 2003-06-19 14:20:59
|
Greetings,

Well spotted. It shouldn't matter, though. The comments talk about "default" page size, so it could well be overridden anyway... Notice that it is set to at most 16k a few lines above, so it isn't likely to cause a 64k limit.

Cheers,
Lachlan

On Thu, 19 Jun 2003 04:00, Neal Richter wrote:
> In db/db.c line 1049 we have an #ifdef:
>
> /*
>  * If compression is on, the minimum page size must be
>  * multiplied by the compression factor.
>  */
> #ifdef HAVE_ZLIB
>     if(F_ISSET(dbp, DB_AM_CMPR)) {
>         if(iopsize < DB_CMPR_MULTIPLY(dbenv, DB_MIN_PGSIZE))
>             iopsize = DB_CMPR_MULTIPLY(dbenv, DB_MIN_PGSIZE);
>     }
> #endif /* HAVE_ZLIB */
>
> The problem is that it needs to be HAVE_LIBZ rather than HAVE_ZLIB!
> Our db_config.h contains HAVE_LIBZ & HAVE_ZLIB_H .. no HAVE_ZLIB.
>
> I am not sure yet how this affects us yet... it looks like a range
> check and may be part of why db-compression doesn't work when I try
> and set the wordlist_page_size to 64K.

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Neal R. <ne...@ri...> - 2003-06-19 23:46:56
|
Hey,

In looking at the amount of changes to the tree to be checked in for the Native Win32 port I decided to NOT check in incrementally through the week and do it all at once. I wrote up a report down below:

There is pretty minimal impact for Unix code. The question is do you want me to sit on this or go ahead and check it in?

I also looked over my fixes for memory leaks, these are separate from the native Win32 changes, and will be committing those ASAP.. early next week.

I will also commit the latest version of libhtdig & libhtdigphp. These have independent makefiles from the rest of htdig and won't affect the new beta.

I also have an efficiency improvement to WordDB.cc that uses the STL. This is probably inappropriate for the upcoming beta, but I'd like to add it soon.

After I get this stuff added, I'll get back to thinking about cool new features like Porter Stemming in the worddb, improvements to worddb storage format and researchy AI stuff.

Thanks!
Neal Richter

REPORT:

There are 22 new files (10 of them WIN32 Makefiles), and nearly 1100 lines changed in the code base. Not much considering the size of the code base.

I have made every effort to make all changes within #ifdef _MSC_VER and these changes should largely not affect the main body of code. Here are the exceptions:

db/mp_alloc.c:
I added two new functions CDB_get_mp_dirty_level & CDB_set_mp_dirty_level and made the CDB___mp_dirty_level static to db/mp_alloc.c. This is slightly cleaner than a true global variable and prevented some dllexporting win32 crap.

htcommon/conf_lexer.lxx:
bcopy() -> memcopy()

htcommon/defaults.cc:
I used "/quotes to divide up some of the very long strings to prevent MSVC++ from barfing on long lines. This won't affect gcc/g++ since "Hello" "World" is equivalent to "Hello \ World" in C.
htdig/Document.h:
added a one line ContentType() {return contentType.get();} for use with libhtdig

htdig/Retriever.cc:
Various changes to IsValidURL() to return an actual error code rather than FALSE for every error. More usage of the 'urls_seen' file. There were a few cases where a URL was not put into this file.

htfuzzy/EndingsDB.cc & htfuzzy/Synonym.cc:
I am using a new file_copy routine which is very well tested and works in Unix and Win32 rather than a 'system' call to copy files. This code is #ifdefed for use ONLY with Native Win32 & libhtdig/libhtdigphp usage and uses the old system("MV xx yy") for all others. I recommend at some point we do away with the 'system' call.

/htlib/HtDateTime.cc:
global var _strtime to my_strtime to make MSVC++ happy.

There are a few new things added for Win32 which may prove useful later:
 - local version of GNU regex (LGPL) for win32
 - Some POSIX-like directory routines for <dirent.h> for win32 (written by RightNow)
 - local version of getopt (public domain version)

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
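The defaults.cc change in the report relies on adjacent string literals being joined by the compiler. A minimal sketch of that behaviour follows; the strings and the names `split_literals`, `single_literal` and `literals_match` are invented for illustration and are not taken from defaults.cc.

```c
#include <string.h>

/* One logical string written as several adjacent literals.
 * The compiler joins them at translation time, so there is no
 * runtime cost and no separator is inserted between the pieces. */
static const char split_literals[] =
    "This is a very long description "
    "broken into short source lines "
    "for compilers with line-length limits.";

/* The same text written as a single literal, for comparison. */
static const char single_literal[] =
    "This is a very long description broken into short source lines for compilers with line-length limits.";

/* Returns 1 when both spellings produce identical strings, 0 otherwise. */
int literals_match(void)
{
    return strcmp(split_literals, single_literal) == 0;
}
```

Because nothing is inserted between the pieces, any space that should separate words has to be written inside one of the literals, as at the end of the first two pieces here.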
From: Lachlan A. <lh...@us...> - 2003-06-21 14:04:05
|
On Fri, 20 Jun 2003 09:46, Neal Richter wrote:
> There is pretty minimal impact for Unix code. The question is do
> you want me to sit on this or go ahead and check it in?

Greetings Neal,

I've recently found that there is a big bug in the database code when the size of a page is reduced, as when re-indexing. Rather than hacking my earlier hack to limit the depth of recursion, we might have to implement a proper fix before the beta goes out. However, I don't think I'll have time for that for the next two months :(

Translation: I'm in favour of your checking it in.

> db/mp_alloc.c:
>
> I added two new functions CDB_get_mp_dirty_level &
> CDB_set_mp_dirty_level and made the CDB___mp_dirty_level static to
> db/mp_alloc.c. This is slightly cleaner than a true global
> variable and prevented some dllexporting win32 crap.

Yes, it was only ever meant to be a hack to tide us over until after the next beta. That variable (and the config variable) will disappear before 3.2.0b6...

Cheers,
Lachlan

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
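The pattern discussed for db/mp_alloc.c, a file-local variable reached only through accessor functions, can be sketched as below. The names, the `int` type and the default of 0 are assumptions made for this example; the thread does not give the real signatures of CDB_get_mp_dirty_level and CDB_set_mp_dirty_level.

```c
/* Hypothetical stand-in for the dirty-level setting. `static` gives it
 * internal linkage, so no other source file can reference it by name. */
static int mp_dirty_level = 0;

/* Read the current level. */
int get_mp_dirty_level(void)
{
    return mp_dirty_level;
}

/* Change the level. Code in other files goes through this function
 * instead of touching the variable directly. */
void set_mp_dirty_level(int level)
{
    mp_dirty_level = level;
}
```

One likely reason this helps on Win32 is that only functions then need to cross the DLL boundary, and exporting functions is generally less fiddly than exporting data. It also leaves a single place to change when the variable is later removed.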
From: Geoff H. <ghu...@ws...> - 2003-06-23 16:44:04
|
On Saturday, June 21, 2003, at 09:03 AM, Lachlan Andrew wrote: > have to implement a proper fix before the beta goes out. However, I > don't think I'll have time for that for the next two months :( > > Translation: I'm in favour of your checking it in. Ditto. -Geoff |
From: Lachlan A. <lh...@us...> - 2003-06-22 03:01:18
|
Greetings all,

I think I've finally found the *source* of the recursions etc in the database compression. Once 3.2.0b5 is out, I'll remove all the hacks to limit explicit recursion, and to keep the cache clean...

The problem is with the freelist of pages used when the compressed page is larger than a "real" page. It is part of the same environment as the rest of the database, and so shares the cache. That means that writing a page can cause access to the cache, which may require writing dirty pages etc.

The solution seems to be simply to make it a "standalone" database.

Can anyone see any problems with that approach? Do we need the environment for anything?

Cheers,
Lachlan

On Wed, 18 Jun 2003 22:36, Lachlan Andrew wrote:
> I've just come across a database bug :( It was reporting
> WordKey::Compare: key length for a or b < info.num_length
> repeatedly when I ran a large dig without -i.
>
> I haven't tried repeating it yet, because the dig that produced it
> takes three days!! (It uses a rather inefficient
> external_transport) I'll try to replicate it using a more
> manageable data set.

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
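The re-entrancy described above can be illustrated with a toy model. This is not Berkeley DB code: the `cache` struct, the function names and the slot counts are all invented, and the model assumes that every page write needs one overflow page from the freelist. It only shows why a freelist that shares the data cache can nest page writes, while a freelist with a private cache cannot.

```c
/* Toy page cache, reduced to counters. */
struct cache {
    int capacity; /* total slots */
    int used;     /* occupied slots */
    int dirty;    /* occupied slots holding unwritten compressed pages */
};

/* Tracks how deeply page writes are nested. */
struct trace {
    int depth;
    int max_depth;
};

static void write_compressed_page(struct cache *freelist_cache, struct trace *t);

/* Claim a slot in the cache. If it is full and the evicted page is
 * dirty, that page has to be written out first. */
static void cache_fetch(struct cache *c, struct trace *t)
{
    if (c->used < c->capacity) {
        c->used++;
        return;
    }
    if (c->dirty > 0) {
        c->dirty--;
        write_compressed_page(c, t); /* re-enters the write path */
    }
    /* The evicted slot is reused, so `used` is unchanged. */
}

/* Write one page; the overflow page comes from the freelist,
 * which is reached through `freelist_cache`. */
static void write_compressed_page(struct cache *freelist_cache, struct trace *t)
{
    t->depth++;
    if (t->depth > t->max_depth)
        t->max_depth = t->depth;
    cache_fetch(freelist_cache, t);
    t->depth--;
}

/* Deepest nesting seen when the freelist shares the data cache
 * (shared != 0) or has its own empty cache (shared == 0). */
int max_write_depth(int shared)
{
    struct cache data_cache = { 4, 4, 4 }; /* full of dirty pages */
    struct cache own_cache  = { 4, 0, 0 }; /* private and empty */
    struct trace t = { 0, 0 };

    write_compressed_page(shared ? &data_cache : &own_cache, &t);
    return t.max_depth;
}
```

In this model the shared case nests one level per dirty page in the cache, whereas the private cache keeps the depth at one. Whether the real code behaves the same way depends on details of the memory pool that the model leaves out.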
From: Neal R. <ne...@ri...> - 2003-06-23 14:41:14
|
Could you post a few more details? Do you have a patch I can play with?

I'm looking for that term in both the New Riders Berkeley DB book and in the online BDB documentation.

Thanks.

On Sun, 22 Jun 2003, Lachlan Andrew wrote:

> The problem is with the freelist of pages used when the compressed
> page is larger than a "real" page. It is part of the same
> environment as the rest of the database, and so shares the cache.
> That means that writing a page can cause access to the cache, which
> may require writing dirty pages etc.
>
> The solution seems to be simply to make it a "standalone" database.
>
> Can anyone see any problems with that approach? Do we need the
> environment for anything?

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Geoff H. <ghu...@ws...> - 2003-06-23 18:24:19
|
> The solution seems to be simply to make it a "standalone" database.
>
> Can anyone see any problems with that approach?  Do we need the
> environment for anything?

Like Neal, I'm not entirely sure I understand your use of "standalone."
IIRC, you're talking about the DB_ENV "environment" access to the
database.

I do not think we currently use the Berkeley environment support for
anything much. IIRC, it's for client-server operation. But then again,
it looks like you want DB_ENV for the Berkeley-level locking and
transaction support, which we may want.

-Geoff |
From: Neal R. <ne...@ri...> - 2003-06-23 23:07:11
|
Here's Lachlan's diff to db/mp_cmpr.c

<     if(CDB_db_create(&dbp, dbenv, 0) != 0)
---
> /* Use *standalone* database, to prevent recursion when writing pages */
> /* from the cache, shared with other members of the environment */
>     if(CDB_db_create(&dbp, NULL, 0) != 0)

He is indeed talking about the DB_ENV "environment". The BDB book confirms
that when the dbenv pointer is NULL the database is standalone... not part
of or using a Berkeley DB environment.

My hunch is that this is a rather 'blunt' fix. It seems likely that there
is a slight problem with the DB_ENV we use... maybe it needs to be tweaked
before the db_create call if the compression is enabled??

The __db_env struct is fairly large, but most of it seems to be
function-pointers. There are a number of variables for Locking, Logging,
Transactions, Memory-pool, and some other flags.

I'm looking to see what effect this will have. There are a number of
important-looking fields in __db_env... some having to do with db-filename
semantics and location.

I can't find much yet on what everything defaults to (if that is the right
term) when standalone is used.

I also think we need to look at how we can merge our changes with at least
the next version 'up' of our BDB version at some point this year.

Thanks.

Neal

On Mon, 23 Jun 2003, Geoff Hutchison wrote:

> > The solution seems to be simply to make it a "standalone" database.
> >
> > Can anyone see any problems with that approach?  Do we need the
> > environment for anything?
>
> Like Neal, I'm not entirely sure I understand your use of "standalone."
> IIRC, you're talking about the DB_ENV "environment" access to the
> database.
>
> I do not think we currently use the Berkeley environment support for
> anything much. IIRC, it's for client-server operation. But then again,
> it looks like you want DB_ENV for the Berkeley-level locking and
> transaction support, which we may want.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
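To make the recursion Lachlan describes easier to picture, here is a toy model in C. It is only a sketch under simplifying assumptions: none of the names (ToyCache, toy_write_page, toy_flush_depth) exist in Berkeley DB or ht://Dig, and the real cache evicts pages far less predictably. It assumes that every write of an oversized compressed page consults the freelist database, and that touching a cache holding dirty data pages forces one of them to be written first.

```c
#include <assert.h>

/* Toy model only: these types and functions are NOT Berkeley DB API. */
typedef struct {
    int dirty_pages;   /* dirty compressed-data pages awaiting write-back */
} ToyCache;

typedef struct {
    int depth;         /* current nesting of toy_write_page() */
    int max_depth;     /* deepest nesting observed */
} ToyStats;

static void toy_write_page(ToyCache *freelist_cache, ToyStats *st);

/* Touch the cache that backs the freelist database. Making room may push
 * out a dirty data page, and that page has to be written first. */
static void toy_cache_access(ToyCache *cache, ToyStats *st)
{
    if (cache->dirty_pages > 0) {
        cache->dirty_pages--;
        toy_write_page(cache, st);
    }
}

/* Write one compressed page that overflowed a "real" page: the overflow
 * needs a page from the freelist, so the freelist database is consulted. */
static void toy_write_page(ToyCache *freelist_cache, ToyStats *st)
{
    st->depth++;
    if (st->depth > st->max_depth)
        st->max_depth = st->depth;
    toy_cache_access(freelist_cache, st);
    st->depth--;
}

/* Flush one page and return the deepest nesting reached.
 * shared != 0: the freelist lives in the main cache (same environment).
 * shared == 0: the freelist has its own, empty cache (standalone). */
int toy_flush_depth(int dirty_pages, int shared)
{
    ToyCache main_cache = { dirty_pages };
    ToyCache own_cache = { 0 };
    ToyStats st = { 0, 0 };

    assert(dirty_pages >= 0);
    toy_write_page(shared ? &main_cache : &own_cache, &st);
    return st.max_depth;
}
```

In this model, toy_flush_depth(5, 1) nests six levels deep (one per dirty page plus the original write), while toy_flush_depth(5, 0) stays at one level. That is the effect the NULL dbenv is meant to achieve, although the sketch says nothing about what else is lost by dropping the environment.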
From: Lachlan A. <lh...@us...> - 2003-06-24 13:10:56
|
Greetings,

Sorry for the delay in replying -- busy at work...

Yes, I didn't think of the database location at all :(

I don't think it is an issue of "tweaking".  As long as the
environment is not the *same* environment as the rest of the
database, it will not share the cache.  We could have another
environment with all the same parameters.  (However we would probably
not want a cache, since the file shouldn't be used often.)

Yes, we should eventually update the underlying BDB code, but perhaps
after 3.2.0b5 is out :)

Cheers,
Lachlan

On Tue, 24 Jun 2003 09:06, Neal Richter wrote:
> Here's Lachlan's diff to db/mp_cmpr.c
>
> <     if(CDB_db_create(&dbp, dbenv, 0) != 0)
> ---
> > /* Use *standalone* database, to prevent recursion when writing pages */
> > /* from the cache, shared with other members of the environment */
> >     if(CDB_db_create(&dbp, NULL, 0) != 0)
>
> My hunch is that this is a rather 'blunt' fix.  It seems likely
> that there is a slight problem with the DB_ENV we use... maybe it
> needs to be tweaked before the db_create call if the compression is
> enabled??
>
> The __db_env struct is fairly large, but most of it seems to be
> function-pointers.  There are a number of variables for Locking,
> Logging, Transactions, Memory-pool, and some other flags.
>
> I'm looking to see what effect this will have.  There are a number
> of important-looking fields in __db_env... some having to do with
> db-filename semantics and location.
>
> I can't find much yet on what everything defaults to (if that is
> the right term) when standalone is used.
>
> I also think we need to look at how we can merge our changes with
> at least the next version 'up' of our BDB version at some point
> this year.

--
lh...@us...
ht://Dig developer DownUnder  (http://www.htdig.org) |
From: Neal R. <ne...@ri...> - 2003-06-24 16:35:15
|
On Tue, 24 Jun 2003, Lachlan Andrew wrote:

> I don't think it is an issue of "tweaking".  As long as the
> environment is not the *same* environment as the rest of the
> database, it will not share the cache.  We could have another
> environment with all the same parameters.  (However we would probably
> not want a cache, since the file shouldn't be used often.)

Do you know for certain that the environment is the same?

It would be nice if I could duplicate this bug, but I've never been able to.

This smells like something that should be handled in the WordDB class at
the DB API level.

1) I notice that in WordDB there is a dbenv that is used with a BDB create
function. There is also a set_cachesize() function in the C API. I'm
wondering whether just setting this variable to zero would have the desired
effect.

2) I previously devised a 'cache' that works above the BDB API level for
the WordDB class. This scheme would probably compensate for the loss of
performance from eliminating the internal cache.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
From: Lachlan A. <lh...@us...> - 2003-06-29 10:36:24
|
Greetings Neal,

On Wed, 25 Jun 2003 02:35, Neal Richter wrote:
> Do you know for certain that the environment is the same?

Isn't that the meaning of passing the DB_ENV argument to
CDB_db_create()?  I'm not sure how else to make two databases share
an environment.

> It would be nice if I could duplicate this bug, but I've never been
> able to.

As Geoff pointed out recently, anyone with a SourceForge account can
get a compile farm account.  That has a mac which had the problem.
See
<http://sourceforge.net/docman/display_doc.php?docid=762&group_id=1>.

> This smells like something that should be handled in the WordDB
> class at the DB API level.

Hmm...  Perhaps, but the original aim of the compression was to make
it transparent.  Since it creates its own database, that seems to me
to be the place to fix things.

I agree that totally disabling the environment is a bit drastic.  It
should be possible to create a new environment containing *only* the
weak compression database, but with all "shared" fields (other than
the memory pool) copied from the main database's environment.  How
would that sound?

Regarding implementing a separate cache, that is certainly possible,
but it has the disadvantage of duplicating existing code.  (I'm a big
fan of avoiding code bloat.)

Cheers,
Lachlan

--
lh...@us...
ht://Dig developer DownUnder  (http://www.htdig.org) |
From: Neal R. <ne...@ri...> - 2003-07-01 20:09:09
|
On Sun, 29 Jun 2003, Lachlan Andrew wrote:

> Greetings Neal,
>
> On Wed, 25 Jun 2003 02:35, Neal Richter wrote:
>
> > Do you know for certain that the environment is the same?
>
> Isn't that the meaning of passing the DB_ENV argument to
> CDB_db_create()?  I'm not sure how else to make two databases share
> an environment.

It is possible to create two different environments for the different DBs
in the htdig classes that control them.

> > It would be nice if I could duplicate this bug, but I've never been
> > able to.
>
> As Geoff pointed out recently, anyone with a SourceForge account can
> get a compile farm account.  That has a mac which had the problem.
> See
> <http://sourceforge.net/docman/display_doc.php?docid=762&group_id=1>.

Good point.

> > This smells like something that should be handled in the WordDB
> > class at the DB API level.
>
> Hmm...  Perhaps, but the original aim of the compression was to make
> it transparent.  Since it creates its own database, that seems to me
> to be the place to fix things.

Maybe.

I am hesitant to put lots of time into tweaking BDB code directly.

1) It makes moving to new versions of BDB harder.

2) BDB is a VERY widely used piece of software and it is incredibly likely
that most of the problems we encounter can be fixed at the BDB API level in
our classes. This of course excludes our (Loic's) hacks to have ZLIB page
compression.

It comes down to treating the entire db directory as something we touch as
little as possible, and confining our code tweaking to mp_cmpr.c and a few
other files. The more we tweak the more we diverge from stock BDB code and
the more work we make for ourselves long term.

We are 7 versions behind on BDB (we use 3.0.55). The changes/additions to
the final 3.X (3.3.11) version are attractive.
http://www.sleepycat.com/download/patchlogs.shtml

> I agree that totally disabling the environment is a bit drastic.  It
> should be possible to create a new environment containing *only* the
> weak compression database, but with all "shared" fields (other than
> the memory pool) copied from the main database's environment.  How
> would that sound?

Worth a try.

> Regarding implementing a separate cache, that is certainly possible,
> but it has the disadvantage of duplicating existing code.  (I'm a big
> fan of avoiding code bloat.)

Yep. It looks like WordDBCache is supposed to do this, but it doesn't seem
to be doing much.

I used an STL hash and improved insertion time in the WordDB considerably.
Probably because queuing up all the inserts in larger batches reduces
overhead.

> Cheers,
> Lachlan
> --
> lh...@us...
> ht://Dig developer DownUnder  (http://www.htdig.org)

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485 |
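The batching idea Neal mentions can be sketched in a few lines of C. This is not his implementation: he describes an STL hash sitting above WordDB, whereas the sketch below assumes a small fixed-size array that is sorted and handed to a callback in one go. All names here (InsertBatch, batch_insert, counting_sink, BATCH_CAPACITY) are hypothetical, and the callback merely stands in for whatever would actually write to the database.

```c
#include <assert.h>
#include <stdlib.h>

#define BATCH_CAPACITY 4            /* deliberately tiny, for illustration */

/* Stand-in for "hand n records to the database at once" (hypothetical). */
typedef void (*batch_sink_fn)(const int *keys, size_t n, void *ctx);

typedef struct {
    int           keys[BATCH_CAPACITY];
    size_t        count;
    batch_sink_fn sink;
    void         *ctx;
} InsertBatch;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort the pending keys and pass them to the sink as one batch. */
void batch_flush(InsertBatch *b)
{
    if (b->count == 0)
        return;
    qsort(b->keys, b->count, sizeof b->keys[0], cmp_int);
    b->sink(b->keys, b->count, b->ctx);
    b->count = 0;
}

/* Queue one key, flushing first if the buffer is already full. */
void batch_insert(InsertBatch *b, int key)
{
    assert(b->sink != NULL);
    if (b->count == BATCH_CAPACITY)
        batch_flush(b);
    b->keys[b->count++] = key;
}

/* Sink that only counts how often the "database" is called. */
static void counting_sink(const int *keys, size_t n, void *ctx)
{
    (void)keys;
    (void)n;
    ++*(int *)ctx;
}

/* Insert n keys through the buffer; return the number of backend calls. */
int demo_backend_calls(int n)
{
    int i;
    int calls = 0;
    InsertBatch b = { {0}, 0, counting_sink, &calls };

    for (i = 0; i < n; i++)
        batch_insert(&b, n - i);    /* keys arrive in descending order */
    batch_flush(&b);                /* push out the final partial batch */
    return calls;
}
```

With a capacity of four, ten inserts reach the backend in three calls rather than ten. Sorting each batch is an extra assumption of this sketch, on the theory that a B-tree tends to handle keys arriving in order more cheaply; the thread does not say whether Neal's version does this.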
From: Lachlan A. <lh...@us...> - 2003-07-06 00:42:38
|
On Wed, 2 Jul 2003 04:24, Neal Richter wrote:
> It is possible to create two different environments for the
> different DBs in the htdig classes that control them.

True, but the "_weakcmpr" database is internally created by mp_cmpr.c
for any compressed database -- ht://Dig knows nothing about it.  Of
course, we could *change* the API to pass in two database
environments (one for the database proper and one for _weakcmpr), but
that is far from fixing it at the API level.

> I am hesitant to put lots of time into tweaking BDB code
> directly.
> 2) BDB is a VERY widely used piece of software and it is
> incredibly likely that most of the problems we encounter can be
> fixed at the BDB API level in our classes.  This of course excludes
> our (Loic's) hacks to have ZLIB page compression.

Yes, it is exactly Loic's code (mp_cmpr.c) that I was proposing to
fix.  In the past I hacked mp_alloc to try to work around the bug,
but I had planned to back those changes out once b5 was out and
there was time for a real fix.

> The more we tweak the more we
> diverge from stock BDB code and the more work we make for ourselves
> long term.

Agreed.

> I used an STL hash and improved insertion time in the WordDB
> considerably.  Probably because queuing up all the inserts in larger
> batches reduces overhead.

That sounds impressive.  OK -- I'll stop tinkering with the database
code...

Cheers :)
Lachlan

--
lh...@us...
ht://Dig developer DownUnder  (http://www.htdig.org) |