|
From: Jim C. <li...@yg...> - 2003-06-29 07:45:39
|
On Friday, June 27, 2003, at 02:00 PM, Patrick Robinson wrote:
> I just installed htdig-3.2.0b4-20030622, and discovered that it's not
> correctly handling Disallow: patterns from my robots.txt file. (I'm
> hoping this is the correct list to post this!)
>
> I have these lines in my robots.txt:
> User-agent: *
> Disallow: /WebObjects/
>
> In my config file, I do NOT exclude /cgi-bin/ via exclude_urls.
> However, when I rundig -vvv, it tells me that URLs like the following
> are rejected due to being "forbidden by server robots.txt":
> href: http://www.mysite.edu/cgi-bin/WebObjects/blah/blah/blah

I am seeing the same behavior in the current CVS code. As currently implemented, URLs are checked for any occurrence of the disallow string, without regard to its location within the URL.

> This shouldn't happen. It should only be rejecting URLs *starting*
> with "/WebObjects/" (at least, that's my interpretation of what I read
> at http://www.robotstxt.org/wc/norobots.html).

I agree that this behavior does not seem to match that specified by the standard.

> I never had this problem in 3.1.6. Has something changed?

I believe some of the related code changed with the introduction of new regex support. As it currently stands, the code compares the disallow pattern against the full URL, rather than just the path, and it does not anchor the comparison.

In case you want to give it a try, I am attaching a patch that seems to correct the behavior of the robots code. I won't claim to have any deep insight into this part of the code, so no guarantees and all of that.

Jim
|
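The difference between an unanchored substring test and the anchored path-prefix test described in this message can be sketched as follows. This is only an illustration, not the attached patch and not ht://Dig's actual robots code: the names `url_path` and `is_disallowed` are hypothetical, and plain string prefixes are assumed instead of the regex support mentioned above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the path part of an absolute URL,
// e.g. "http://host/a/b" -> "/a/b". A URL with no path yields "/".
std::string url_path(const std::string &url) {
    std::string::size_type scheme = url.find("://");
    std::string::size_type start = (scheme == std::string::npos) ? 0 : scheme + 3;
    std::string::size_type slash = url.find('/', start);
    return (slash == std::string::npos) ? std::string("/") : url.substr(slash);
}

// Hypothetical check: a URL is disallowed only if its *path* starts with
// one of the Disallow values. The comparison is anchored at the start of
// the path, so "/cgi-bin/WebObjects/..." does not match "/WebObjects/".
bool is_disallowed(const std::string &url,
                   const std::vector<std::string> &disallow) {
    const std::string path = url_path(url);
    for (const std::string &prefix : disallow) {
        if (prefix.empty()) {
            continue;  // an empty Disallow value forbids nothing
        }
        if (path.compare(0, prefix.size(), prefix) == 0) {
            return true;
        }
    }
    return false;
}
```

With the rule `/WebObjects/`, this sketch rejects `http://www.mysite.edu/WebObjects/blah` but accepts the `/cgi-bin/WebObjects/...` URL from the report, which is the behavior the original poster expected.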
|
From: Geoff H. <ghu...@us...> - 2003-06-29 07:14:21
|
STATUS of ht://Dig branch 3-2-x

CHECKLIST FOR 3.2.0b5:
 * Apply memory leak patches (Neal)
 * Check bugs listed in bug-tracker...
 * Polish release docs (Geoff)
 * Must be able to (a) make check and (b) index www.htdig.org using
   "robotstxt_name: master-htdig" on all systems listed as "supported".
   Systems tested so far:
   - Mandrake 8.2, gcc 3.2 (lha, 21 May)
   - FreeBSD 4.6, gcc 2.95.3 (lha, 23 May)
   - Debian, Linux kernel 2.2.19, gcc 2.95.4 (lha, 23 May)
   - SunOS 5.8 = Solaris 2.8, gcc 3.1 (lha, 25 May)
   - SunOS 5.8 = Solaris 2.8, Sun cc with g++ 3.1 (lha, 29 May)
   - OS X (Jim, 30 May)
   Partly tested:
   - RedHat 8 (Jim, 1 June. make check requires tweaking for apache)
   - SunOS 5.8 = Solaris 2.8, gcc 2.95.2 (lha. Makes check minus apache, Digs small htdig.org. 27 May)
   - SunOS 5.8 = Solaris 2.8, Sun cc with g++ 2.95.2 (lha. Makes check minus apache, Digs small htdig.org. 2 June)
   - RedHat 7.3 (lha. Makes check minus apache. Digs small htdig.org. 25 May)
   - Alpha Debian (lha. Makes check minus apache. Digs small htdig.org. 25 May)
   To be tested:
   - HP-UX 10.20, gcc 2.8.1 (Jesse)
   - RedHat, other versions anyone?
   - RedHat AdvanceServer Itanium II (David Bannon)
   Known to have problems:
   - SGI/Irix 6.5.3 using SGI compilers
     <http://www.geocrawler.com/mail/msg.php3?msg_id=8025827&list=8825>

RELEASES:
 3.2.0b5: Next release, July 2003
 3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
 3.2.0b3: Released: 22 Feb 2001.
 3.2.0b2: Released: 11 Apr 2000.
 3.2.0b1: Released: 4 Feb 2000.

(Please note that everything added here should have a tracker PR# so we can be sure they're fixed. Geoff is currently trying to add PR#s for what's currently here.)

SHOWSTOPPERS:
 * Mifluz database errors are a severe problem (PR#428295)
   -- Does Neal's new zlib patch solve this for now?

KNOWN BUGS:
 * Odd behavior with $(MODIFIED) and scores not working with
   wordlist_compress set but work fine without wordlist_compress.
   (the date is definitely stored correctly, even with compression on
   so this must be some sort of weird htsearch bug) PR#618737.
 * META descriptions are somehow added to the database as FLAG_TITLE,
   not FLAG_DESCRIPTION. (PR#618738)
   Can anyone reproduce this? I can't! -- Lachlan
   Me either. Let's remove the PR. -Geoff

PENDING PATCHES (available but need work):
 * Additional support for Win32. (Neal)
 * Memory improvements to htmerge. (Backed out b/c htword API changed.)
 * Mifluz merge.

NEEDED FEATURES:
 * Quim's new htsearch/qtest query parser framework.
 * File/Database locking. PR#405764.

TESTING:
 * httools programs:
   (htload a test file, check a few characteristics, htdump and compare)
 * Tests for new config file parser
 * Duplicate document detection while indexing
 * Major revisions to ExternalParser.cc, including fork/exec instead of
   popen, argument handling for parser/converter, allowing binary output
   from an external converter.
 * ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
 * List of supported platforms/compilers is ancient. (PR#405279)
 * Document all of htsearch's mappings of input parameters to config
   attributes to template variables. (Relates to PR#405278.)
   Should we make sure these config attributes are all documented in
   defaults.cc, even if they're only set by input parameters and never
   in the config file?
 * Split attrs.html into categories for faster loading.
 * Turn defaults.cc into an XML file for generating documentation and
   defaults.cc.
 * require.html is not updated to list new features and disk space
   requirements of 3.2.x (e.g. regex matching, database compression.)
   PRs# 405280 #405281.
 * TODO.html has not been updated for current TODO list and completions.
   I've tried. Someone "official" please check and remove this -- Lachlan
 * Htfuzzy could use more documentation on what each fuzzy algorithm
   does. PR#405714.
 * Document the list of all installed files and default locations.
   PR#405715.

OTHER ISSUES:
 * Can htsearch actually search while an index is being created?
 * The code needs a security audit, esp. htsearch. PR#405765.
|
|
From: Jim C. <li...@yg...> - 2003-06-29 04:55:41
|
On Thursday, June 26, 2003, at 11:24 AM, Neal Richter wrote:
> Please get your compilers humming and pound on this. I made every
> effort to make sure the Win32 changes are transparent to non native
> Win32 builds.

Still seems to compile fine under OS X (as of Saturday's CVS). The 'make check' tests pass, but seem to require some effort. I assume this is unrelated to your changes, but I will mention it here anyhow.

I ran into a number of mostly random failures that would appear on one run and go away on the next. I traced the problem to the startup of httpd in test_functions. There is no sleep after the call to start httpd, and this results in the server sometimes not being ready when dependent tests start making requests. After adding a 'sleep 2' (as test_functions already does whenever it kills httpd), I was unable to reproduce the failures.

There also appears to be another problem with test_functions involving the process used to kill httpd. The pattern seems to be to check for 'log/httpd.pid', issue a 'kill' using the pid cat'd from the file if the file exists, and then copy '/dev/null' to 'logs/httpd.pid'. This leaves an empty file which later allows the test for 'log/httpd.pid' to succeed when there is in fact no pid on which 'kill' can operate. The result is that 'kill' is called without an argument. This does not typically interfere with the actual tests, but does generate noise that might concern people who run the tests. This problem also appears to be somewhat random. The test(s) on which 'kill' is called without arguments vary, and in some cases the problem doesn't occur at all.

Jim
|
|
From: <CHO...@sp...> - 2003-06-28 02:40:44
|
Hi,
I would like to ask how to change the fonts (size, colour) on the
search results page. I could only change the layout by editing
header.html and footer.html; however, I cannot change the font size
and colour.
Please advise.
Thank you so much.
Regards
Jason Choo
|
|
From: Lachlan A. <lh...@us...> - 2003-06-28 00:20:33
|
Greetings,
The problem seems to be in htlib/HtDateTime.h with lines (156-159):
// From an integer (seconds from epoc)
HtDateTime(const int i) {SetDateTime((time_t)i); ToLocalTime();}
// From a time_t value and pointer
HtDateTime(time_t &t) {SetDateTime(t); ToLocalTime();}
Does anyone know what the purpose of the (const int i) version is?
On systems where time_t is an alias for int (like HP-UX, it seems),
it causes problems. On other systems, it seems to have no effect,
other than perhaps allowing a "const" argument to be passed as
non-const (which is slightly dubious).
I won't change anything, because my changes to the database code seem
to have introduced a plethora of bugs :(
Jesse, if you comment out line 156, I think it should compile.
Cheers,
Lachlan
On Thu, 26 Jun 2003 01:46, J. op den Brouw wrote:
> UX 10.20
> gcc 2.8.1
> make 3.78
>
> make[1]: Entering directory
> `/pers/www/msql/Projects/Htdig/BUILD320/htdig-3.2.0b4-20030622/htnet'
> /bin/sh ../libtool --mode=compile g++ -DHAVE_CONFIG_H -I. -I.
> -I../include
> -DDEFAULT_CONFIG_FILE=\"/opt/htdig/htdig32/conf/htdig.conf\"
> -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db
> -I../db -I/opt/htdig/lib/zlib/include -g -O2 -Wall -fno-rtti
> -fno-exceptions -c -o HtFile.lo `test -f 'HtFile.cc' || echo
> './'`HtFile.cc
> g++ -DHAVE_CONFIG_H -I. -I. -I../include
> -DDEFAULT_CONFIG_FILE=\"/opt/htdig/htdig32/conf/htdig.conf\"
> -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db
> -I../db -I/opt/htdig/lib/zlib/include -g -O2 -Wall -fno-rtti
> -fno-exceptions -c HtFile.cc -o HtFile.o
> HtFile.cc: In method `enum Transport::DocStatus HtFile::Request()':
> HtFile.cc:255: call of overloaded `HtDateTime(int &)' is ambiguous
> ../htlib/HtDateTime.h:156: candidates are: HtDateTime::HtDateTime(int)
> ../htlib/HtDateTime.h:159: HtDateTime::HtDateTime(int &)
> ../htlib/HtDateTime.h:479: HtDateTime::HtDateTime(const
> HtDateTime &) <near match>
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
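
The ambiguity in the compiler output above arises because, when time_t is int, a by-value `(const int)` constructor and a by-reference `(time_t &)` constructor are equally good matches for an int lvalue. The following is a minimal sketch of one way to avoid that, assuming a single constructor taking `const time_t &` is acceptable. The class name `Stamp` is hypothetical and stands in for HtDateTime; this is not the change that was applied to the ht://Dig source.

```cpp
#include <ctime>

// Hypothetical stand-in for a date/time wrapper class.
class Stamp {
public:
    // If both of the following overloads existed and time_t were int,
    // a call Stamp(t) with an int lvalue t would be ambiguous:
    //   Stamp(const int i);     // by value
    //   Stamp(std::time_t &t);  // by non-const reference
    //
    // A single const-reference constructor accepts lvalues, const
    // objects and temporaries alike, so no second overload is needed.
    explicit Stamp(const std::time_t &t) : seconds_(t) {}

    // Seconds since the epoch, as stored.
    std::time_t seconds() const { return seconds_; }

private:
    std::time_t seconds_;
};
```

On platforms where time_t is wider than int, an int argument is converted to a temporary time_t that binds to the const reference, so callers passing either type still compile.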
|
|
From: Patrick R. <pg...@vt...> - 2003-06-27 20:00:20
|
Hi folks,

I just installed htdig-3.2.0b4-20030622, and discovered that it's not correctly handling Disallow: patterns from my robots.txt file. (I'm hoping this is the correct list to post this!)

I have these lines in my robots.txt:
  User-agent: *
  Disallow: /WebObjects/

In my config file, I do NOT exclude /cgi-bin/ via exclude_urls. However, when I rundig -vvv, it tells me that URLs like the following are rejected due to being "forbidden by server robots.txt":
  href: http://www.mysite.edu/cgi-bin/WebObjects/blah/blah/blah

This shouldn't happen. It should only be rejecting URLs *starting* with "/WebObjects/" (at least, that's my interpretation of what I read at http://www.robotstxt.org/wc/norobots.html).

If I remove the "Disallow: /WebObjects/" line from robots.txt and rerun rundig, it now indexes those URLs.

I never had this problem in 3.1.6. Has something changed?

--
Patrick Robinson
AHNR Info Technology, Virginia Tech
pg...@vt...
|
|
From: Neal R. <ne...@ri...> - 2003-06-26 17:26:04
|
Hey all,

As you will see if you use the CVS tree (or on Sunday with the new snapshot) I've separately checked in the Native Win32 code changes and updated the copyright notices and license notices to LGPL.

Please get your compilers humming and pound on this. I made every effort to make sure the Win32 changes are transparent to non native Win32 builds.

Note I used "#ifdef _MSC_VER" rather than _WIN32. Our cygwin users should remain unaffected.

I'm redoing all the memory leak work as my patches were rusty. I'll keep you posted.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: J. op d. B. <ht...@op...> - 2003-06-25 15:51:49
|
UX 10.20
gcc 2.8.1
make 3.78

./configure works!!!!! it finds the select.

make breaks with the following:

make[1]: Entering directory `/pers/www/msql/Projects/Htdig/BUILD320/htdig-3.2.0b4-20030622/htnet'
/bin/sh ../libtool --mode=compile g++ -DHAVE_CONFIG_H -I. -I. -I../include -DDEFAULT_CONFIG_FILE=\"/opt/htdig/htdig32/conf/htdig.conf\" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -I/opt/htdig/lib/zlib/include -g -O2 -Wall -fno-rtti -fno-exceptions -c -o HtFile.lo `test -f 'HtFile.cc' || echo './'`HtFile.cc
g++ -DHAVE_CONFIG_H -I. -I. -I../include -DDEFAULT_CONFIG_FILE=\"/opt/htdig/htdig32/conf/htdig.conf\" -I../include -I../htlib -I../htnet -I../htcommon -I../htword -I../db -I../db -I/opt/htdig/lib/zlib/include -g -O2 -Wall -fno-rtti -fno-exceptions -c HtFile.cc -o HtFile.o
HtFile.cc: In method `enum Transport::DocStatus HtFile::Request()':
HtFile.cc:255: call of overloaded `HtDateTime(int &)' is ambiguous
../htlib/HtDateTime.h:156: candidates are: HtDateTime::HtDateTime(int)
../htlib/HtDateTime.h:159: HtDateTime::HtDateTime(int &)
../htlib/HtDateTime.h:479: HtDateTime::HtDateTime(const HtDateTime &) <near match>
HtFile.cc:260: call of overloaded `HtDateTime(int &)' is ambiguous
../htlib/HtDateTime.h:156: candidates are: HtDateTime::HtDateTime(int)
../htlib/HtDateTime.h:159: HtDateTime::HtDateTime(int &)
../htlib/HtDateTime.h:479: HtDateTime::HtDateTime(const HtDateTime &) <near match>
HtFile.cc:282: call of overloaded `HtDateTime(int &)' is ambiguous
../htlib/HtDateTime.h:156: candidates are: HtDateTime::HtDateTime(int)
../htlib/HtDateTime.h:159: HtDateTime::HtDateTime(int &)
../htlib/HtDateTime.h:479: HtDateTime::HtDateTime(const HtDateTime &) <near match>
make[1]: *** [HtFile.lo] Error 1
make[1]: Leaving directory `/pers/www/msql/Projects/Htdig/BUILD320/htdig-3.2.0b4-20030622/htnet'
make: *** [all-recursive] Error 1
[msql@chaos htdig-3.2.0b4-20030622]$

I'm not able to answer any mail, the receiving mail server is not reachable due to a hardware failure (have to sort that out).

--Jesse
|
|
From: Neal R. <ne...@ri...> - 2003-06-24 16:35:15
|
On Tue, 24 Jun 2003, Lachlan Andrew wrote:
> I don't think it is an issue of "tweaking". As long as the
> environment is not the *same* environment as the rest of the
> database, it will not share the cache. We could have another
> environment with all the same parameters. (However we would probably
> not want a cache, since the file shouldn't be used often.)

Do you know for certain that the environment is the same? It would be nice if I could duplicate this bug, but I've never been able to.

This smells like something that should be handled in the WordDB class at the DB API level.

1) I notice that in WordDB there is a dbenv that is used with a BDB create function. There is also a set_cachesize() function in the C API. I wonder whether simply setting this variable to zero would have the desired effect.

2) I previously devised a 'cache' that works above the BDB API level for the WordDB class. This scheme would probably compensate for the loss of performance from eliminating the internal cache.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neil K. <nk...@ma...> - 2003-06-24 15:08:44
|
> Odd... It dumps the output in a file /tmp/t_htsearchxxxxx.
> Could you please re-run it and mail me that file?

Contents of the t_htsearch file appear after my .sig.

It's possible to duplicate this running htsearch from the command line; entering a query and a result format produces the same output as in the test file: the output stops at '<form action="'. There's no core dump, nonzero exit or other indication that anything failed. The output just stops.

--
Neil Kohl
Manager, ACP Online
nk...@ma...

t_htsearch10604:

Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Search results for 'also'</title></head>
<body bgcolor="#eef7ff">
<h2><img src="@IMAGEDIR@/htdig.gif" alt="ht://Dig">
Search results for 'also'</h2>
<hr noshade size="4">
<form method="get" action="

Neil Kohl
Manager, ACP Online
American College of Physicians
nk...@ac... 215.351.2638, 800.523.1546 x2638
|
|
From: Lachlan A. <lh...@us...> - 2003-06-24 13:10:56
|
Greetings,

Sorry for the delay in replying -- busy at work...

Yes, I didn't think of the database location at all :(

I don't think it is an issue of "tweaking". As long as the environment is not the *same* environment as the rest of the database, it will not share the cache. We could have another environment with all the same parameters. (However we would probably not want a cache, since the file shouldn't be used often.)

Yes, we should eventually update the underlying BDB code, but perhaps after 3.2.0b5 is out :)

Cheers,
Lachlan

On Tue, 24 Jun 2003 09:06, Neal Richter wrote:
> Here's Lachlan's diff to db/mp_cmpr.c
>
> < if(CDB_db_create(&dbp, dbenv, 0) != 0)
> ---
> > /* Use *standalone* database, to prevent recursion when writing pages */
> > /* from the cache, shared with other members of the environment */
> > if(CDB_db_create(&dbp, NULL, 0) != 0)
>
> My hunch is that this is a rather 'blunt' fix. It seems likely
> that there is a slight problem with the DB_ENV we use... maybe it
> needs to be tweaked before the db_create call if the compression is
> enabled??
>
> The __db_env struct is fairly large, but most of it seems to be
> function-pointers. There are a number of variables for Locking,
> Logging, Transactions, Memory-pool, and some other flags.
>
> I'm looking to see what effect this will have. There are a number
> of important looking fields in __db_env... some having to do with
> db-filename semantics and location.
>
> I can't find much yet on what everything defaults to (if that is
> the right term) when standalone is used.
>
> I also think we need to look at how we can merge our changes with
> at least the next version 'up' of our BDB version at some point
> this year.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Lachlan A. <lh...@us...> - 2003-06-24 12:51:38
|
On Tue, 24 Jun 2003 21:06, Gabriele Bartolini wrote:
> > do a lot of conditional compilation or clean up the code and get
> > people to use newer compilers.
>
> I agree with Geoff. I don't know though if at this time it is
> better to wait for 3.2.0b5 to be out.

Hear, hear! :)

Lachlan
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Lachlan A. <lh...@us...> - 2003-06-24 12:45:52
|
On Tue, 24 Jun 2003 05:06, Neil Kohl wrote:
> Success!

Excellent -- well done!

> Note that one test -- t_htsearch -- failed
> Output doesn't match "4 matches"
> ../htsearch/htsearch -c
> /home/neilk/src/htdig-3.2.0b4-20030615/test/conf/htdig.conf
> 'words=also' >> /tmp/t_htsearch25459 --
> Simple search for 'also'

Odd... It dumps the output in a file /tmp/t_htsearchxxxxx. Could you please re-run it and mail me that file?

Thanks (yet again :)
Lachlan
--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Gabriele B. <g.b...@co...> - 2003-06-24 11:06:06
|
Hi friends!

> But IMHO, we should be pushing people towards newer compilers with
> newer releases. It's pretty clear that new GCC releases will stop
> compiling non-ISO C++, so we'll either need to do a lot of conditional
> compilation or clean up the code and get people to use newer compilers.

I agree with Geoff. Also, in case you are interested: thanks to my friend Marco, since last Sunday ht://Check has had automatic checks for the C++ standard library and conditional compilation for fstream, iomanip, etc. and namespaces.

If you are interested in this, I volunteer to port the code to this, maybe asking my friend Marco (mn...@sf...) to join me. I don't know, though, whether at this time it is better to wait for 3.2.0b5 to be out.

Let me know.

Ciao
-Gabriele
--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
|
|
From: Neal R. <ne...@ri...> - 2003-06-23 23:07:11
|
Here's Lachlan's diff to db/mp_cmpr.c

< if(CDB_db_create(&dbp, dbenv, 0) != 0)
---
> /* Use *standalone* database, to prevent recursion when writing pages */
> /* from the cache, shared with other members of the environment */
> if(CDB_db_create(&dbp, NULL, 0) != 0)

He is indeed talking about the DB_ENV "environment". The BDB book confirms that when the dbenv pointer is NULL the database is standalone.... not part of or using a Berkeley DB environment.

My hunch is that this is a rather 'blunt' fix. It seems likely that there is a slight problem with the DB_ENV we use... maybe it needs to be tweaked before the db_create call if the compression is enabled??

The __db_env struct is fairly large, but most of it seems to be function-pointers. There are a number of variables for Locking, Logging, Transactions, Memory-pool, and some other flags.

I'm looking to see what effect this will have. There are a number of important looking fields in __db_env... some having to do with db-filename semantics and location.

I can't find much yet on what everything defaults to (if that is the right term) when standalone is used.

I also think we need to look at how we can merge our changes with at least the next version 'up' of our BDB version at some point this year.

Thanks.
Neal

On Mon, 23 Jun 2003, Geoff Hutchison wrote:
> > The solution seems to be simply to make it a "standalone" database.
> >
> > Can anyone see any problems with that approach? Do we need the
> > environment for anything?
>
> Like Neal, I'm not entirely sure I understand your use of "standalone."
> IIRC, you're talking about the DB_ENV "environment" access to the
> database.
>
> I do not think we currently use the Berkeley environment support for
> anything much. IIRC, it's for client-server operation. But then again,
> it looks like you want DB_ENV for the Berkeley-level locking and
> transaction support, which we may want.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neil K. <nk...@ma...> - 2003-06-23 19:07:12
|
Lachlan,

Success! Note that one test -- t_htsearch -- failed; see output below my .sig.

I'm digging htdig.org right now and will post results tomorrow.

--
Neil Kohl
Manager, ACP Online
nk...@ma...

PASS: t_htdig
tempWords: 'also:0 '
Boolean: 'also:0 '
initial: ''
Fuzzy on: also
searchWords: 'also:0 '
LogicalWords: also
Pattern: also
perform_push @0: also
score: push @0 term:factor
http://localhost:7400/set1/script.html
base_score 0.0909091
date_score 0
backlink_score 176.471
score 5.17932(5.17932), maxScore -1.79769e+308, minScore 1.79769e+308
Set maxScore = score
Set minScore = score
http://localhost:7400/set1/site2.html
base_score 4.54545
date_score 0
backlink_score 500
score 6.22564(6.22564), maxScore 5.17932, minScore 5.17932
Set maxScore = score
http://localhost:7400/set1/site4.html
base_score 0.636364
date_score 0
backlink_score 60.6061
score 4.13104(4.13104), maxScore 6.22564, minScore 5.17932
Set minScore = score
http://localhost:7400/set1/bad_local.htm
base_score 0.181818
date_score 0
backlink_score 153.846
score 5.04361(5.04361), maxScore 6.22564, minScore 4.13104
generateStars: doc, min, max 6.22564, 4.13104, 6.22564
generateStars: nStars 4 of 4
generateStars: doc, min, max 6.22564, 4.13104, 6.22564
generateStars: nStars 4 of 4
generateStars: doc, min, max 5.17932, 4.13104, 6.22564
generateStars: nStars 3 of 4
generateStars: doc, min, max 5.17932, 4.13104, 6.22564
generateStars: nStars 3 of 4
generateStars: doc, min, max 5.04361, 4.13104, 6.22564
generateStars: nStars 2 of 4
generateStars: doc, min, max 5.04361, 4.13104, 6.22564
generateStars: nStars 2 of 4
generateStars: doc, min, max 4.13104, 4.13104, 6.22564
generateStars: nStars 1 of 4
generateStars: doc, min, max 4.13104, 4.13104, 6.22564
generateStars: nStars 1 of 4
Output doesn't match "4 matches"
../htsearch/htsearch -c /home/neilk/src/htdig-3.2.0b4-20030615/test/conf/htdig.conf 'words=also' >> /tmp/t_htsearch25459 -- Simple search for 'also'
FAIL: t_htsearch
PASS: t_htmerge
PASS: t_htnet
PASS: t_htdig_local
====================
1 of 14 tests failed
====================

>>> Lachlan Andrew <lh...@us...> 06/21/03 04:37AM >>>
One day I'll get it right... Could you please delete the [0] from lines 2301 and 2331?

Thanks for your patience!
Lachlan
|
|
From: Geoff H. <ghu...@ws...> - 2003-06-23 18:24:19
|
> The solution seems to be simply to make it a "standalone" database.
>
> Can anyone see any problems with that approach? Do we need the
> environment for anything?

Like Neal, I'm not entirely sure I understand your use of "standalone." IIRC, you're talking about the DB_ENV "environment" access to the database.

I do not think we currently use the Berkeley environment support for anything much. IIRC, it's for client-server operation. But then again, it looks like you want DB_ENV for the Berkeley-level locking and transaction support, which we may want.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2003-06-23 16:46:01
|
> I don't think Geoff was saying we *shouldn't* use 2.7.
> It's just that we haven't actually *tested* it under 2.7.
> If (when :) you can confirm that it still works under 2.8,

Right. Sorry for the confusion. If it works under gcc-2.7, so much the better!

But IMHO, we should be pushing people towards newer compilers with newer releases. It's pretty clear that new GCC releases will stop compiling non-ISO C++, so we'll either need to do a lot of conditional compilation or clean up the code and get people to use newer compilers.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2003-06-23 16:44:04
|
On Saturday, June 21, 2003, at 09:03 AM, Lachlan Andrew wrote:
> have to implement a proper fix before the beta goes out. However, I
> don't think I'll have time for that for the next two months :(
>
> Translation: I'm in favour of your checking it in.

Ditto.

-Geoff
|
|
From: Neal R. <ne...@ri...> - 2003-06-23 14:41:14
|
Could you post a few more details? Do you have a patch I can play with?

I'm looking for that term in both the New Riders Berkeley DB book and in the online BDB documentation.

Thanks.

On Sun, 22 Jun 2003, Lachlan Andrew wrote:
> Greetings all,
>
> I think I've finally found the *source* of the recursions etc in the
> database compression. Once 3.2.0b5 is out, I'll remove all the hacks
> to limit explicit recursion, and to keep the cache clean...
>
> The problem is with the freelist of pages used when the compressed
> page is larger than a "real" page. It is part of the same
> environment as the rest of the database, and so shares the cache.
> That means that writing a page can cause access to the cache, which
> may require writing dirty pages etc.
>
> The solution seems to be simply to make it a "standalone" database.
>
> Can anyone see any problems with that approach? Do we need the
> environment for anything?
>
> Cheers,
> Lachlan
>
> On Wed, 18 Jun 2003 22:36, Lachlan Andrew wrote:
> > I've just come across a database bug :( It was reporting
> > WordKey::Compare: key length for a or b < info.num_length
> > repeatedly when I ran a large dig without -i.
> >
> > I haven't tried repeating it yet, because the dig that produced it
> > takes three days!! (It uses a rather inefficient
> > external_transport) I'll try to replicate it using a more
> > manageable data set.
>
> --
> lh...@us...
> ht://Dig developer DownUnder (http://www.htdig.org)

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Geoff H. <ghu...@us...> - 2003-06-22 07:14:23
|
STATUS of ht://Dig branch 3-2-x

CHECKLIST FOR 3.2.0b5:
 * Apply memory leak patches (Neal)
 * Check bugs listed in bug-tracker...
 * Polish release docs (Geoff)
 * Must be able to (a) make check and (b) index www.htdig.org using
   "robotstxt_name: master-htdig" on all systems listed as "supported".
   Systems tested so far:
   - Mandrake 8.2, gcc 3.2 (lha, 21 May)
   - FreeBSD 4.6, gcc 2.95.3 (lha, 23 May)
   - Debian, Linux kernel 2.2.19, gcc 2.95.4 (lha, 23 May)
   - SunOS 5.8 = Solaris 2.8, gcc 3.1 (lha, 25 May)
   - SunOS 5.8 = Solaris 2.8, Sun cc with g++ 3.1 (lha, 29 May)
   - OS X (Jim, 30 May)
   Partly tested:
   - RedHat 8 (Jim, 1 June. make check requires tweaking for apache)
   - SunOS 5.8 = Solaris 2.8, gcc 2.95.2 (lha. Makes check minus apache, Digs small htdig.org. 27 May)
   - SunOS 5.8 = Solaris 2.8, Sun cc with g++ 2.95.2 (lha. Makes check minus apache, Digs small htdig.org. 2 June)
   - RedHat 7.3 (lha. Makes check minus apache. Digs small htdig.org. 25 May)
   - Alpha Debian (lha. Makes check minus apache. Digs small htdig.org. 25 May)
   To be tested:
   - HP-UX 10.20, gcc 2.8.1 (Jesse)
   - RedHat, other versions anyone?
   - RedHat AdvanceServer Itanium II (David Bannon)
   Known to have problems:
   - SGI/Irix 6.5.3 using SGI compilers
     <http://www.geocrawler.com/mail/msg.php3?msg_id=8025827&list=8825>

RELEASES:
 3.2.0b5: Next release, June 2003???
 3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
 3.2.0b3: Released: 22 Feb 2001.
 3.2.0b2: Released: 11 Apr 2000.
 3.2.0b1: Released: 4 Feb 2000.

(Please note that everything added here should have a tracker PR# so we can be sure they're fixed. Geoff is currently trying to add PR#s for what's currently here.)

SHOWSTOPPERS:
 * Mifluz database errors are a severe problem (PR#428295)
   -- Does Neal's new zlib patch solve this for now?

KNOWN BUGS:
 * Odd behavior with $(MODIFIED) and scores not working with
   wordlist_compress set but work fine without wordlist_compress.
   (the date is definitely stored correctly, even with compression on
   so this must be some sort of weird htsearch bug) PR#618737.
 * META descriptions are somehow added to the database as FLAG_TITLE,
   not FLAG_DESCRIPTION. (PR#618738)
   Can anyone reproduce this? I can't! -- Lachlan

PENDING PATCHES (available but need work):
 * Additional support for Win32.
 * Memory improvements to htmerge. (Backed out b/c htword API changed.)
 * Mifluz merge.

NEEDED FEATURES:
 * Quim's new htsearch/qtest query parser framework.
 * File/Database locking. PR#405764.

TESTING:
 * httools programs:
   (htload a test file, check a few characteristics, htdump and compare)
 * Tests for new config file parser
 * Duplicate document detection while indexing
 * Major revisions to ExternalParser.cc, including fork/exec instead of
   popen, argument handling for parser/converter, allowing binary output
   from an external converter.
 * ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
 * List of supported platforms/compilers is ancient. (PR#405279)
 * Document all of htsearch's mappings of input parameters to config
   attributes to template variables. (Relates to PR#405278.)
   Should we make sure these config attributes are all documented in
   defaults.cc, even if they're only set by input parameters and never
   in the config file?
 * Split attrs.html into categories for faster loading.
 * Turn defaults.cc into an XML file for generating documentation and
   defaults.cc.
 * require.html is not updated to list new features and disk space
   requirements of 3.2.x (e.g. regex matching, database compression.)
   PRs# 405280 #405281.
 * TODO.html has not been updated for current TODO list and completions.
   I've tried. Someone "official" please check and remove this -- Lachlan
 * Htfuzzy could use more documentation on what each fuzzy algorithm
   does. PR#405714.
 * Document the list of all installed files and default locations.
   PR#405715.

OTHER ISSUES:
 * Can htsearch actually search while an index is being created?
 * The code needs a security audit, esp. htsearch. PR#405765.
|
|
From: Lachlan A. <lh...@us...> - 2003-06-22 03:01:18
|
Greetings all,

I think I've finally found the *source* of the recursions etc in the
database compression. Once 3.2.0b5 is out, I'll remove all the hacks
to limit explicit recursion, and to keep the cache clean...

The problem is with the freelist of pages used when the compressed
page is larger than a "real" page. It is part of the same
environment as the rest of the database, and so shares the cache.
That means that writing a page can cause access to the cache, which
may require writing dirty pages etc.

The solution seems to be simply to make it a "standalone" database.
Can anyone see any problems with that approach? Do we need the
environment for anything?

Cheers,
Lachlan

On Wed, 18 Jun 2003 22:36, Lachlan Andrew wrote:
> I've just come across a database bug :( It was reporting
>   WordKey::Compare: key length for a or b < info.num_length
> repeatedly when I ran a large dig without -i.
>
> I haven't tried repeating it yet, because the dig that produced it
> takes three days!! (It uses a rather inefficient
> external_transport) I'll try to replicate it using a more
> manageable data set.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
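The re-entrancy described in this message can be illustrated with a toy model. This is only a sketch and not the ht://Dig or Berkeley DB code: every name in it (ToyCache, touch, write_page, deepest_write_nesting) is hypothetical, and it makes the deliberately pessimistic assumption that each page write needs a fresh freelist page. When the freelist shares the cache, a write re-enters the cache and can trigger another write, so the nesting is bounded only by an explicit depth limit (standing in for the recursion-limiting hack). When the freelist is standalone, a write never re-enters the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model only; none of these names come from the real sources.
struct ToyCache {
    std::size_t capacity;      // number of pages the cache can hold
    bool shared_freelist;      // true: freelist pages live in this cache
    int depth_limit;           // stand-in for the recursion-limiting hack
    std::deque<int> pages;     // cached pages, oldest first (all dirty)
    int next_freelist_page;    // id of the next freelist page to touch
    int depth;                 // current nesting of write_page()
    int max_depth;             // deepest nesting observed

    ToyCache(std::size_t cap, bool shared, int limit)
        : capacity(cap), shared_freelist(shared), depth_limit(limit),
          next_freelist_page(1000), depth(0), max_depth(0) {}

    // Bring a page into the cache; if that overfills it, write out the
    // oldest page.
    void touch(int page) {
        pages.push_back(page);
        if (pages.size() > capacity) {
            int victim = pages.front();
            pages.pop_front();
            write_page(victim);
        }
    }

    // Assumption: every write needs one freelist page. With a shared
    // freelist that means touching the same cache again.
    void write_page(int /*page*/) {
        ++depth;
        if (depth > max_depth) max_depth = depth;
        if (shared_freelist && depth < depth_limit)
            touch(next_freelist_page++);   // re-enters the cache
        --depth;
    }
};

// Fill a two-page cache with three pages and report the deepest
// nesting of write_page() that resulted.
int deepest_write_nesting(bool shared, int limit) {
    assert(limit >= 1);
    ToyCache cache(2, shared, limit);
    for (int page = 0; page < 3; ++page)
        cache.touch(page);
    return cache.max_depth;
}
```

In this model, deepest_write_nesting(true, 8) runs all the way to the limit of 8, whereas deepest_write_nesting(false, 8) never nests beyond a single write. The real cache is of course far less eager to evict, so this shows only the shape of the problem, not its frequency.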
|
From: Lachlan A. <lh...@us...> - 2003-06-21 14:04:05
|
On Fri, 20 Jun 2003 09:46, Neal Richter wrote:
> There is pretty minimal impact for Unix code. The question is do
> you want me to sit on this or go ahead and check it in?

Greetings Neal,

I've recently found that there is a big bug in the database code when
the size of a page is reduced, as when re-indexing. Rather than
hacking my earlier hack to limit the depth of recursion, we might
have to implement a proper fix before the beta goes out. However, I
don't think I'll have time for that for the next two months :(

Translation: I'm in favour of your checking it in.

> db/mp_alloc.c:
>
> I added two new functions CDB_get_mp_dirty_level &
> CDB_set_mp_dirty_level and made the CDB___mp_dirty_level static to
> db/mp_alloc.c. This is slightly cleaner than a true global
> variable and prevented some dllexporting win32 crap.

Yes, it was only ever meant to be a hack to tide us over until after
the next beta. That variable (and the config variable) will
disappear before 3.2.0b6...

Cheers,
Lachlan

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
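The pattern Neal describes, a file-static variable reached only through a getter and a setter, might look roughly like the sketch below. It is an illustration under assumptions rather than the contents of db/mp_alloc.c: the names are hypothetical stand-ins, and the int type, the initial value of 0, and the non-negative check are guesses made for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the variable in db/mp_alloc.c. "static" gives
// it internal linkage, so no other source file can name it directly.
static int example_dirty_level = 0;   // type and default are assumptions

// The accessors are the only way in from other files, so only these two
// functions (not the variable) would need exporting from a Windows DLL.
int example_get_dirty_level()
{
    return example_dirty_level;
}

void example_set_dirty_level(int level)
{
    assert(level >= 0);   // assumed invariant, for illustration only
    example_dirty_level = level;
}
```

A caller in another file would write example_set_dirty_level(3) and later read the value back with example_get_dirty_level(). Removing the variable later, as Lachlan intends, then only means deleting the two functions and their call sites.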
|
From: Lachlan A. <lh...@us...> - 2003-06-21 08:38:34
|
One day I'll get it right... Could you please delete the [0] from
lines 2301 and 2331?

Thanks for your patience!
Lachlan

On Sat, 21 Jun 2003 00:59, Neil Kohl wrote:
> Well, the last patch has changed the error message from search.cc:
>
> "search.cc", line 2301: Error: An array must have at least one
> element.
> "search.cc", line 2331: Error: An array must have at least
> one element.
>
>
> Neil Kohl
> Manager, ACP Online
> American College of Physicians
> nk...@ac... 215.351.2638, 800.523.1546 x2638
>
> >>> Lachlan Andrew <lh...@us...> 06/20/03 02:55AM >>>
>
> Greetings Neil,
>
> I should have read the error message more carefully. Try the
> attached patch (again, applied to the clean package).
>
> Thanks yet again for this,
> Lachlan
>
> On Fri, 20 Jun 2003 02:27, Neil Kohl wrote:
> > No complaints about ResultMatch. Still getting 2 errors in
> > search.cc:
> >
> > "search.cc", line 2301: Error: An integer constant expression is
> > required within the array subscript operator.
> >
> > Almost there...

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |
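The two diagnostics quoted in this thread are what a strict C++ compiler typically reports for an array whose size is not a compile-time constant, and for an array declared with size zero. g++ accepts both forms as extensions, which would explain why the code built elsewhere. The declarations at lines 2301 and 2331 of search.cc are not shown in the thread, so the sketch below does not reproduce them; it only illustrates the usual portable alternative, and sum_scores is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from search.cc.
// Forms that some compilers accept only as extensions:
//     int scratch[count];   // size is not an integer constant expression
//     int scratch[0];       // an array must have at least one element
int sum_scores(const int *scores, std::size_t count)
{
    // Portable: std::vector takes a run-time size, and zero is allowed.
    std::vector<int> scratch(scores, scores + count);
    assert(scratch.size() == count);

    int total = 0;
    for (std::size_t i = 0; i < scratch.size(); ++i)
        total += scratch[i];
    return total;
}
```

Here the copy into scratch exists only to show a buffer sized at run time. Whether a vector, a fixed constant size, or simply dropping the subscript is the right fix depends on the surrounding code.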
|
From: Neil K. <nk...@ma...> - 2003-06-20 14:59:17
|
Hi, Lachlan.

Well, the last patch has changed the error message from search.cc:

"search.cc", line 2301: Error: An array must have at least one element.
"search.cc", line 2331: Error: An array must have at least one element.


Neil Kohl
Manager, ACP Online
American College of Physicians
nk...@ac... 215.351.2638, 800.523.1546 x2638

>>> Lachlan Andrew <lh...@us...> 06/20/03 02:55AM >>>

Greetings Neil,

I should have read the error message more carefully. Try the attached
patch (again, applied to the clean package).

Thanks yet again for this,
Lachlan

On Fri, 20 Jun 2003 02:27, Neil Kohl wrote:
> No complaints about ResultMatch. Still getting 2 errors in
> search.cc:
>
> "search.cc", line 2301: Error: An integer constant expression is
> required within the array subscript operator.
>
> Almost there...

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org) |