Message counts by month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | 47 | 74 | 66 |
| 2002 | 95 | 102 | 83 | 64 | 55 | 39 | 23 | 77 | 88 | 84 | 66 | 46 |
| 2003 | 56 | 129 | 37 | 63 | 59 | 104 | 48 | 37 | 49 | 157 | 119 | 54 |
| 2004 | 51 | 66 | 39 | 113 | 34 | 136 | 67 | 20 | 7 | 10 | 14 | 3 |
| 2005 | 40 | 21 | 26 | 13 | 6 | 4 | 23 | 3 | 1 | 13 | 1 | 6 |
| 2006 | 2 | 4 | 4 | 1 | 11 | 1 | 4 | 4 | | 4 | | 1 |
| 2007 | 2 | 8 | 1 | 1 | 1 | | 2 | | 1 | | | |
| 2008 | 1 | | 1 | 2 | | | 1 | | 1 | | | |
| 2009 | | | 2 | | 1 | | | | | | | |
| 2010 | | | | | | | | | | | | 1 |
| 2011 | | | 1 | | 1 | | | | | 1 | | |
| 2012 | | | | | | | 1 | | | | | |
| 2013 | | | | 1 | | | | | | | | |
| 2016 | 1 | | | | | | | | | | | |
| 2017 | | | | | | | | | | | 1 | |
|
From: Jim C. <li...@yg...> - 2003-07-11 02:54:43
|
On Thursday, July 10, 2003, at 03:58 PM, Ted Stresen-Reuter wrote:

> [tedmasterweb:~] tedsr% cvs -d:pserver:ano...@cv...:/cvsroot/htdig login
> (Logging in to ano...@cv...)
> CVS password:
> cvs [login aborted]: recv() from server cvs.sourceforge.net: Connection reset by peer
> [tedmasterweb:~] tedsr%
>
> Any suggestions?

Try again? ;) I have run into this a number of times with various sourceforge projects. In the past it has always gone away after trying another time or three.

Jim
|
From: Ted Stresen-R. <ted...@ma...> - 2003-07-10 21:58:24
|
Hi,

I'm trying to log in to CVS on cvs.sourceforge.net to check out the current working copy of htdig, but this is what I get when I try to follow the instructions on the CVS page of the developer site:

```
[tedmasterweb:~] tedsr% cvs -d:pserver:ano...@cv...:/cvsroot/htdig login
(Logging in to ano...@cv...)
CVS password:
cvs [login aborted]: recv() from server cvs.sourceforge.net: Connection reset by peer
[tedmasterweb:~] tedsr%
```

Any suggestions? I would like to check out the source from the tree because I'm trying to learn how to use the GNU autoconf, automake, etc., suite of tools and would like to work with the actual source rather than a snapshot (it is my understanding that the snapshots have all been 'libtoolized', but I would love to be corrected if I'm wrong).

Ted Stresen-Reuter
|
From: Gilles D. <gr...@sc...> - 2003-07-08 22:30:41
|
According to Jim Cole:
> On Friday, June 27, 2003, at 02:00 PM, Patrick Robinson wrote:
> > I just installed htdig-3.2.0b4-20030622, and discovered that it's not correctly handling Disallow: patterns from my robots.txt file. (I'm hoping this is the correct list to post this!)
> >
> > I have these lines in my robots.txt:
> > User-agent: *
> > Disallow: /WebObjects/
> >
> > In my config file, I do NOT exclude /cgi-bin/ via exclude_urls. However, when I run rundig -vvv, it tells me that URLs like the following are rejected due to being "forbidden by server robots.txt":
> > href: http://www.mysite.edu/cgi-bin/WebObjects/blah/blah/blah
>
> I am seeing the same behavior in the current CVS code. As currently implemented, URLs are being checked for any occurrence of the disallow string, without regard to location within the URL.

Correct. This has been reported before, and possible solutions discussed, but nobody followed through with implementing one.

> > This shouldn't happen. It should only be rejecting URLs *starting* with "/WebObjects/" (at least, that's my interpretation of what I read at http://www.robotstxt.org/wc/norobots.html).
>
> I agree that this behavior does not seem to match that specified by the standard.

> > I never had this problem in 3.1.6. Has something changed?
>
> I believe some of the related code changed with the introduction of new regex support. As it currently stands, the code is comparing the disallow against the full URL, rather than just the path, and it is not anchoring the comparison.

Correct again. Either anchoring the comparison, or going back to using StringMatch instead of Regex, is the solution, but in either case, you must be sure you're always looking at only the path portion of the URL, not the full URL as the 3.2 code does now.

> In case you want to give it a try, I am attaching a patch that seems to correct the behavior of the robots code. I won't claim to have any deep insight into this part of the code, so no guarantees and all of that.

The problem with that patch is it seems to miss the case of IsDisallowed() called by Server::push(), so there it would end up checking the full URL against the anchored patterns for the path, and you'd never get a match. Unless the tests in Retriever::IsValidURL() pre-screen all cases before attempting a push(), I think some disallowed URLs could slip through the cracks.

A more self-contained fix is below. It sidesteps the whole issue by making a regex pattern that can match the whole URL, so minimal code changes are needed. I don't know how efficient this ends up being, though. I also haven't tested this beyond making sure the full pattern works in egrep, so please test this patch carefully before using. I'll await feedback before committing it.

```diff
--- htdig/Server.cc.orig	2003-06-24 15:40:11.000000000 -0500
+++ htdig/Server.cc	2003-07-08 17:16:18.000000000 -0500
@@ -316,9 +316,13 @@ void Server::robotstxt(Document &doc)
 	if (*rest)
 	{
 	    if (pattern.length())
-		pattern << '|' << rest;
-	    else
-		pattern = rest;
+		pattern << '|';
+	    while (*rest)
+	    {
+		if (strchr("^.[$()|*+?{\\", *rest))
+		    pattern << '\\';
+		pattern << *rest++;
+	    }
 	}
     }
     //
@@ -332,7 +336,9 @@

     if (debug > 1)
 	cout << "Pattern: " << pattern << endl;
-    _disallow.set(pattern, config->Boolean("case_sensitive"));
+    String fullpatt = "^[^:]*://[^/]*(";
+    fullpatt << pattern << ')';
+    _disallow.set(fullpatt, config->Boolean("case_sensitive"));
 }
```

--
Gilles R. Detillieux     E-mail: <gr...@sc...>
Spinal Cord Research Centre   WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
|
From: Neal R. <ne...@ri...> - 2003-07-08 16:51:04
|
On Mon, 7 Jul 2003, Sandy MacKenzie wrote:

> Geoff,
>
> Thanks for having a think about this. I think our problems are related to htmerge but given that I had to make changes to get the code to compile/run on 64 bit Solaris, I was concerned that there may be other 64 bit issues. Our word.db files are getting greater than 2Gb (uncompressed due to earlier encountered zlib errors). When we merge

Have you tried the latest snapshots? Lachlan's recent fix for the recursion problem may help you.

What kind of SUN machine are you using? Does anyone know if the sourceforge compile farm has 64bit Solaris boxes?

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Lachlan A. <lh...@us...> - 2003-07-08 13:13:58
|
Thanks Neil.

This is really odd, especially if standard output was still truncated after `<form action="`. The program is clearly still running past that point. Perhaps stdout isn't being flushed, or perhaps it encounters an error, which makes it swallow all future output.

I'll be a bit too busy to look into this until mid August. Anyone else on the list is welcome to have a stab.

Cheers,
Lachlan

On Tue, 8 Jul 2003 03:51, Neil Kohl wrote:
> I was out of the office last week and couldn't devote any time to this until today. Attached is stderr from running "htsearch 'words=test&format=long'"

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Lachlan A. <lh...@us...> - 2003-07-08 13:03:16
|
Greetings,

FWIW, there are also problems on Mandrake x86 when the database file size is over 4Gb. It shows up as WordKey::Compare having a zero-length argument.

Cheers,
Lachlan

On Tue, 8 Jul 2003 07:10, Geoff Hutchison wrote:
> I'm not familiar with issues on Solaris, but ht://Dig has long been "clean" on Alpha systems. So it should be 64-bit clean as-is. The Berkeley DB code is most definitely 64-bit clean and handles databases up to 4TB if you've got the hardware for it.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Sandy M. <sa...@sa...> - 2003-07-07 22:24:40
|
Geoff,
Thanks for having a think about this. I think our problems are related
to htmerge but given that I had to make changes to get the code to
compile/run on 64 bit Solaris, I was concerned that there may be other
64 bit issues. Our word.db files are getting greater than 2Gb
(uncompressed, due to earlier encountered zlib errors). When we merge
large databases, we are encountering problems where searches on words
that return results from the pre-merged database do not return results
from the merged database. We only seemed to encounter these issues when
we got into the >4Gb memory area.
The reason we are using htmerge is that we need a composite index of
content available from both within and external to the university, and
also to allow us to spider faster. If you can suggest why we may be
having problems we'd be really grateful.
Thanks
Sandy
p.s.
We are using 3.2.0b4-20030126 and are compiling statically.
To make it compile for 64 bit with gcc, I changed a line in
include/htconfig.h from:
(I am not a c/c++ developer so these changes are largely based on
hunches)
/* Define this to the type of the third argument of getpeername() */
#define GETPEERNAME_LENGTH_T size_t
to:
/* Define this to the type of the third argument of getpeername() */
#define GETPEERNAME_LENGTH_T socklen_t
and in htlib/String.cc (to resolve a segmentation error)
void String::copy_data_from(const char *s, int len, int dest_offset)
{
memcpy(Data + dest_offset, s, len);
}
to:
void String::copy_data_from(const char *s, size_t len, size_t
dest_offset)
{
memcpy(Data + dest_offset, s, len);
}
On Monday, July 7, 2003, at 10:10 pm, Geoff Hutchison wrote:
>> Can anyone comment on what would be required to make the htdig 3.2b4
>> code 64 bit clean ?
>
> I'm not familiar with issues on Solaris, but ht://Dig has long been
> "clean" on Alpha systems. So it should be 64-bit clean as-is. The
> Berkeley DB code is most definitely 64-bit clean and handles databases
> up to 4TB if you've got the hardware for it.
>
>> I am merging large databases, and the htmerge requires >4Gb memory so
>> I have had to compile using -m64 (Solaris 8, gcc 3.2.1) but had to
>> make some small changes in the code to make it compile, and to avoid
>> a segmentation error.
>
> OK, what changes did you make exactly? What snapshot did you use? Are
> you attempting to compile/use shared libraries?
>
>> I am indexing > 400,000 pages - any thoughts on whether htdig 3.3 can
>> scale to this ?
>
> I'm curious why this is causing problems. I know people who have
> 32-bit systems that easily handle 400,000+ pages. How big are your
> databases exactly? Are your problems limited to htmerge? (In which
> case, I likely know the problem, and it's not due to 64-bit
> addressing.)
>
> -Geoff
>
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
|
|
From: Geoff H. <ghu...@ws...> - 2003-07-07 21:10:20
|
> Can anyone comment on what would be required to make the htdig 3.2b4 code 64 bit clean ?

I'm not familiar with issues on Solaris, but ht://Dig has long been "clean" on Alpha systems. So it should be 64-bit clean as-is. The Berkeley DB code is most definitely 64-bit clean and handles databases up to 4TB if you've got the hardware for it.

> I am merging large databases, and the htmerge requires >4Gb memory so I have had to compile using -m64 (Solaris 8, gcc 3.2.1) but had to make some small changes in the code to make it compile, and to avoid a segmentation error.

OK, what changes did you make exactly? What snapshot did you use? Are you attempting to compile/use shared libraries?

> I am indexing > 400,000 pages - any thoughts on whether htdig 3.3 can scale to this ?

I'm curious why this is causing problems. I know people who have 32-bit systems that easily handle 400,000+ pages. How big are your databases exactly? Are your problems limited to htmerge? (In which case, I likely know the problem, and it's not due to 64-bit addressing.)

-Geoff

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
From: Neil K. <nk...@ma...> - 2003-07-07 17:52:17
|
Hi Lachlan,

I was out of the office last week and couldn't devote any time to this until today. Attached is stderr from running "htsearch 'words=test&format=long'"

--
Neil Kohl
Manager, ACP Online
nk...@ac...

>>> Lachlan Andrew <lh...@us...> 06/29/03 07:02AM >>>
Greetings Neil,

I'm baffled... Could you please apply the attached patch and post the (stderr) output?

Thanks,
Lachlan

On Wed, 25 Jun 2003 01:08, Neil Kohl wrote:
> It's possible to duplicate this running htsearch from the command line; entering a query and a result format produces the same output as in the test file: the output stops at '<form action="'. There's no core dump, nonzero exit or other indication that anything failed. The output just stops.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Jim C. <li...@yg...> - 2003-07-07 17:49:30
|
On Monday, July 7, 2003, at 09:55 AM, Apostoly Guillaume wrote:

> For several reasons, i'm including the results of htsearch into a jsp page. Because of this, i want the "next page" and "previous page" links that are generated by htdig be converted from "/cgi-bin/htdig" to something like "/jsp/my_search.jsp". I'm sure this is a common question but I wasn't able to search the mailing list archive (error on the server).

You might want to look at the script_name attribute. See http://www.htdig.org/attrs.html#script_name

Jim
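As a concrete illustration of Jim's suggestion: the `script_name` attribute is set in the config file, and htsearch then uses it in place of `/cgi-bin/htdig` when building the paging links. The path below is taken from Guillaume's example, not from any real deployment:

```
# htdig.conf -- sketch; the JSP wrapper path is hypothetical
script_name: /jsp/my_search.jsp
```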
|
From: Neal R. <ne...@ri...> - 2003-07-07 16:25:00
|
On Sat, 5 Jul 2003, Lachlan Andrew wrote:

> On Wed, 2 Jul 2003 04:24, Neal Richter wrote:
> > It is possible to create two different environments for the different DBs in the htdig classes that control them.
>
> True, but the "_weakcmpr" database is internally created by mp_cmpr.c for any compressed database -- ht://Dig knows nothing about it. Of course, we could *change* the API to pass in two database environments (one for the database proper and one for _weakcmpr), but that is far from fixing it at the API level.

Ahh.. We could create a function in mp_cmpr.c which makes a copy of the current DBENV and use this copy for _weakcmpr. We would need to figure out which DBENV variable fixes the problem. There are around 25 elements to the DBENV structure.. a fair number of these are function pointers and it seems unlikely to be those.

> Yes, it is exactly Loic's code (mp_cmpr.c) that I was proposing to fix. In the past I hacked mp_alloc to try to work around the bug, but I had planned to back those changes out once b5 was out and there was time for a real fix.

OK. If we confine ourselves to hacking mp_cmpr.c as much as possible that's cool. As long as we're on the same page here. (pun alert)

> That sounds impressive. OK -- I'll stop tinkering with the database code...

Not sure you'll think it's impressive when you see the < 20 line patch ;-) I'll post it to the group later today.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Apostoly G. <Apo...@ma...> - 2003-07-07 15:55:54
|
Hi all,

For several reasons, I'm including the results of htsearch into a JSP page. Because of this, I want the "next page" and "previous page" links that are generated by htdig to be converted from "/cgi-bin/htdig" to something like "/jsp/my_search.jsp".

I'm sure this is a common question, but I wasn't able to search the mailing list archive (error on the server).

Thanks in advance for your help.

Guillaume.
|
From: Gabriele B. <an...@ti...> - 2003-07-06 23:06:30
|
Hi guys,

I'm forwarding this message of mine regarding the 'configure' process patch.

Cheers,
-Gabriele
|
From: Gabriele B. <bar...@in...> - 2003-07-06 21:36:03
|
Ciao Lachlan!

> Is there a reason to do this before 3.2.0b5 is released?

No ... :-P

> If you are going to recreate configure, remember that it has been manually hacked to work with OS X, ready for the 'imminent' release a month or two ago.

Well ... that's going to be something we'll have to face anyway. Unfortunately I can't do much on OS X systems as I don't have any computer with it; also on the SF compile farm the Darwin shell has been removed!

> Perhaps I should face reality about the release, but I'm still hopeful(ish).

I understand what you mean, but I'd rather send a patch to the group. Also, many things have changed as far as the autotools are concerned, and many macros and techniques we use have been declared deprecated. I refer in particular to:

- acconfig.h, which should be substituted by calls to AC_DEFINE(QUOTED)
- htconfig.h.in, which must be changed to htconfig.h, with the default config.h file created by autoheader

There are other small changes that should be applied but I don't wanna talk about them now. Only when I have a patch ready, maybe.

For your curiosity, I just changed the configure.in file on my laptop and I was able to run:

- autoscan
- aclocal
- autoconf
- autoheader
- automake

and ... finally ... autoreconf, without getting a single warning. I hope I can provide a patch very soon (I just came back, I did not have any Internet connection, and I just need to cross my fingers now for the CVS update). :-)

Ciao ciao
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer
Current Location: Prato, Tuscany, Italy
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno
|
From: Geoff H. <ghu...@us...> - 2003-07-06 07:14:25
|
STATUS of ht://Dig branch 3-2-x

CHECKLIST FOR 3.2.0b5:
* Apply memory leak patches (Neal)
* Check bugs listed in bug-tracker...
* Polish release docs (Geoff)
* Must be able to (a) make check and (b) index www.htdig.org using "robotstxt_name: master-htdig" on all systems listed as "supported".

Systems tested so far:
- Mandrake 8.2, gcc 3.2 (lha, 21 May)
- FreeBSD 4.6, gcc 2.95.3 (lha, 23 May)
- Debian, Linux kernel 2.2.19, gcc 2.95.4 (lha, 23 May)
- SunOS 5.8 = Solaris 2.8, gcc 3.1 (lha, 25 May)
- SunOS 5.8 = Solaris 2.8, Sun cc with g++ 3.1 (lha, 29 May)
- OS X (Jim, 30 May)

Partly tested:
- RedHat 8 (Jim, 1 June. make check requires tweaking for apache)
- SunOS 5.8 = Solaris 2.8, gcc 2.95.2 (lha. Makes check minus apache, Digs small htdig.org. 27 May)
- SunOS 5.8 = Solaris 2.8, Sun cc with g++ 2.95.2 (lha. Makes check minus apache, Digs small htdig.org. 2 June)
- RedHat 7.3 (lha. Makes check minus apache. Digs small htdig.org. 25 May)
- Alpha Debian (lha. Makes check minus apache. Digs small htdig.org. 25 May)

To be tested:
- HP-UX 10.20, gcc 2.8.1 (Jesse)
- RedHat, other versions anyone?
- RedHat AdvanceServer Itanium II (David Bannon)

Known to have problems:
- SGI/Irix 6.5.3 using SGI compilers <http://www.geocrawler.com/mail/msg.php3?msg_id=8025827&list=8825>

RELEASES:
3.2.0b5: Next release, July 2003
3.2.0b4: "In progress" -- snapshots called "3.2.0b4" until prerelease.
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.

(Please note that everything added here should have a tracker PR# so we can be sure they're fixed. Geoff is currently trying to add PR#s for what's currently here.)

SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295) -- Does Neal's new zlib patch solve this for now?

KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with wordlist_compress set, but working fine without wordlist_compress. (The date is definitely stored correctly, even with compression on, so this must be some sort of weird htsearch bug.) PR#618737.
* META descriptions are somehow added to the database as FLAG_TITLE, not FLAG_DESCRIPTION. (PR#618738) Can anyone reproduce this? I can't! -- Lachlan
  Me either. Let's remove the PR. -Geoff

PENDING PATCHES (available but need work):
* Additional support for Win32. (Neal)
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.

NEEDED FEATURES:
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.

TESTING:
* httools programs: (htload a test file, check a few characteristics, htdump and compare)
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen, argument handling for parser/converter, allowing binary output from an external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Document all of htsearch's mappings of input parameters to config attributes to template variables. (Relates to PR#405278.) Should we make sure these config attributes are all documented in defaults.cc, even if they're only set by input parameters and never in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and defaults.cc.
* require.html is not updated to list new features and disk space requirements of 3.2.x (e.g. regex matching, database compression.) PRs #405280, #405281.
* TODO.html has not been updated for current TODO list and completions. I've tried. Someone "official" please check and remove this -- Lachlan
* Htfuzzy could use more documentation on what each fuzzy algorithm does. PR#405714.
* Document the list of all installed files and default locations. PR#405715.

OTHER ISSUES:
* Can htsearch actually search while an index is being created?
* The code needs a security audit, esp. htsearch. PR#405765.
|
From: Lachlan A. <lh...@us...> - 2003-07-06 00:42:38
|
On Wed, 2 Jul 2003 04:24, Neal Richter wrote:

> It is possible to create two different environments for the different DBs in the htdig classes that control them.

True, but the "_weakcmpr" database is internally created by mp_cmpr.c for any compressed database -- ht://Dig knows nothing about it. Of course, we could *change* the API to pass in two database environments (one for the database proper and one for _weakcmpr), but that is far from fixing it at the API level.

> I am hesitant to put lots of time into tweaking BDB code directly.
> 2) BDB is a VERY widely used piece of software and it is incredibly likely that most of the problems we encounter can be fixed at the BDB API level in our classes. This of course excludes our (Loic's) hacks to have ZLIB page compression.

Yes, it is exactly Loic's code (mp_cmpr.c) that I was proposing to fix. In the past I hacked mp_alloc to try to work around the bug, but I had planned to back those changes out once b5 was out and there was time for a real fix.

> The more we tweak the more we diverge from stock BDB code and the more work we make for ourselves long term.

Agreed.

> I used an STL hash and improved insertion time in the WordDB considerably. Probably queuing up all the inserts in larger batches reduces overhead.

That sounds impressive. OK -- I'll stop tinkering with the database code...

Cheers :)
Lachlan

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Lachlan A. <lh...@us...> - 2003-07-05 08:38:23
|
Greetings Gabriele,

Is there a reason to do this before 3.2.0b5 is released? If you are going to recreate configure, remember that it has been manually hacked to work with OS X, ready for the 'imminent' release a month or two ago.

Perhaps I should face reality about the release, but I'm still hopeful(ish).

Lachlan

On Sat, 5 Jul 2003 03:06, Gabriele Bartolini wrote:
> I have a few proposals to do:
>
> - check for C++ standard library
> - check for C++ namespaces

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Gabriele B. <bar...@in...> - 2003-07-04 17:06:28
|
At 22.23 24/06/2003 +1000, Lachlan Andrew wrote:
> > I agree with Geoff. I don't know though if at this time it is
> > better to wait for 3.2.0b5 to be out.
>
>Hear, hear! :)
Hi there,
I gave a look at autotools stuff (autoconf, autoheader, autoscan,
automake ...) and I must confess that now life is sooooooo much *easier*
than a couple of years ago.
First (thanks to my friend Marco's work) I could port configure scripts
to a clearer situation: I remember when I first introduced it to ht://Check
it was more a cut+paste operation, sniffing code from basically Loic's work
on ht://Dig.
Now, especially with autoscan and autoheader, the situation is much better.
Having said this, I have a few proposals: over the weekend I will
bring my laptop to my country house (where I don't have the Internet, so
I may not answer you) and try to hack the code in order to include at
least these 2 things:
- check for C++ standard library
- check for C++ namespaces
Essentially these two m4 macros (taken from the configure archive) will
define 2 preprocessor variables in config.h (we must get rid of acconfig.h
according to new autoconf's directives because it is deprecated) called
HAVE_STD and HAVE_NAMESPACES and I'll modify the code whenever iostream.h
(and similar like 'iomanip', etc.) is called.
Sounds good?
However, I really hope I can contribute this patch early next week.
Ciao and thanks
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Tuscany, Italy
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: <sa...@sa...> - 2003-07-02 14:45:18
|
Can anyone comment on what would be required to make the htdig 3.2b4 code 64 bit clean?

I am merging large databases, and the htmerge requires >4Gb memory, so I have had to compile using -m64 (Solaris 8, gcc 3.2.1) but had to make some small changes in the code to make it compile, and to avoid a segmentation error. My concern is that there may be other 64 bit issues in the code which are causing corruptions. It seems that when the database sizes get really large, my merge results are unpredictable.

Has anyone had success with a 64 bit compilation, or can anyone give me pointers to where there could be addressing problems in the code? Could I be hitting other limits (e.g. in Berkeley DB)?

I am indexing > 400,000 pages - any thoughts on whether htdig 3.3 can scale to this?

Thanks
Sandy
|
From: Jim C. <li...@yg...> - 2003-07-02 06:48:06
|
On Friday, June 27, 2003, at 08:40 PM, CHO...@sp... wrote:

> Hi,
> Like to enquire on how to change the fonts (size, colour) on the search result page? I could only change the layout by editing the header.html and footer.html; however the font size and colour cannot be changed.

Have you also made the necessary changes to the result template(s)? Controlling the look and feel of the output is discussed in the FAQ section. See http://www.htdig.org/FAQ.html#q4.2. Combined, the header, footer, and template files give you near complete control over what your output looks like.

Jim
|
From: Neal R. <ne...@ri...> - 2003-07-01 20:09:09
|
On Sun, 29 Jun 2003, Lachlan Andrew wrote:

> Greetings Neal,
>
> On Wed, 25 Jun 2003 02:35, Neal Richter wrote:
> > Do you know for certain that the environment is the same?
>
> Isn't that the meaning of passing the DB_ENV argument to CDB_db_create()? I'm not sure how else to make two databases share an environment.

It is possible to create two different environments for the different DBs in the htdig classes that control them.

> > It would be nice if I could duplicate this bug, but I've never been able to.
>
> As Geoff pointed out recently, anyone with a sourceforge account can get a compile farm account. That has a mac which had the problem. See <http://sourceforge.net/docman/display_doc.php?docid=762&group_id=1>.

Good point.

> > This smells like something that should be handled in the WordDB class at the DB API level.
>
> Hmm... Perhaps, but the original aim of the compression was to make it transparent. Since it creates its own database, that seems to me to be the place to fix things.

Maybe. I am hesitant to put lots of time into tweaking BDB code directly.

1) It makes moving to new versions of BDB harder.
2) BDB is a VERY widely used piece of software and it is incredibly likely that most of the problems we encounter can be fixed at the BDB API level in our classes. This of course excludes our (Loic's) hacks to have ZLIB page compression.

It comes down to treating the entire db directory as something we touch as little as possible, and containing our code tweaking to mp_cmpr.c and a few other files. The more we tweak the more we diverge from stock BDB code and the more work we make for ourselves long term.

We are 7 versions behind on BDB (we use 3.0.55). The changes/additions in the final 3.X (3.3.11) version are attractive. http://www.sleepycat.com/download/patchlogs.shtml

> I agree that totally disabling the environment is a bit drastic. It should be possible to create a new environment containing *only* the weak compression database, but with all "shared" fields (other than the memory pool) copied from the environment of the main database's environment. How would that sound?

Worth a try.

> Regarding implementing a separate cache, that is certainly possible, but it has the disadvantage of duplicating existing code. (I'm a big fan of avoiding code bloat.)

Yep. It looks like WordDBCache is supposed to do this, but it doesn't seem to be doing much. I used an STL hash and improved insertion time in the WordDB considerably. Probably queuing up all the inserts in larger batches reduces overhead.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Neal R. <ne...@ri...> - 2003-06-29 17:55:38
|
On Sun, 29 Jun 2003, Lachlan Andrew wrote:

> Good work, Neal!
>
> Does the _MSC_VER imply Microsoft C? If so, do you have plans to extend the port to, say, Borland C++ or DJGPP?

Not sure. It would be low priority. I definitely want to get to the other things on my list:

- Improve WordDB index format
- Integrate new searching code from Quim
- Porter stemming
- More libhtdig work (htmerge, htpurge, htfuzzy)
- Start converting code to use STL
- Start making everything UTF-8 clean

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Lachlan A. <lh...@us...> - 2003-06-29 11:03:07
|
Greetings Neil,

I'm baffled... Could you please apply the attached patch and post the (stderr) output?

Thanks,
Lachlan

On Wed, 25 Jun 2003 01:08, Neil Kohl wrote:
> It's possible to duplicate this running htsearch from the command line; entering a query and a result format produces the same output as in the test file: the output stops at '<form action="'. There's no core dump, nonzero exit or other indication that anything failed. The output just stops.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Lachlan A. <lh...@us...> - 2003-06-29 10:38:51
|
Good work, Neal!

Does the _MSC_VER imply Microsoft C? If so, do you have plans to extend the port to, say, Borland C++ or DJGPP?

Cheers,
Lachlan

On Fri, 27 Jun 2003 03:24, Neal Richter wrote:
> Note I used "#ifdef _MSC_VER" rather than _WIN32. Our cygwin users should remain unaffected.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Lachlan A. <lh...@us...> - 2003-06-29 10:36:24
|
Greetings Neal,

On Wed, 25 Jun 2003 02:35, Neal Richter wrote:

> Do you know for certain that the environment is the same?

Isn't that the meaning of passing the DB_ENV argument to CDB_db_create()? I'm not sure how else to make two databases share an environment.

> It would be nice if I could duplicate this bug, but I've never been able to.

As Geoff pointed out recently, anyone with a sourceforge account can get a compile farm account. That has a mac which had the problem. See <http://sourceforge.net/docman/display_doc.php?docid=762&group_id=1>.

> This smells like something that should be handled in the WordDB class at the DB API level.

Hmm... Perhaps, but the original aim of the compression was to make it transparent. Since it creates its own database, that seems to me to be the place to fix things.

I agree that totally disabling the environment is a bit drastic. It should be possible to create a new environment containing *only* the weak compression database, but with all "shared" fields (other than the memory pool) copied from the environment of the main database's environment. How would that sound?

Regarding implementing a separate cache, that is certainly possible, but it has the disadvantage of duplicating existing code. (I'm a big fan of avoiding code bloat.)

Cheers,
Lachlan

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)