From: Gilles D. <gr...@sc...> - 2001-11-27 21:09:16
|
According to Joe R. Jah: > On Fri, 23 Nov 2001, Gilles Detillieux wrote: > > So, unless there are objections from other developers, I'm planning to > > put this code into 3.1.6's htdig/Document.cc next week, as well as > > eventually into 3.2.0b4's htlib/HtDateTime.cc, to clear up all the > > problems we've had. I think it will allow us to completely do away > > with strptime and mktime. > > > > I'd appreciate it if you'd have a look at this code and offer your > > critique. > > (How) can it be applied as a patch to the last/next snapshot? Like this... Use "patch -p0 < this-message" in the htdig-3.1.6 source directory from the latest snapshot to use the new date parsing code. I'll probably post it to CVS today or tomorrow. --- htdig/Document.cc.orig Fri Sep 14 09:21:05 2001 +++ htdig/Document.cc Tue Nov 27 15:06:08 2001 @@ -184,62 +184,206 @@ Document::Url(char *u) } -//***************************************************************************** -// time_t Document::getdate(char *datestring) -// Convert a RFC850 date string into a time value +#define EPOCH 1970 + +// +// time_t parsedate(char *date) +// - converts RFC850 or RFC1123 date string into a time value // time_t -Document::getdate(char *datestring) +parsedate(char *date) { - struct tm tm; - time_t ret; - char *s; + char *s; + int day, month, year, hour, minute, second; // // Two possible time designations: - // Tuesday, 01-Jul-97 16:48:02 GMT + // Tuesday, 01-Jul-97 16:48:02 GMT (RFC850) // or - // Thu, 01 May 1997 00:40:42 GMT + // Thu, 01 May 1997 00:40:42 GMT (RFC1123) // - // We strip off the weekday before sending to strptime + // We strip off the weekday because we don't need it, and // because some servers send invalid weekdays! // (Some don't even send a weekday, but we'll be flexible...) 
- - s = strchr(datestring, ','); - if (s) - s++; + + s = date; + while (*s && *s != ',') + s++; + if (*s) + s++; else - s = datestring; + s = date; while (isspace(*s)) - s++; - if (strchr(s, '-') && mystrptime(s, "%d-%b-%y %T", &tm) || - mystrptime(s, "%d %b %Y %T", &tm)) - { - // correct for mystrptime, if %Y format saw only a 2 digit year - if (tm.tm_year < 0) - tm.tm_year += 1900; - tm.tm_yday = 0; // clear these to prevent problems in strftime() - tm.tm_wday = 0; - - if (debug > 2) - { - cout << "Translated " << datestring << " to "; - char buffer[100]; - // Leave out %a for weekday, because we don't set it anymore... - //strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", &tm); - // Let's just do away with strftime() altogether for this... - //strftime(buffer, sizeof(buffer), "%d %b %Y %T", &tm); - sprintf(buffer, "%4d-%02d-%02d %02d:%02d:%02d", tm.tm_year+1900, - tm.tm_mon+1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec); - cout << buffer << " (" << tm.tm_year << ")" << endl; - } -#if HAVE_TIMEGM - ret = timegm(&tm); -#else - ret = mytimegm(&tm); -#endif - } - else + s++; + + // get day... + if (!isdigit(*s)) + return 0; + day = 0; + while (isdigit(*s)) + day = day * 10 + (*s++ - '0'); + if (day > 31) + return 0; + while (*s == '-' || isspace(*s)) + s++; + + // get month... 
+ switch (*s++) { + case 'J': case 'j': + switch (*s++) { + case 'A': case 'a': + month = 1; + s++; + break; + case 'U': case 'u': + switch (*s++) { + case 'N': case 'n': + month = 6; + break; + case 'L': case 'l': + month = 7; + break; + default: + return 0; + } + break; + default: + return 0; + } + break; + case 'F': case 'f': + month = 2; + s += 2; + break; + case 'M': case 'm': + switch (*s++) { + case 'A': case 'a': + switch (*s++) { + case 'R': case 'r': + month = 3; + break; + case 'Y': case 'y': + month = 5; + break; + default: + return 0; + } + break; + default: + return 0; + } + break; + case 'A': case 'a': + switch (*s++) { + case 'P': case 'p': + month = 4; + s++; + break; + case 'U': case 'u': + month = 8; + s++; + break; + default: + return 0; + } + break; + case 'S': case 's': + month = 9; + s += 2; + break; + case 'O': case 'o': + month = 10; + s += 2; + break; + case 'N': case 'n': + month = 11; + s += 2; + break; + case 'D': case 'd': + month = 12; + s += 2; + break; + default: + return 0; + } + while (*s == '-' || isspace(*s)) + s++; + + // get year... + if (!isdigit(*s)) + return 0; + year = 0; + while (isdigit(*s)) + year = year * 10 + (*s++ - '0'); + if (year < 69) + year += 2000; + else if (year < 1900) + year += 1900; + else if (year >= 19100) // seen some programs do it, why not check? + year -= (19100-2000); + while (isspace(*s)) + s++; + + // get hour... + if (!isdigit(*s)) + return 0; + hour = 0; + while (isdigit(*s)) + hour = hour * 10 + (*s++ - '0'); + if (hour > 23) + return 0; + while (*s == ':' || isspace(*s)) + s++; + + // get minute... + if (!isdigit(*s)) + return 0; + minute = 0; + while (isdigit(*s)) + minute = minute * 10 + (*s++ - '0'); + if (minute > 59) + return 0; + while (*s == ':' || isspace(*s)) + s++; + + // get second... 
+ if (!isdigit(*s)) + return 0; + second = 0; + while (isdigit(*s)) + second = second * 10 + (*s++ - '0'); + if (second > 59) + return 0; + while (*s == ':' || isspace(*s)) + s++; + + // + // Calculate date as seconds since 01 Jan 1970 00:00:00 GMT + // This is based somewhat on the date calculation code in NetBSD's + // cd9660_node.c code, for which I was unable to find a reference. + // It works, though! + // + return (time_t) (((((367L*year - 7L*(year+(month+9)/12)/4 + - 3L*(((year)+((month)+9)/12-1)/100+1)/4 + + 275L*(month)/9 + day) - + (367L*EPOCH - 7L*(EPOCH+(1+9)/12)/4 + - 3L*((EPOCH+(1+9)/12-1)/100+1)/4 + + 275L*1/9 + 1)) + * 24 + hour) * 60 + minute) * 60 + second); +} + + +//***************************************************************************** +// time_t Document::getdate(char *datestring) +// Convert a RFC850 date string into a time value +// +time_t +Document::getdate(char *datestring) +{ + time_t ret; + + ret = parsedate(datestring); + if (!ret) { if (debug > 2) { @@ -249,13 +393,12 @@ Document::getdate(char *datestring) ret = time(0); // This isn't the best, but it works. *fix* } if (debug > 2) - { - cout << "And converted to "; - struct tm *tm2 = gmtime(&ret); + { + struct tm *tm = gmtime(&ret); char buffer[100]; - strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", tm2); - cout << buffer << endl; - } + strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", tm); + cout << "Converted " << datestring << " to " << buffer << endl; + } return ret; } -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
From: Michael C. <Mic...@ir...> - 2001-11-26 20:22:42
|
Hello,

I'm running the htdig 2.7 binary on a 2.6 system. htdig and htmerge seem to run perfectly fine, yet when I run htsearch from the web page I get an error in my apache log (shows as a 500 error):

[date time] [error] Unrecognized character \x7F at /opt/apache/cgi-bin/htsearch line 1, <IN> line 107

#------------------------------------------------#

Taking a stab in the dark here, it looks like apache is trying to run htsearch as a cgi-script, which it isn't. Any way to get this to run?

thanks...

Michael Clarke
IRD Open Systems Team
Level 4, Telecom House
13-27 Manners Street
Wellington
Phone: +64 (04) 8031423
Mobile: +64 021 455 218
email: mic...@ir...
email: ma...@ir... |
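A side note on the log line above: the byte \x7F is the first byte of the ELF magic number (0x7F 'E' 'L' 'F') that begins compiled executables on many Unix systems. That would be consistent with the compiled htsearch binary being handed to a script interpreter, but this is an inference from the error text, not something confirmed in the thread. A minimal sketch of such a check, using a hypothetical helper name `looks_like_elf`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ht://Dig or Apache.
 * Returns 1 if the buffer starts with the ELF magic number, else 0. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    return len >= 4 && memcmp(buf, magic, 4) == 0;
}
```

Reading the first four bytes of the installed htsearch file and passing them to this function would tell a compiled binary apart from a script.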
From: Martin R. <m....@te...> - 2001-11-26 09:50:55
|
Hi all,

I've got a problem again: in my index.html (which is also the start URL) I link all files like this:

<a href="index2.php?h_sNextSite=something.php"></a>

It has to be in that form because this way I get some variables in the output URL, like http://server/index2.php?h_sNextSite=something.php&variable1=something&variable2=somthing_else and so on. I also get a session ID as a variable, like ....&session=12345, but I don't want to have it. Can I exclude this piece of the URL? It is always in the form of $session=[the ID].

THANKS!!!
Martin (m....@te...) |
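The transformation being asked for here, dropping one parameter from a URL's query string, can be sketched in C as below. This is only an illustration of the string operation; `strip_query_param` is a hypothetical helper and not an ht://Dig function or configuration attribute, and it assumes parameters are separated by `&`.

```c
#include <string.h>

/* Hypothetical helper for illustration only.
 * Removes every "name=value" pair from the query string of url, in place. */
void strip_query_param(char *url, const char *name)
{
    size_t nlen = strlen(name);
    char *p = strchr(url, '?');

    if (!p)
        return;                 /* no query string at all */
    p++;                        /* first parameter starts after '?' */

    while (*p) {
        char *end = strchr(p, '&');             /* end of this parameter */
        size_t plen = end ? (size_t)(end - p) : strlen(p);

        if (plen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            if (!end) {
                /* last parameter: cut it off with its leading '?' or '&' */
                p[-1] = '\0';
                return;
            }
            /* drop "name=value&" and look at the same position again */
            memmove(p, end + 1, strlen(end + 1) + 1);
        } else {
            if (!end)
                break;
            p = end + 1;
        }
    }
}
```

Called with the name "session", this would turn a URL ending in `...something.php&session=12345` into the same URL without the session part, so that the same page is not seen under a different address on every visit.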
From: Michael C. <Mic...@ir...> - 2001-11-26 00:33:12
|
What a pickle I am having..

Firstly I grabbed the latest beta of htdig 3.20b3 (or something-a-rather), compiled and ran make, and it dies with string.cc errors. Was referred to an older snapshot version, and this dies during configure (3.16). Grabbed 3.15 and that says you need autoconf. Get autoconf, configure and make and it says you need m4. Get m4, compile and make, dies.

Then I grab a 2.7 package, untar and unzip, and place where I need items to be placed. Change the conf file, and run htdig and htmerge, they seem to run fine.

So at this stage I am digging http://infoweb/ and it does that fine, htmerge does what it does fine, so now I go to search:

http://myhost.htdig/search.html

and it returns an internal server error, logs state:

[date time...][error] Out of memory during "large" request for 1052672 bytes, total sbrk() is 6617464 bytes at /usr/local/lib/perl5/site_perl/5.6.1/sun4-solaris/Apache/Registry.pm line 103.

What's wrong?

What I realllllly neeeeedd is for someone on a Solaris 2.6 or Solaris 2.7 system to make me a package for it.

I need to dig on http://infoweb/
htdocs are in /webdocs/prodn/
apache is in /opt/apache/
cgi-bin is /opt/apache/cgi-bin/
htdig is in /opt/apache/htdig/
search is in /webdocs/prodn/htdig/search.html
htsearch is in /opt/apache/cgi-bin/htsearch
db is in /opt/apache/htdig/db
common is in /opt/apache/htdig/common/
conf is in /opt/apache/htdig/conf/
bin is in /opt/apache/htdig/bin/

Can anyone make this for me or tell me a quick fix for my memory problem?

Thanks anyone - can they email a package or a tentative reply to ma...@ir...

thanks.

Michael Clarke
IRD Open Systems Team
Level 4, Telecom House
13-27 Melling Street
Wellington
Phone: +64 (04) 8031423
Mobile: +64 021 455 218
email: mic...@ir...
email: ma...@ir... |
From: Michael C. <Mic...@ir...> - 2001-11-25 21:01:59
|
Was there to be a new release of ht-dig today or yesterday (26-11-2001, NZ time) that deals with gcc3.0 compilation problems (amongst others)? Anyone know when this will be available?

Michael Clarke
IRD Open Systems Team
Level 4, Telecom House
13-27 Melling Street
Wellington
Phone: +64 (04) 8031423
Mobile: +64 021 455 218
email: mic...@ir...
email: ma...@ir... |
From: Joe R. J. <jj...@cl...> - 2001-11-25 20:54:28
|
Hi Geoff, According to the ChangeLog file this snapshot was last changed on November 3, but Gilles indicated last week that he had committed several fixes and features to the CVS tree. Any ideas? Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Ionut N. <io...@ef...> - 2001-11-25 14:15:09
|
On Sat, 2001-11-24 at 06:44, Geoff Hutchison wrote: > The point should be made here that the attributes are no longer as > significant (and indeed obsolete in 3.2.0bX and later) because htsearch is > now doing The Right Thing (TM) and decoding/encoding *all* SGML entities > as appropriate. Ah, great ! So from 3.2 no more translations in htdig, right ? Only escapings in htsearch. Ionut Nistor io...@ef... |
From: Geoff H. <ghu...@us...> - 2001-11-25 08:13:20
|
STATUS of ht://Dig branch 3-2-x

RELEASES:
    3.2.0b4: In progress
    3.2.0b3: Released: 22 Feb 2001.
    3.2.0b2: Released: 11 Apr 2000.
    3.2.0b1: Released: 4 Feb 2000.

SHOWSTOPPERS:

KNOWN BUGS:
    * Odd behavior with $(MODIFIED) and scores not working with
      wordlist_compress set but work fine without wordlist_compress.
      (the date is definitely stored correctly, even with compression on
      so this must be some sort of weird htsearch bug)
    * Not all htsearch input parameters are handled properly: PR#648. Use a
      consistent mapping of input -> config -> template for all inputs where
      it makes sense to do so (everything but "config" and "words"?).
    * If exact isn't specified in the search_algorithms, $(WORDS) is not set
      correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
      we fix this?)
    * META descriptions are somehow added to the database as FLAG_TITLE,
      not FLAG_DESCRIPTION. (PR#859)

PENDING PATCHES (available but need work):
    * Additional support for Win32.
    * Memory improvements to htmerge. (Backed out b/c htword API changed.)
    * MySQL patches to 3.1.x to be forward-ported and cleaned up.
      (Should really only attempt to use SQL for doc_db and related, not word_db)

NEEDED FEATURES:
    * Field-restricted searching.
    * Return all URLs.
    * Handle noindex_start & noindex_end as string lists.
    * Handle local_urls through file:// handler, for mime.types support.
    * Handle directory redirects in RetrieveLocal.
    * Merge with mifluz

TESTING:
    * httools programs:
      (htload a test file, check a few characteristics, htdump and compare)
    * Turn on URL parser test as part of test suite.
    * htsearch phrase support tests
    * Tests for new config file parser
    * Duplicate document detection while indexing
    * Major revisions to ExternalParser.cc, including fork/exec instead of
      popen, argument handling for parser/converter, allowing binary output
      from an external converter.
    * ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
    * List of supported platforms/compilers is ancient.
    * Add thorough documentation on htsearch restrict/exclude behavior
      (including '|' and regex).
    * Document all of htsearch's mappings of input parameters to config
      attributes to template variables. (Relates to PR#648.) Also make sure
      these config attributes are all documented in defaults.cc, even if
      they're only set by input parameters and never in the config file.
    * Split attrs.html into categories for faster loading.
    * require.html is not updated to list new features and disk space
      requirements of 3.2.x (e.g. phrase searching, regex matching,
      external parsers and transport methods, database compression.)
    * TODO.html has not been updated for current TODO list and completions.

OTHER ISSUES:
    * Can htsearch actually search while an index is being created?
      (Does Loic's new database code make this work?)
    * The code needs a security audit, esp. htsearch
    * URL.cc tries to parse malformed URLs (which causes further problems)
      (It should probably just set everything to empty) This relates to
      PR#348. |
From: Joe R. J. <jj...@cl...> - 2001-11-24 09:03:14
|
On Fri, 23 Nov 2001, Geoff Hutchison wrote: > Date: Fri, 23 Nov 2001 23:42:26 -0500 (EST) > From: Geoff Hutchison <ghu...@ws...> > To: Gilles Detillieux <gr...@sc...> > Cc: "ht://Dig developers list" <htd...@li...> > Subject: Re: [htdig-dev] to-do list for 3.1.6 > > On Fri, 23 Nov 2001, Gilles Detillieux wrote: > > I'm at my parent's place for Thanksgiving and so I haven't had a chance to > go through all the mail. I'll be back up to speed on Sunday. but a few > points: > > > 2. support for compiling with gcc 3.0 > > This should already be taken care of. The changes were v. minor. At the > least, it compiles with gcc 3.0.x on Linux. > > > 3. support for description_meta_tag_names attribute > > 4. handle noindex_start & noindex_end as string lists > > 5. a "match all documents" mechanism in htsearch > > 6. a way of specifying relative date ranges in htsearch > > 7. release notes > > 8. merge maindocs updates into htdoc and vice versa > > > Geoff, I hope you can do something about the configure tests, because I'm > > I think based on what Joe has said about the new 3.2 configure tests, I > feel OK about trying to backport this. The new Solaris issue is pretty > minor, but aggrivating. I was not precise in that message; Steps I took were slightly different from FAQ#5.14; here's what I did: . Removed htlib/regex.c . Removed htlib/regex.h . Removed references to regex.o in htlib/Makefile (in 3.1.6-1111-1) . Removed references to regex.h in htlib/Makefile (in 3.2.0b4-1111-1) Also attached is a list of error and warning lines in the config.log of 3.1.6-111101, just in case;) > > Finally, 7 and 8 are standard release-time procedures, which I'd really > > like to hand off to you, Geoff, if I can. > > I'm sure I can manage those. I'll start on some of these when I get back. > > -Geoff Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Geoff H. <ghu...@ws...> - 2001-11-24 04:50:27
|
On Fri, 23 Nov 2001, Gilles Detillieux wrote: > translated characters are reencoded as SGML entities in search results, > so in 3.1.6 the attributes are turned on by default, and 3.2.0b3 gets > rid of these attributes altogether. The point should be made here that the attributes are no longer as significant (and indeed obsolete in 3.2.0bX and later) because htsearch is now doing The Right Thing (TM) and decoding/encoding *all* SGML entities as appropriate. -- -Geoff Hutchison Williams Students Online http://wso.williams.edu/ |
From: Geoff H. <ghu...@ws...> - 2001-11-24 04:48:18
|
On Fri, 23 Nov 2001, Gilles Detillieux wrote:

I'm at my parents' place for Thanksgiving and so I haven't had a chance to go through all the mail. I'll be back up to speed on Sunday, but a few points:

> 2. support for compiling with gcc 3.0

This should already be taken care of. The changes were v. minor. At the least, it compiles with gcc 3.0.x on Linux.

> 3. support for description_meta_tag_names attribute
> 4. handle noindex_start & noindex_end as string lists
> 5. a "match all documents" mechanism in htsearch
> 6. a way of specifying relative date ranges in htsearch
> 7. release notes
> 8. merge maindocs updates into htdoc and vice versa

> Geoff, I hope you can do something about the configure tests, because I'm

I think based on what Joe has said about the new 3.2 configure tests, I feel OK about trying to backport this. The new Solaris issue is pretty minor, but aggravating.

> Finally, 7 and 8 are standard release-time procedures, which I'd really
> like to hand off to you, Geoff, if I can.

I'm sure I can manage those. I'll start on some of these when I get back.

-Geoff |
From: Joe R. J. <jj...@cl...> - 2001-11-24 02:19:47
|
On Fri, 23 Nov 2001, Gilles Detillieux wrote: > Date: Fri, 23 Nov 2001 17:58:31 -0600 (CST) > From: Gilles Detillieux <gr...@sc...> > To: "ht://Dig developers list" <htd...@li...> > Subject: [htdig-dev] workaround for mktime/strptime/LC_TIME problems > > So, unless there are objections from other developers, I'm planning to > put this code into 3.1.6's htdig/Document.cc next week, as well as > eventually into 3.2.0b4's htlib/HtDateTime.cc, to clear up all the > problems we've had. I think it will allow us to completely do away > with strptime and mktime. > > I'd appreciate it if you'd have a look at this code and offer your > critique. (How) can it be applied as a patch to the last/next snapshot? Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Gilles D. <gr...@sc...> - 2001-11-23 23:58:37
|
OK, after getting increasingly fed up with all the problems we've had over the past few years with parsing dates in HTTP headers, using mktime, strptime, and a whole mess of different variations of format strings that don't quite always work on all systems, I've decided to go back to square one and write my own code to do this without any help from C library functions that aren't quite portable enough.

It's not pretty, but it should solve a lot of the portability problems we've had. It should also make it possible to remove the hack in the locale handling, where we set LC_TIME back to the "C" locale, which had been known to cause locale problems on some systems.

So, unless there are objections from other developers, I'm planning to put this code into 3.1.6's htdig/Document.cc next week, as well as eventually into 3.2.0b4's htlib/HtDateTime.cc, to clear up all the problems we've had. I think it will allow us to completely do away with strptime and mktime.

I'd appreciate it if you'd have a look at this code and offer your critique.

#include <ctype.h>
#include <time.h>

#define EPOCH 1970

//
// time_t parsedate(char *date)
//   - converts RFC850 or RFC1123 date string into a time value
//
time_t
parsedate(char *date)
{
    char    *s;
    int     day, month, year, hour, minute, second;

    //
    // Two possible time designations:
    //      Tuesday, 01-Jul-97 16:48:02 GMT    (RFC850)
    // or
    //      Thu, 01 May 1997 00:40:42 GMT      (RFC1123)
    //
    // We strip off the weekday because we don't need it, and
    // because some servers send invalid weekdays!
    // (Some don't even send a weekday, but we'll be flexible...)

    s = date;
    while (*s && *s != ',')
        s++;
    if (*s)
        s++;
    else
        s = date;
    while (isspace(*s))
        s++;

    // get day...
    if (!isdigit(*s))
        return 0;
    day = 0;
    while (isdigit(*s))
        day = day * 10 + (*s++ - '0');
    if (day > 31)
        return 0;
    while (*s == '-' || isspace(*s))
        s++;

    // get month...
    switch (*s++) {
    case 'J': case 'j':
        switch (*s++) {
        case 'A': case 'a':
            month = 1;
            s++;
            break;
        case 'U': case 'u':
            switch (*s++) {
            case 'N': case 'n':
                month = 6;
                break;
            case 'L': case 'l':
                month = 7;
                break;
            default:
                return 0;
            }
            break;
        default:
            return 0;
        }
        break;
    case 'F': case 'f':
        month = 2;
        s += 2;
        break;
    case 'M': case 'm':
        switch (*s++) {
        case 'A': case 'a':
            switch (*s++) {
            case 'R': case 'r':
                month = 3;
                break;
            case 'Y': case 'y':
                month = 5;
                break;
            default:
                return 0;
            }
            break;
        default:
            return 0;
        }
        break;
    case 'A': case 'a':
        switch (*s++) {
        case 'P': case 'p':
            month = 4;
            s++;
            break;
        case 'U': case 'u':
            month = 8;
            s++;
            break;
        default:
            return 0;
        }
        break;
    case 'S': case 's':
        month = 9;
        s += 2;
        break;
    case 'O': case 'o':
        month = 10;
        s += 2;
        break;
    case 'N': case 'n':
        month = 11;
        s += 2;
        break;
    case 'D': case 'd':
        month = 12;
        s += 2;
        break;
    default:
        return 0;
    }
    while (*s == '-' || isspace(*s))
        s++;

    // get year...
    if (!isdigit(*s))
        return 0;
    year = 0;
    while (isdigit(*s))
        year = year * 10 + (*s++ - '0');
    if (year < 69)
        year += 2000;
    else if (year < 1900)
        year += 1900;
    else if (year >= 19100)     // seen some programs do it, why not check?
        year -= (19100-2000);
    while (isspace(*s))
        s++;

    // get hour...
    if (!isdigit(*s))
        return 0;
    hour = 0;
    while (isdigit(*s))
        hour = hour * 10 + (*s++ - '0');
    if (hour > 23)
        return 0;
    while (*s == ':' || isspace(*s))
        s++;

    // get minute...
    if (!isdigit(*s))
        return 0;
    minute = 0;
    while (isdigit(*s))
        minute = minute * 10 + (*s++ - '0');
    if (minute > 59)
        return 0;
    while (*s == ':' || isspace(*s))
        s++;

    // get second...
    if (!isdigit(*s))
        return 0;
    second = 0;
    while (isdigit(*s))
        second = second * 10 + (*s++ - '0');
    if (second > 59)
        return 0;
    while (*s == ':' || isspace(*s))
        s++;

    //
    // Calculate date as seconds since 01 Jan 1970 00:00:00 GMT
    // This is based somewhat on the date calculation code in NetBSD's
    // cd9660_node.c code, for which I was unable to find a reference.
    // It works, though!
    //
    return (time_t) (((((367L*year - 7L*(year+(month+9)/12)/4
                        - 3L*(((year)+((month)+9)/12-1)/100+1)/4
                        + 275L*(month)/9 + day) -
                       (367L*EPOCH - 7L*(EPOCH+(1+9)/12)/4
                        - 3L*((EPOCH+(1+9)/12-1)/100+1)/4
                        + 275L*1/9 + 1))
                      * 24 + hour) * 60 + minute) * 60 + second);
}

#ifdef TEST
#include <stdio.h>

main()
{
    char    buf[100];
    time_t  t;

    while (fgets(buf, sizeof(buf), stdin)) {
        t = parsedate(buf);
        fputs(ctime(&t), stdout);
    }
    return 0;
}
#endif

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
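For anyone reviewing the final return expression in parsedate(), the day-count arithmetic can be pulled out and checked on its own against dates whose offsets from the epoch are easy to work out by hand. The following is a sketch, not the code proposed for Document.cc: the function names `day_number`, `days_since_epoch` and `epoch_seconds` are made up for this illustration, and only the arithmetic is taken from the message.

```c
#include <assert.h>

#define EPOCH 1970

/* Same integer arithmetic as the return expression in parsedate().
 * The absolute value has no meaning here; only differences are used. */
static long day_number(long year, long month, long day)
{
    return 367L*year - 7L*(year + (month + 9)/12)/4
         - 3L*((year + (month + 9)/12 - 1)/100 + 1)/4
         + 275L*month/9 + day;
}

/* Days elapsed between 1 Jan 1970 and the given date (hypothetical helper). */
long days_since_epoch(long year, long month, long day)
{
    assert(month >= 1 && month <= 12);
    return day_number(year, month, day) - day_number(EPOCH, 1, 1);
}

/* Seconds since 1 Jan 1970 00:00:00 GMT, nested the same way as the original. */
long long epoch_seconds(long year, long month, long day,
                        long hour, long minute, long second)
{
    long long t = days_since_epoch(year, month, day);

    return ((t * 24 + hour) * 60 + minute) * 60 + second;
}
```

With these, days_since_epoch(2000, 1, 1) comes out as 10957 and days_since_epoch(2000, 3, 1) as 11017, so the leap day in 2000 is counted, and the RFC1123 sample from the code comment (01 May 1997 00:40:42 GMT) gives 862447242 seconds.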
From: Gilles D. <gr...@sc...> - 2001-11-23 23:16:36
|
OK, I think 3.1.6 is very solid and stable now, but there are some finishing touches it needs. Here's where I really have to plead for help, because I really can't put any more time into this than I have already. Here's my to-do list for 3.1.6.

1. better configure tests for:
   - regex problems on BSDi
   - getpeername length argument type on Solaris
   - avoid C library mktime() on OpenBSD
2. support for compiling with gcc 3.0
3. support for description_meta_tag_names attribute
4. handle noindex_start & noindex_end as string lists
5. a "match all documents" mechanism in htsearch
6. a way of specifying relative date ranges in htsearch
7. release notes
8. merge maindocs updates into htdoc and vice versa

Geoff, I hope you can do something about the configure tests, because I'm really out of my element with that. I do have an alternate workaround for the mktime problem, though, which I'll propose in another message.

If I understand correctly, Geoff, you do now have a system with gcc 3.0, and are planning to do that part too (or is this done already?).

The description_meta_tag_names attribute is a pretty simple addition, which I might do because it sounds like a good idea that's been requested more than once, but if I can't do it before release I see it as expendable. 4 to 6 are expendable too, but again they're good ideas that have been very frequently requested (5 & 6 together would make a dandy "what's new" facility).

Finally, 7 and 8 are standard release-time procedures, which I'd really like to hand off to you, Geoff, if I can.

So, what do you think? Any volunteers for 3 to 6? If not, should we just forget about them? Geoff, are you willing/able to do the other items?

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-11-23 21:59:18
|
According to Ionut Nistor:
> On Fri, 2001-11-23 at 20:18, Gilles Detillieux wrote:
> > According to Ionut Nistor:
> > > htdig does support (afaik) 3 translations:
> > >
> > > 1. lt & gt (< >)
> > > 2. amp (&)
> > > 3. quot (")
> >
> > That's actually 4 - lt & gt are 2 separate entities. htdig also handles
> > &trade; (153) in the 3.1.x code, which I think is non-standard, and the
> > full ISO Latin 1 set of entities from 160 to 255 (&nbsp; - &yuml;) in
> > both 3.1.x and 3.2 betas.
> Did not know about those - I assume they are done by default as there is
> no config option for them.

Yes. They all used to be done by default. The config attributes that turn off lt, gt, amp and quot were added as an afterthought because the translated characters caused problems in search results. As of 3.1.5, the translated characters are reencoded as SGML entities in search results, so in 3.1.6 the attributes are turned on by default, and 3.2.0b3 gets rid of these attributes altogether.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Ionut N. <io...@ef...> - 2001-11-23 21:08:45
|
On Fri, 2001-11-23 at 20:18, Gilles Detillieux wrote:
> According to Ionut Nistor:
> >
> > htdig does support (afaik) 3 translations:
> >
> > 1. lt & gt (< >)
> > 2. amp (&)
> > 3. quot (")
>
> That's actually 4 - lt & gt are 2 separate entities. htdig also handles
> &trade; (153) in the 3.1.x code, which I think is non-standard, and the
> full ISO Latin 1 set of entities from 160 to 255 (&nbsp; - &yuml;) in
> both 3.1.x and 3.2 betas.

Did not know about those - I assume they are done by default as there is no config option for them.

> >
> > Is it possible/desirable ?
>
> I'm inclined to agree that the 2nd approach is better. htdig currently

I thought so too. It would seem logical not to have numerous conversions in the htdig, but to handle escapes instead (they ca

> The problem with not translating is it would make word matches more
> difficult, when words have accented character entities embedded in them.

Maybe trying to escape the words HTML/XHTML style would help? No, problems will probably appear with some other (non-HTML) files that were indexed and which may use some other encodings.

> The entities would probably have to be translated to Unicode or UTF-8
> for word matching, and search words would have to be similarly encoded.
> All of this would entail major rewriting of htdig and htsearch! So, yes,
> it is desirable, and possible if we have the volunteers to do it (which
> we don't right now), but not simple and straightforward.

Hmm.. you are right. Will think about it too - also look through the code a bit (I really don't know much about how it works - just installed it for the first time 2 weeks ago).

>
> The current approach works for the most part, but is not ideal. Support
> for the &apos; entity would be easy to add, but all the other new entities
> in XHTML define characters above 255, so they won't work in the current
> 8-bit only, locale-specific approach.

Yup, that's why the 2nd approach would seem simple - though the matching becomes a problem.

Thanks for the reply.

Ionut Nistor io...@ef... |
From: Gilles D. <gr...@sc...> - 2001-11-23 18:19:07
|
According to Ionut Nistor:
> I have posted a (wish)bug a couple of days ago regarding HTML
> translations performed by htdig (#484345). I should have brought the
> issue on the list first (as Gilles Detillieux suggested), so I'll just
> bring up the issue now.
>
> htdig supports (afaik) 3 translations:
>
> 1. lt & gt (< >)
> 2. amp (&)
> 3. quot (")

That's actually 4 - lt & gt are 2 separate entities. htdig also handles &trade; (153) in the 3.1.x code, which I think is non-standard, and the full ISO Latin 1 set of entities from 160 to 255 (&nbsp; - &yuml;) in both 3.1.x and 3.2 betas.

> However, there are some more escapes that I think would be helpful to
> have.
>
> For instance, &apos; (apostrophe ').
> Gilles said &apos; is not supported in HTML - that is correct; however,
> xhtml1.0 brings in XML well formed documents - in XML, you cannot use '
> - ' is escaped as &apos;
>
> XHTML1.0 notes can be found at: http://www.w3.org/TR/xhtml1/
> look at A2. Entity sets - special characters
> http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

Now, there are some references we can sink our teeth into! Thanks.

> The problem is that there are many more escape sequences (in the
> &something; style); there are some ways to do it
> 1. by having a translation table - in which case htdig will translate
> everything there so that htsearch will not misescape them while
> displaying results (e.g. from a XHTML source file which has say &euro;
> when searching the browser will display &euro; instead of the euro sign
> cause htsearch escapes &euro; into &amp;euro;).
> 2. Eliminate translations from htdig; htsearch will have to stop
> escaping what is found in the DB in the &something; form.
>
> I think the second way is better.
>
> I'm not sure if I explained clearly I'll try to explain again if
> necessary.
>
> Is it possible/desirable ?

I'm inclined to agree that the 2nd approach is better. htdig currently uses the first approach, which is better for database size, but there are a few problems. First of all, htsearch can't distinguish between text that was translated from entities and text that was originally entered as a single character, so it sometimes gets them wrong in results. This problem is compounded by the fact that it only uses an 8-bit encoding, so when mixing documents with different encodings, mixups occur.

The problem with not translating is that it would make word matches more difficult when words have accented character entities embedded in them. The entities would probably have to be translated to Unicode or UTF-8 for word matching, and search words would have to be similarly encoded. All of this would entail major rewriting of htdig and htsearch!

So, yes, it is desirable, and possible if we have the volunteers to do it (which we don't right now), but not simple and straightforward. The current approach works for the most part, but is not ideal. Support for the &apos; entity would be easy to add, but all the other new entities in XHTML define characters above 255, so they won't work in the current 8-bit only, locale-specific approach.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
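To make the "translate entities to Unicode or UTF-8" idea above concrete, here is a minimal sketch of a table-driven decoder. It is not htdig code: the function names decodeEntities and appendUtf8 and the small entity table are assumptions made for illustration, and only a handful of named entities are covered.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void appendUtf8(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: replace the named entities listed in the table with
// their UTF-8 encoding and copy everything else through unchanged.
std::string decodeEntities(const std::string &in)
{
    // Illustrative subset only; a real table would cover the full entity sets.
    static const std::map<std::string, std::uint32_t> table = {
        {"lt", 0x3C},   {"gt", 0x3E},   {"amp", 0x26},  {"quot", 0x22},
        {"apos", 0x27}, {"nbsp", 0xA0}, {"yuml", 0xFF}, {"euro", 0x20AC},
    };

    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        if (in[i] == '&') {
            // Look for a terminating ';' a short distance after the '&'.
            std::string::size_type semi = in.find(';', i + 1);
            if (semi != std::string::npos && semi - i <= 10) {
                auto it = table.find(in.substr(i + 1, semi - i - 1));
                if (it != table.end()) {
                    appendUtf8(out, it->second);
                    i = semi + 1;
                    continue;
                }
            }
        }
        out += in[i++];   // ordinary character, or an entity we don't know
    }
    return out;
}
```

Because the output is UTF-8 rather than a locale-specific 8-bit byte, an entity such as &euro; (code point 8364) has a representation at all, which is exactly what the 8-bit approach cannot offer. Search words would need the same encoding before matching.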
From: Ionut N. <io...@ef...> - 2001-11-23 15:54:54
|
Hello

I have posted a (wish)bug a couple of days ago regarding HTML translations performed by htdig (#484345). I should have brought the issue on the list first (as Gilles Detillieux suggested), so I'll just bring up the issue now.

htdig supports (afaik) 3 translations:

1. lt & gt (< >)
2. amp (&)
3. quot (")

However, there are some more escapes that I think would be helpful to have.

For instance, &apos; (apostrophe ').
Gilles said &apos; is not supported in HTML - that is correct; however, xhtml1.0 brings in XML well formed documents - in XML, you cannot use ' - ' is escaped as &apos;

XHTML1.0 notes can be found at: http://www.w3.org/TR/xhtml1/
look at A2. Entity sets - special characters
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

The problem is that there are many more escape sequences (in the &something; style); there are some ways to do it
1. by having a translation table - in which case htdig will translate everything there so that htsearch will not misescape them while displaying results (e.g. from a XHTML source file which has say &euro;, when searching the browser will display the literal text &euro; instead of the euro sign, because htsearch escapes &euro; into &amp;euro;).
2. Eliminate translations from htdig; htsearch will have to stop escaping what is found in the DB in the &something; form.

I think the second way is better.

I'm not sure if I explained clearly; I'll try to explain again if necessary.

Is it possible/desirable ?

Thank you,
Ionut Nistor
io...@ef... |
From: Gilles D. <gr...@sc...> - 2001-11-22 20:40:14
|
According to Michael Clarke:
> Ok, I have encountered a problem when compiling on Solaris 2.6.
...
> c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 String.cc
> In file included from /usr/local/include/g++-v3/backward/stream.h:32,
>                  from String.cc:17:
> /usr/local/include/g++-v3/backward/iostream.h:35: using directive `ostream'
>    introduced ambiguous type `ostream'
> htString.h: In function `std::ostream& operator<<(std::ostream&, String&)':
> htString.h:159: `char*String::Data' is private
> String.cc:544: within this context
> String.cc: At global scope:
> String.cc:619: prototype for `void String::debug(std::ostream&)' does not match
>    any in class `String'
> htString.h:127: candidate is: void String::debug(ostream&)
> make[1]: *** [String.o] Error 1
> make[1]: Leaving directory `/jumpstart/tmp/intra-servers/intra-servers/htdig/htdig-3.1.5/htlib'
> make: *** [all] Error 1

I suggest you try the 3.1.6 snapshot in http://www.htdig.org/files/snapshots/

It should fix the problems with the definitions of streams. If you can wait till this coming Sunday's snapshot, you'll also get a few more bug fixes from this past week.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
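The quoted errors suggest that the header and the source file disagree about which `ostream` they name: the class declares `debug(ostream&)` while the definitions use `std::ostream&`, so the definitions match neither the member nor the friend declaration. The sketch below shows the consistent form, with every declaration and definition spelling out `std::ostream`. It is an illustration under that reading of the log, not the change made in the 3.1.6 snapshot; MiniString and render are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>

// A cut-down stand-in for a string class with a private buffer.
class MiniString {
public:
    explicit MiniString(const char *s)
        : Length(std::strlen(s)), Data(new char[Length + 1])
    {
        std::memcpy(Data, s, Length + 1);   // copy including the trailing NUL
    }
    ~MiniString() { delete[] Data; }
    MiniString(const MiniString &) = delete;
    MiniString &operator=(const MiniString &) = delete;

    // Declared with std::ostream, so the out-of-class definition matches.
    void debug(std::ostream &o) const;

    // The friend declaration names the same type the definition uses,
    // which is what grants the operator access to the private Data member.
    friend std::ostream &operator<<(std::ostream &o, const MiniString &s);

private:
    std::size_t Length;
    char *Data;
};

void MiniString::debug(std::ostream &o) const
{
    o << "Length: " << Length << " Data: " << Data << '\n';
}

std::ostream &operator<<(std::ostream &o, const MiniString &s)
{
    return o << s.Data;
}

// Small helper used to show the operator in action.
std::string render(const MiniString &s)
{
    std::ostringstream out;
    out << s;
    return out.str();
}
```

With an unqualified `ostream` in the header and `std::ostream` in the source, a compiler whose compatibility headers treat the two as different types reports both a private-access error and a prototype mismatch, as in the log.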
From: Gilles D. <gr...@sc...> - 2001-11-22 16:45:38
|
According to Bernier, Melanie:
> > I have installed htdig and I have a little problem with German Umlaut. I
> > can search for words with Umlaut without any problem. When I search for
> > say 'C34644' (a file containing Umlaut), the results from htdig come back
> > with strange characters instead of Umlaut (for example, I get a circle (Ø)
> > instead of ü, or I get a big Ä instead of a small ä), and it seems to
> > return that kind of results only for Word documents. What could be the
> > problem?

The problem is MS Word doesn't use ISO-8859-1 (Latin 1) encoding for characters with accents. The doc2html.pl script uses catdoc to decode the Word documents into plain text, which works fine for ASCII text, but when accents are involved it doesn't automatically map to the encoding you want. With catdoc, you have -s and -d options to specify the source and destination character sets. I've found that by using

    catdoc -scp1250 -d8859-1 file.doc

I can get accents to come out correctly on one of the few Word documents I have that contain accents. This document happens to use cp1250 as its internal character set. You may need to experiment to find the right options for your documents. When you figure out the right options, you can put them into the command line for catdoc in doc2html.pl.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Michael C. <Mic...@ir...> - 2001-11-22 01:26:08
|
No need to read on if someone can compile htdig for Solaris 2.6 and help with a link to download it from. htdocs dir is /webdocs/prodn/, apache is /opt/apache/cgi-bin/, htdocs install dir /opt/apache/htdig/

Anyway, if no one can do that, then read on:

Ok, I have encountered a problem when compiling on Solaris 2.6. Configure runs perfectly; then, when it comes to make, I get the following (I am going into full detail here):

#################################################################################
./configure loading cache ./config.cache checking for a BSD compatible install... ./install-sh -c checking whether build environment is sane... yes checking whether make sets ${MAKE}... yes checking for working aclocal... missing checking for working autoconf... missing checking for working automake... missing checking for working autoheader... missing checking for working makeinfo... missing configuring ht://Dig version 3.1.5 checking for gcc... gcc checking whether the C compiler (gcc ) works... yes checking whether the C compiler (gcc ) is a cross-compiler... no checking whether we are using GNU C... yes checking whether gcc accepts -g... yes checking for c++... c++ checking whether the C++ compiler (c++ ) works... yes checking whether the C++ compiler (c++ ) is a cross-compiler... yes checking whether we are using GNU C++... yes checking whether c++ accepts -g... yes checking for ranlib... ranlib checking for ar... /usr/ccs/bin//ar checking for sh... /bin/sh checking for sed... /usr/bin/sed checking for sort... /usr/bin/sort checking for find... /usr/bin/find checking for gunzip... /usr/local/bin//gunzip checking for tar... tar checking for acroread... /usr/local/bin/acroread checking for sendmail... /usr/lib/sendmail checking how to run the C preprocessor... gcc -E checking for AIX... no checking for socket in -lsocket... yes checking for t_accept in -lnsl... yes checking for deflate in -lz... yes checking for ANSI C header files...
yes checking whether time.h and sys/time.h may both be included... yes checking how to run the C++ preprocessor... c++ -E checking for fcntl.h... yes checking for limits.h... yes checking for malloc.h... yes checking for sys/file.h... yes checking for sys/ioctl.h... yes checking for sys/time.h... yes checking for unistd.h... yes checking for getopt.h... no checking for strings.h... yes checking for zlib.h... yes checking for alloca.h... yes checking for sys/select.h... yes checking for fstream.h... yes checking for working const... yes checking whether struct tm is in sys/time.h or time.h... time.h checking for strdup... yes checking for strerror... yes checking for strstr... yes checking for localtime_r... yes checking for timegm... no checking whether we need gethostname() prototype?... yes checking how to call getpeername?... int updating cache ./config.cache creating ./config.status creating CONFIG creating Makefile creating Makefile.config creating htcommon/Makefile creating htlib/Makefile creating htdig/Makefile creating htmerge/Makefile creating htnotify/Makefile creating htfuzzy/Makefile creating htsearch/Makefile creating makedp creating include/htconfig.h include/htconfig.h is unchanged configuring in db/dist running /bin/sh ./configure --cache-file=3D../.././config.cache = --srcdir=3D. loading cache ../.././config.cache checking if building in the top-level directory... checking for a BSD = compatible install... ./install-sh -c checking host system type... sparc-sun-solaris2.6 checking if --enable-debug option specified... no checking for cc... gcc checking for gcc... gcc checking for gcc... gcc checking whether the C compiler (gcc -O ) works... yes checking whether the C compiler (gcc -O ) is a cross-compiler... no checking whether we are using GNU C... yes checking whether gcc accepts -g... yes checking if --enable-diagnostic option specified... no checking if --enable-cxx option specified... no checking if --enable-compat185 option specified... 
no checking if --enable-dump185 option specified... no checking for ar... /usr/ccs/bin//ar checking for chmod... /usr/bin/chmod checking for cp... /usr/bin/cp checking for mkdir... /usr/bin/mkdir checking for ranlib... /usr/ccs/bin//ranlib checking for rm... /usr/bin/rm checking for sh... /sbin/sh checking for strip... /usr/ccs/bin//strip checking how to run the C preprocessor... gcc -E checking for ANSI C header files... yes checking for ssize_t... yes checking whether byte ordering is bigendian... yes checking for working const... yes checking for st_blksize in struct stat... yes checking whether stat file-mode macros are broken... no checking for mode_t... yes checking for off_t... yes checking for pid_t... yes checking for size_t... yes checking for dirent.h that defines DIR... yes checking for opendir in -ldir... no checking for sys/select.h... yes checking for sys/time.h... yes checking for getcwd... yes checking for getopt... yes checking for memcmp... yes checking for memcpy... yes checking for memmove... yes checking for raise... yes checking for snprintf... yes checking for strerror... yes checking for strsep... no checking for vsnprintf... yes checking for getuid... yes checking for pread... yes checking for pstat_getdynamic... no checking for sysconf... yes checking for shmget... yes checking for mmap... yes checking for munmap... yes checking for qsort... yes checking for select... yes checking for sigfillset... yes checking for int type sprintf return value... yes checking if --disable-bigfile option specified... no checking for spinlocks... solaris/func checking for u_char... yes checking for u_short... yes checking for u_int... yes checking for u_long... yes checking for u_int8_t... unsigned char checking for u_int16_t... unsigned short checking for int16_t... yes checking for u_int32_t... unsigned int checking for int32_t... yes checking if --enable-test option specified... 
no creating ./config.status creating Makefile creating include.tcl creating db.h creating db_int.h creating db_185.h creating config.h config.h is unchanged Now you must run 'make' followed by 'make install' prodn galadriel: [/jumpstart/tmp/intra-servers/intra-servers/htdig/htdig-3.1.5] > make make[1]: Entering directory `/jumpstart/tmp/intra-servers/intra-servers/htdig/htdig-3.1.5/db/dist' gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_compare.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_conv.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_curadj.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_cursor.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_delete.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_open.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_page.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_put.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_rec.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_recno.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_rsearch.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_search.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_split.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/bt_stat.c gcc -c -O -I. -I./../include -D_REENTRANT ../btree/btree_auto.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_appinit.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_am.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_apprec.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_auto.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_byteorder.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_conv.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_dispatch.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_dup.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_err.c gcc -c -O -I.
-I./../include -D_REENTRANT ../db/db_iface.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_join.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_log2.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_overflow.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_pr.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_rec.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_region.c gcc -c -O -I. -I./../include -D_REENTRANT ../db/db_ret.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_salloc.c gcc -c -O -I. -I./../include -D_REENTRANT ../common/db_shash.c gcc -c -O -I. -I./../include -D_REENTRANT ../dbm/dbm.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_auto.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_conv.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_dup.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_func.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_page.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_rec.c gcc -c -O -I. -I./../include -D_REENTRANT ../hash/hash_stat.c gcc -c -O -I. -I./../include -D_REENTRANT ../hsearch/hsearch.c gcc -c -O -I. -I./../include -D_REENTRANT ../lock/lock.c gcc -c -O -I. -I./../include -D_REENTRANT ../lock/lock_conflict.c gcc -c -O -I. -I./../include -D_REENTRANT ../lock/lock_deadlock.c gcc -c -O -I. -I./../include -D_REENTRANT ../lock/lock_util.c gcc -c -O -I. -I./../include -D_REENTRANT ../lock/lock_region.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_archive.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_auto.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_compare.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_findckp.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_get.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_put.c gcc -c -O -I. 
-I./../include -D_REENTRANT ../log/log_rec.c gcc -c -O -I. -I./../include -D_REENTRANT ../log/log_register.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_bh.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_fget.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_fopen.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_fput.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_fset.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_open.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_pr.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_region.c gcc -c -O -I. -I./../include -D_REENTRANT ../mp/mp_sync.c gcc -c -O -I. -I./../include -D_REENTRANT ../mutex/mutex.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_abs.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_alloc.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_config.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_dir.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_fid.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_fsync.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_map.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_oflags.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_open.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_rpath.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_rw.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_seek.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_sleep.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_spin.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_stat.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_tmpdir.c gcc -c -O -I. -I./../include -D_REENTRANT ../os/os_unlink.c gcc -c -O -I. -I./../include -D_REENTRANT ../txn/txn.c gcc -c -O -I. -I./../include -D_REENTRANT ../txn/txn_auto.c gcc -c -O -I. -I./../include -D_REENTRANT ../txn/txn_rec.c gcc -c -O -I. -I./../include -D_REENTRANT ../xa/xa.c gcc -c -O -I. -I./../include -D_REENTRANT ../xa/xa_db.c gcc -c -O -I. 
-I./../include -D_REENTRANT ../xa/xa_map.c gcc -c -O -I. -I./../include -D_REENTRANT ../clib/strsep.c /usr/bin/rm -f libdb.a /usr/ccs/bin//ar cr libdb.a bt_compare.o bt_conv.o bt_curadj.o bt_cursor.o bt_delete.o bt_open.o bt_page.o bt_put.o bt_rec.o bt_recno.o bt_rsearch.o bt_search.o bt_split.o bt_stat.o btree_auto.o db.o db_appinit.o db_am.o db_apprec.o db_auto.o db_byteorder.o db_conv.o db_dispatch.o db_dup.o db_err.o db_iface.o db_join.o db_log2.o db_overflow.o db_pr.o db_rec.o db_region.o db_ret.o db_salloc.o db_shash.o dbm.o hash.o hash_auto.o hash_conv.o hash_dup.o hash_func.o hash_page.o hash_rec.o hash_stat.o hsearch.o lock.o lock_conflict.o lock_deadlock.o lock_util.o lock_region.o log.o log_archive.o log_auto.o log_compare.o log_findckp.o log_get.o log_put.o log_rec.o log_register.o mp_bh.o mp_fget.o mp_fopen.o mp_fput.o mp_fset.o mp_open.o mp_pr.o mp_region.o mp_sync.o mutex.o os_abs.o os_alloc.o os_config.o os_dir.o os_fid.o os_fsync.o os_map.o os_oflags.o os_open.o os_rpath.o os_rw.o os_seek.o os_sleep.o os_spin.o os_stat.o os_tmpdir.o os_unlink.o txn.o txn_auto.o txn_rec.o xa.o xa_db.o xa_map.o strsep.o test ! -f /usr/ccs/bin//ranlib || /usr/ccs/bin//ranlib libdb.a gcc -c -O -I. -I./../include -D_REENTRANT ../db_archive/db_archive.c gcc -c -O -I. -I./../include -D_REENTRANT ../clib/err.c gcc -c -O -I. -I./../include -D_REENTRANT ../clib/getlong.c gcc -o db_archive db_archive.o err.o getlong.o libdb.a -lthread gcc -c -O -I. -I./../include -D_REENTRANT ../db_checkpoint/db_checkpoint.c gcc -o db_checkpoint db_checkpoint.o err.o getlong.o libdb.a -lthread gcc -c -O -I. -I./../include -D_REENTRANT ../db_deadlock/db_deadlock.c gcc -o db_deadlock db_deadlock.o err.o getlong.o libdb.a -lthread gcc -c -O -I. -I./../include -D_REENTRANT ../db_dump/db_dump.c gcc -o db_dump db_dump.o err.o getlong.o libdb.a -lthread gcc -c -O -I.
-I./../include -D_REENTRANT ../db_load/db_load.c gcc -o db_load db_load.o err.o getlong.o libdb.a -lthread gcc -c -O -I. -I./../include -D_REENTRANT ../db_printlog/db_printlog.c gcc -o db_printlog db_printlog.o err.o getlong.o libdb.a -lthread gcc -c -O -I. -I./../include -D_REENTRANT ../db_recover/db_recover.c gcc -o db_recover db_recover.o err.o getlong.o libdb.a -lthread gcc -c -O -I. -I./../include -D_REENTRANT ../db_stat/db_stat.c gcc -o db_stat db_stat.o err.o getlong.o libdb.a -lthread
make[1]: Leaving directory `/jumpstart/tmp/intra-servers/intra-servers/htdig/htdig-3.1.5/db/dist'
make[1]: Entering directory `/jumpstart/tmp/intra-servers/intra-servers/htdig/htdig-3.1.5/htlib'
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Configuration.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Connection.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Database.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Dictionary.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 DB2_db.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 IntObject.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 List.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Object.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 ParsedString.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Queue.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 QuotedStringList.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 Stack.cc
c++ -c -DDEFAULT_CONFIG_FILE=\"/opt/www/htdig/conf/htdig.conf\" -I../htlib -I../htcommon -I../db/dist -I../include -g -O2 String.cc
In file included from /usr/local/include/g++-v3/backward/stream.h:32,
                 from String.cc:17:
/usr/local/include/g++-v3/backward/iostream.h:35: using directive `ostream'
   introduced ambiguous type `ostream'
htString.h: In function `std::ostream& operator<<(std::ostream&, String&)':
htString.h:159: `char*String::Data' is private
String.cc:544: within this context
String.cc: At global scope:
String.cc:619: prototype for `void String::debug(std::ostream&)' does not match
   any in class `String'
htString.h:127: candidate is: void String::debug(ostream&)
make[1]: *** [String.o] Error 1
make[1]: Leaving directory `/jumpstart/tmp/intra-servers/intra-servers/htdig/htdig-3.1.5/htlib'
make: *** [all] Error 1
#################################################################################

As you can see, it gets an error right at the end there. In desperate need of help!

Info: Compiling on Solaris 2.6, $LD_LIBRARY_PATH set to /usr/local/lib

Michael Clarke
IRD Open Systems Team
Level 2, ComputerLand House
17-21 Dixon Street
Wellington
Phone: +64 (04) 8014676
Mobile: +64 021 455 218
email: mic...@ir... |
From: Gilles D. <gr...@sc...> - 2001-11-21 19:31:01
|
According to Joe R. Jah:
> Sorry it took such a long time to respond, but I have been very busy
> lately. It is not easy to prove a negative; however, I have tried a few
> times to make 3.1.6 miss indexing files in stable snapshots of my site
> without success;)
>
> Here is a comparison of the latest 3.1.6 snapshot on a snapshot of my site
> -- 163 HTML-only documents -- with 3.1.6-072901:
>
> _______3.1.6-072901 + Armstrong patch + ssl.4_______
> htdig: Start digging: Sun Nov 11 18:15:43 PST 2001
> htmerge: Start merging: Sun Nov 11 18:16:16 PST 2001    33 seconds
> htmerge: Total word count: 13171
> htmerge: Total documents: 163
> htmerge: Total doc db size (in K): 1888
> -------------------------8<-------------------------
> __________3.1.6-111101 + ssl.5 + FAQ#5.14___________
> htdig: Start digging: Sun Nov 11 18:19:19 PST 2001
> htmerge: Start merging: Sun Nov 11 18:20:58 PST 2001    99 seconds
> htmerge: Total word count: 13171
> htmerge: Total documents: 163
> htmerge: Total doc db size (in K): 1888
> -------------------------8<-------------------------
> CPU: 350 MHz Pentium
> RAM: 384 Megs
> OS: BSDi-4.2
>
> They both index the exact same number of documents; this is as conclusive a
> result as I can produce. The only difference is the time they take.
>
> Incidentally, ssl.4 fails to apply to the latest snapshot because of the
> recent changes to Connection.cc. I have modified the patch to apply
> cleanly to the latest snapshot of 3.1.6:
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.5

Thanks for all your efforts, Joe. I can't exactly boast about my response times lately either. You're right that it's not easy to prove a negative, but I'm satisfied that what we were seeing before is most likely due to uncontrolled variables, rather than parser bugs.

It's very strange that the latest 3.1.6 snapshot is 3 times slower, but that could be entirely due to regex being much less efficient than rx on BSD. Funny thing is the rx code that came with older versions of htdig was much, much slower than the GNU regex code. Maybe BSD's regex is based on the old rx code we had been using, while their rx code uses algorithms similar to GNU regex. It would be nice if the library developers got their act together and came up with one solid, standard, efficient implementation of both, so that it wouldn't matter which API your code used.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Geoff H. <ghu...@us...> - 2001-11-18 08:13:21
|
STATUS of ht://Dig branch 3-2-x

RELEASES:
    3.2.0b4: In progress
    3.2.0b3: Released: 22 Feb 2001.
    3.2.0b2: Released: 11 Apr 2000.
    3.2.0b1: Released: 4 Feb 2000.

SHOWSTOPPERS:

KNOWN BUGS:
    * Odd behavior with $(MODIFIED) and scores not working with
      wordlist_compress set but work fine without wordlist_compress.
      (the date is definitely stored correctly, even with compression on
      so this must be some sort of weird htsearch bug)
    * Not all htsearch input parameters are handled properly: PR#648. Use a
      consistent mapping of input -> config -> template for all inputs where
      it makes sense to do so (everything but "config" and "words"?).
    * If exact isn't specified in the search_algorithms, $(WORDS) is not set
      correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
      we fix this?)
    * META descriptions are somehow added to the database as FLAG_TITLE,
      not FLAG_DESCRIPTION. (PR#859)

PENDING PATCHES (available but need work):
    * Additional support for Win32.
    * Memory improvements to htmerge. (Backed out b/c htword API changed.)
    * MySQL patches to 3.1.x to be forward-ported and cleaned up.
      (Should really only attempt to use SQL for doc_db and related, not word_db)

NEEDED FEATURES:
    * Field-restricted searching.
    * Return all URLs.
    * Handle noindex_start & noindex_end as string lists.
    * Handle local_urls through file:// handler, for mime.types support.
    * Handle directory redirects in RetrieveLocal.
    * Merge with mifluz

TESTING:
    * httools programs:
      (htload a test file, check a few characteristics, htdump and compare)
    * Turn on URL parser test as part of test suite.
    * htsearch phrase support tests
    * Tests for new config file parser
    * Duplicate document detection while indexing
    * Major revisions to ExternalParser.cc, including fork/exec instead of
      popen, argument handling for parser/converter, allowing binary output
      from an external converter.
    * ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
    * List of supported platforms/compilers is ancient.
    * Add thorough documentation on htsearch restrict/exclude behavior
      (including '|' and regex).
    * Document all of htsearch's mappings of input parameters to config
      attributes to template variables. (Relates to PR#648.) Also make sure
      these config attributes are all documented in defaults.cc, even if
      they're only set by input parameters and never in the config file.
    * Split attrs.html into categories for faster loading.
    * require.html is not updated to list new features and disk space
      requirements of 3.2.x (e.g. phrase searching, regex matching,
      external parsers and transport methods, database compression.)
    * TODO.html has not been updated for current TODO list and completions.

OTHER ISSUES:
    * Can htsearch actually search while an index is being created?
      (Does Loic's new database code make this work?)
    * The code needs a security audit, esp. htsearch
    * URL.cc tries to parse malformed URLs (which causes further problems)
      (It should probably just set everything to empty) This relates to
      PR#348. |
From: Geoff H. <ghu...@ws...> - 2001-11-16 04:50:06
|
At 1:12 PM -0600 11/15/01, John Elser wrote:
>The two words wouldn't have to be next to each other to find a
>match, but they would have to be in the same sentence.

This is certainly on the wish list. As well, we'd like to do "proximity scoring" where words that fall closer together in an AND or OR query would get higher scoring.

>Just curious as to when the version 3.2.0 will be released as a
>stable release. Any idea of an approximate time frame? One month,
>6 months, etc...

When it's done. No, seriously. I can't believe it will take less than 6 months unless some additional developers step forward to help out.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/ |
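As a rough illustration of the "proximity scoring" idea mentioned above, the sketch below finds the smallest gap between occurrences of two query words and turns it into a score multiplier. Nothing here comes from ht://Dig: minWordDistance, proximityBoost, and the 1 + 1/distance formula are assumptions chosen only to show the shape of such a feature.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Smallest distance, in words, between any occurrence of one term and any
// occurrence of the other. Each vector holds that term's word positions
// within a document, in ascending order. Returns -1 if either term is absent.
int minWordDistance(const std::vector<int> &a, const std::vector<int> &b)
{
    if (a.empty() || b.empty())
        return -1;

    std::size_t i = 0, j = 0;
    int best = std::abs(a[0] - b[0]);
    // Walk both sorted lists together, always advancing the smaller position.
    while (i < a.size() && j < b.size()) {
        int d = std::abs(a[i] - b[j]);
        if (d < best)
            best = d;
        if (a[i] < b[j])
            ++i;
        else
            ++j;
    }
    return best;
}

// Hypothetical scoring rule: adjacent words double the score, and the bonus
// shrinks as the words move apart. A missing term gets no boost.
double proximityBoost(int distance)
{
    if (distance < 1)
        return 1.0;
    return 1.0 + 1.0 / distance;
}
```

The two-pointer walk keeps the cost linear in the number of occurrences, which matters when common words appear many times in a document. The boost formula is arbitrary; any function that decreases with distance would serve the same purpose.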
From: John E. <jE...@ck...> - 2001-11-15 19:02:29
|
A suggestion for a possible enhancement to an upcoming version of ht://Dig: the capability of "near" searching. Our office has requested a "near" search that matches words within X words of the search word, OR within the same sentence, OR within the same paragraph as the search word. For example, I might key in "summary judgment" and select "within the same sentence". The two words wouldn't have to be next to each other to find a match, but they would have to be in the same sentence.

Just curious as to when version 3.2.0 will be released as a stable release. Any idea of an approximate time frame? One month, 6 months, etc...

Thanks,
John |
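The three kinds of "near" match requested above can be sketched as a single predicate over word occurrences. This is only an illustration of the request, not an existing ht://Dig feature: the Occurrence record, the NearMode enum, and the functions isNear and anyNear are all hypothetical, and the sketch assumes the indexer records a word index, sentence index, and paragraph index for every occurrence.

```cpp
#include <cstdlib>
#include <vector>

// One occurrence of a word in a document. The sentence and paragraph
// indices are assumed to be counted from the start of the document.
struct Occurrence {
    int word;
    int sentence;
    int paragraph;
};

enum class NearMode { WithinWords, SameSentence, SameParagraph };

// True if two occurrences satisfy the chosen notion of "near".
// maxWords is only used by NearMode::WithinWords.
bool isNear(const Occurrence &a, const Occurrence &b, NearMode mode,
            int maxWords = 0)
{
    switch (mode) {
    case NearMode::WithinWords:
        return std::abs(a.word - b.word) <= maxWords;
    case NearMode::SameSentence:
        return a.sentence == b.sentence;
    case NearMode::SameParagraph:
        return a.paragraph == b.paragraph;
    }
    return false;
}

// True if any occurrence of the first word is near any occurrence of the
// second. Simple nested loops; fine for a sketch, not tuned for speed.
bool anyNear(const std::vector<Occurrence> &first,
             const std::vector<Occurrence> &second, NearMode mode,
             int maxWords = 0)
{
    for (const Occurrence &a : first)
        for (const Occurrence &b : second)
            if (isNear(a, b, mode, maxWords))
                return true;
    return false;
}
```

For the "summary judgment" example, a document matches in same-sentence mode when some occurrence of "summary" shares a sentence index with some occurrence of "judgment", regardless of how many words separate them.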