From: Geoff H. <ghu...@ws...> - 2001-12-01 23:57:45
|
At 12:06 PM -0600 11/30/01, Gilles Detillieux wrote:
>This is the part I find a bit troubling, but I don't know what we
>can do about it. I don't know why Armstrong's patch, which uses rx
>instead of regex, causes htdig to run 2-3 times faster, unless there
>are other changes between 092301 and 112501 that account for much of
>this, but it could well be just implementation efficiencies in one
>library and not in the other.

Ages and ages ago, I remember a small war over the most efficient regex implementation in other forums. At the time, I believe rx was considered the fastest for most things. So when I was working on htdig and saw rx, I wasn't surprised. Then we had the wonderful discovery that using the system regex instead of rx improved life for building the endings db for almost everyone.

I'd be interested if a modification to the 3.2 code to use rx like the original Armstrong patch would give Joe a similar speed boost--this might be an interesting experiment.

I haven't done extensive timing tests and don't have the time required to do them on my Linux box. But I can't believe there's a 3x difference on Linux.

I'll see if I can dig up some autoconf tricks for switching between various regex implementations. If it's buried in HtRegex.* we can hide the changes from the rest of the code.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/ |
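The idea of burying the regex choice inside a single wrapper can be sketched roughly as follows. This is only an illustration, not the actual HtRegex code: the class name `RegexWrapper` and its methods are hypothetical, and the backend shown is the POSIX `regcomp`/`regexec` interface from the C library. Swapping in rx or another library would then, in principle, touch only this one class.

```cpp
#include <regex.h>   // POSIX regex from the C library (assumed backend)
#include <string>

// Hypothetical wrapper: callers never see regex_t, so the backend
// could be replaced without changing the rest of the code.
class RegexWrapper {
public:
    explicit RegexWrapper(const std::string& pattern, bool ignore_case = false) {
        int flags = REG_EXTENDED | REG_NOSUB;
        if (ignore_case) flags |= REG_ICASE;
        // regcomp returns 0 on success
        compiled_ = (regcomp(&re_, pattern.c_str(), flags) == 0);
    }
    ~RegexWrapper() {
        if (compiled_) regfree(&re_);
    }
    RegexWrapper(const RegexWrapper&) = delete;
    RegexWrapper& operator=(const RegexWrapper&) = delete;

    // True if the pattern compiled successfully.
    bool ok() const { return compiled_; }

    // True if the pattern matches anywhere in text.
    bool match(const std::string& text) const {
        return compiled_ && regexec(&re_, text.c_str(), 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
    bool compiled_ = false;
};
```

A caller would construct one `RegexWrapper` per pattern and call `match()` on each URL, which also avoids recompiling the pattern for every document.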
From: Joe R. J. <jj...@cl...> - 2001-12-01 10:39:15
|
On Fri, 30 Nov 2001, Gilles Detillieux wrote:

> Date: Fri, 30 Nov 2001 12:06:04 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: "ht://Dig developers list" <htd...@li...>
> Subject: Re: [htdig-dev] to-do list for 3.1.6
>
> > ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> > rm htlib/regex.*
> > remove reference to regex.o in htlib/Makefile
> > #undef HAVE_BROKEN_REGEX in include/htconfig.h
> >
> > htdig: Start digging: Thu Nov 29 22:22:32 PST 2001
> > htmerge: Start merging: Thu Nov 29 22:24:14 PST 2001 104 seconds
> ...
> > ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> > rm htlib/regex.*
> > remove reference to regex.o in htlib/Makefile
> > #define HAVE_BROKEN_REGEX in include/htconfig.h
> >
> > htdig: Start digging: Thu Nov 29 22:25:33 PST 2001
> > htmerge: Start merging: Thu Nov 29 22:27:12 PST 2001 99 seconds
> ...
>
> I don't think the difference between 99 and 104 seconds is significant.
> This confirms my suspicion that the HAVE_BROKEN_REGEX doesn't do a
> whole lot. To be sure, though, I think we'd need timings for 112501 +
> parsedate.0 + ssl.6, remove reference to regex.o in htlib/Makefile, #undef
> AND #define HAVE_BROKEN_REGEX (i.e. two tests) in include/htconfig.h
> (but don't remove htlib/regex.h). I suspect the timings for both will
> be like the 2nd test above, around 143 sec.

___________________ 112501 + parsedate.0 + ssl.6 ___________________
remove reference to regex.o in htlib/Makefile
#define HAVE_BROKEN_REGEX in include/htconfig.h

htdig: Start digging: Sat Dec 1 00:10:58 PST 2001
htmerge: Start merging: Sat Dec 1 00:12:44 PST 2001 106
htmerge: Total word count: 13159
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
___________________ 112501 + parsedate.0 + ssl.6 ___________________
remove reference to regex.o in htlib/Makefile
#undef HAVE_BROKEN_REGEX in include/htconfig.h

htdig: Start digging: Sat Dec 1 00:18:55 PST 2001
htmerge: Start merging: Sat Dec 1 00:20:38 PST 2001 103
htmerge: Total word count: 13159
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
____________________________________________________________________

> I suspect the difference between the 143 and the 99-104 sec is due
> to the inclusion of the bundled regex.h even though you're using
> the C library regex.o code. It's a wonder this works at all, but
> there does seem to be some impact on performance.

I am not sure how that 143 came about last time; I can't reproduce it any more;-/

> > ____________________ 092301 + Armstrong + ssl.4 ____________________
> > htdig: Start digging: Fri Nov 30 00:18:06 PST 2001
> > htmerge: Start merging: Fri Nov 30 00:18:44 PST 2001 38 seconds
> ...
>
> This is the part I find a bit troubling, but I don't know what we
> can do about it. I don't know why Armstrong's patch, which uses rx
> instead of regex, causes htdig to run 2-3 times faster, unless there
> are other changes between 092301 and 112501 that account for much of
> this, but it could well be just implementation efficiencies in one
> library and not in the other.

I reported the difference in indexing time to the list the very first time url_rewrite_rules was integrated in the code. I don't believe at that time anything else had changed in the code.

> In your tests above, do you make use of url_rewrite_rules? If so,
> how do the timings change if you don't use it?

___________________ 112501 + parsedate.0 + ssl.6 ___________________
remove reference to regex.o in htlib/Makefile
#define HAVE_BROKEN_REGEX in include/htconfig.h
no url_rewrite_rules

htdig: Start digging: Sat Dec 1 00:40:09 PST 2001
htmerge: Start merging: Sat Dec 1 00:40:34 PST 2001 25 seconds
htmerge: Total word count: 13159
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
rundig: end rundig: Sat Dec 1 00:40:39 PST 2001
___________________ 112501 + parsedate.0 + ssl.6 ___________________
remove reference to regex.o in htlib/Makefile
#undef HAVE_BROKEN_REGEX in include/htconfig.h
no url_rewrite_rules

htdig: Start digging: Sat Dec 1 00:28:50 PST 2001
htmerge: Start merging: Sat Dec 1 00:29:10 PST 2001 20 seconds
htmerge: Total word count: 13159
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
____________________________________________________________________

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl... |
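The dig times in these reports are the gap between the "Start digging" and "Start merging" timestamps. As a rough sketch of that arithmetic, the helpers below (hypothetical names, not part of htdig or rundig) turn two same-day `HH:MM:SS` stamps into elapsed seconds and compare a run with url_rewrite_rules against one without. Midnight rollover is deliberately not handled.

```cpp
#include <cstdio>
#include <string>

// Convert "HH:MM:SS" to seconds since midnight; returns -1 on malformed input.
int seconds_of_day(const std::string& hms) {
    int h = 0, m = 0, s = 0;
    if (std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s) != 3) return -1;
    return h * 3600 + m * 60 + s;
}

// Elapsed seconds between the dig start and the merge start (same day only).
int dig_seconds(const std::string& start_dig, const std::string& start_merge) {
    return seconds_of_day(start_merge) - seconds_of_day(start_dig);
}

// How many times longer one run took than another.
double slowdown(int slower_seconds, int faster_seconds) {
    return static_cast<double>(slower_seconds) / faster_seconds;
}
```

For the `#define HAVE_BROKEN_REGEX` runs above, `dig_seconds("00:10:58", "00:12:44")` gives 106 and `dig_seconds("00:40:09", "00:40:34")` gives 25, so enabling url_rewrite_rules made that dig a little over four times slower.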
From: Gilles D. <gr...@sc...> - 2001-11-30 18:29:03
|
According to Marco Nenciarini:
> I am the Italian mirror maintainer, and I want to report to you that your ftp site
> doesn't work at all.
>
> My logs say that your ftp site has been down since 5 Nov 2001

Yes, we know. SourceForge has disabled project FTP services. We now recommend the use of HTTP over FTP on the htdig.org site. If you can use HTTP to get your mirror, you'll still be able to make it available via either or both FTP and HTTP.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-11-30 18:06:14
|
According to Joe R. Jah:
> Here are some statistics that may address your questions;)
>
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> Without any change.
> htdig: Start digging: Thu Nov 29 23:27:58 PST 2001
> htmerge: Start merging: Thu Nov 29 23:33:35 PST 2001 337 seconds
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> remove reference to regex.o in htlib/Makefile
>
> htdig: Start digging: Thu Nov 29 23:42:15 PST 2001
> htmerge: Start merging: Thu Nov 29 23:44:38 PST 2001 143 seconds
...

OK, obviously removing the bundled regex and making sure it doesn't get into htlib.a has a big impact.

> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> rm htlib/regex.*
> remove reference to regex.o in htlib/Makefile
> #undef HAVE_BROKEN_REGEX in include/htconfig.h
>
> htdig: Start digging: Thu Nov 29 22:22:32 PST 2001
> htmerge: Start merging: Thu Nov 29 22:24:14 PST 2001 104 seconds
...
> ___________________ 112501 + parsedate.0 + ssl.6 ___________________
> rm htlib/regex.*
> remove reference to regex.o in htlib/Makefile
> #define HAVE_BROKEN_REGEX in include/htconfig.h
>
> htdig: Start digging: Thu Nov 29 22:25:33 PST 2001
> htmerge: Start merging: Thu Nov 29 22:27:12 PST 2001 99 seconds
...

I don't think the difference between 99 and 104 seconds is significant. This confirms my suspicion that the HAVE_BROKEN_REGEX doesn't do a whole lot. To be sure, though, I think we'd need timings for 112501 + parsedate.0 + ssl.6, remove reference to regex.o in htlib/Makefile, #undef AND #define HAVE_BROKEN_REGEX (i.e. two tests) in include/htconfig.h (but don't remove htlib/regex.h). I suspect the timings for both will be like the 2nd test above, around 143 sec.

I suspect the difference between the 143 and the 99-104 sec is due to the inclusion of the bundled regex.h even though you're using the C library regex.o code. It's a wonder this works at all, but there does seem to be some impact on performance.

> ____________________ 092301 + Armstrong + ssl.4 ____________________
> htdig: Start digging: Fri Nov 30 00:18:06 PST 2001
> htmerge: Start merging: Fri Nov 30 00:18:44 PST 2001 38 seconds
...

This is the part I find a bit troubling, but I don't know what we can do about it. I don't know why Armstrong's patch, which uses rx instead of regex, causes htdig to run 2-3 times faster, unless there are other changes between 092301 and 112501 that account for much of this, but it could well be just implementation efficiencies in one library and not in the other.

In your tests above, do you make use of url_rewrite_rules? If so, how do the timings change if you don't use it?

Thanks for the feedback.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Joe R. J. <jj...@cl...> - 2001-11-30 08:33:57
|
On Thu, 29 Nov 2001, Gilles Detillieux wrote:

> Date: Thu, 29 Nov 2001 11:55:08 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: Geoff Hutchison <ghu...@ws...>, "ht://Dig developers list" <htd...@li...>
> Subject: Re: [htdig-dev] to-do list for 3.1.6
>
> Wait a minute. I'm almost positive that the problem on BSDi was not the
> use of the system's regex.h, but rather the use of the regex code that's
> bundled with htdig! Can you do me a favour and have a look at what the
> value of HAVE_BROKEN_REGEX is in include/htconfig.h? If it's #define'd,
> please try to #undef it and recompile/reinstall htdig, and let us know
> how that impacts digging time. The selection of the regex code has to
> be an all or nothing thing. If you use the bundled code, all source
> files that use regex.h should use the bundled one, but if you use the C
> library regex code, then all source files should use the system's regex.h.
> If you mix and match the two, you're likely to run into problems.
>
> I think we need to fix htfuzzy/EndingsDB.cc to check the setting
> of HAVE_BROKEN_REGEX and use the appropriate header file. Come to
> think of it, I think there's a problem with how HAVE_BROKEN_REGEX is
> handled in htlib/HtRegex.h too, because simply using #include <regex.h>
> doesn't guarantee that the compiler won't use the bundled one instead,
> as the Makefile.config file puts a -I../htlib in the compiler flags.
> I think to make this all work, we need to rename the bundled regex.h to
> something like htregex.h to avoid conflicts, as well as put some hooks
> in the bundled regex.c code to disable it all if you need to use the
> C library code instead. What do you think, Geoff?
>
> However, Joe, if you did remove both htlib/regex.c and htlib/regex.h
> as you said you did, then you should be safely using all the C library
> code, and not the bundled code, so it should be good.
>
> I also had a look at the HAVE_BROKEN_REGEX on my Red Hat 4.2 system,
> and surprisingly it was defined. I say surprisingly because previously
> I had tried to manually force it to use the C library regex, as Joe
> does by removing the bundled code and removing the reference to regex.o
> in htlib/Makefile, and that had caused htfuzzy and htsearch to crash.
> That was on 3.1.4 or 2.1.5, I think. Anyway, 3.1.6 isn't crashing on this
> system, so I'd see this as further evidence that the HAVE_BROKEN_REGEX
> handling is not working. So, we need to fix the usage of the flag.
> We also need to fix the test for this situation, because it should not
> be defining this flag on my RH 4.2 system. Finally, I think we need a
> better name for it, as it implies that the C library regex is broken,
> when in fact the problem on BSDi systems is that the bundled regex code
> conflicts with the libraries in some way.
>
> In the end, it might make sense to have a configure option to override
> the automatic test for this, because I'm not convinced it will work in
> all cases. (However, to the best of my recollection, it is only BSDi
> systems that have a problem with the bundled regex code.)

Here are some statistics that may address your questions;)

___________________ 112501 + parsedate.0 + ssl.6 ___________________
Without any change.
htdig: Start digging: Thu Nov 29 23:27:58 PST 2001
htmerge: Start merging: Thu Nov 29 23:33:35 PST 2001 337 seconds
htmerge: Total word count: 13160
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
____________________________________________________________________
___________________ 112501 + parsedate.0 + ssl.6 ___________________
remove reference to regex.o in htlib/Makefile

htdig: Start digging: Thu Nov 29 23:42:15 PST 2001
htmerge: Start merging: Thu Nov 29 23:44:38 PST 2001 143 seconds
htmerge: Total word count: 13160
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
____________________________________________________________________
___________________ 112501 + parsedate.0 + ssl.6 ___________________
rm htlib/regex.*
remove reference to regex.o in htlib/Makefile
#undef HAVE_BROKEN_REGEX in include/htconfig.h

htdig: Start digging: Thu Nov 29 22:22:32 PST 2001
htmerge: Start merging: Thu Nov 29 22:24:14 PST 2001 104 seconds
htmerge: Total word count: 13160
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
____________________________________________________________________
___________________ 112501 + parsedate.0 + ssl.6 ___________________
rm htlib/regex.*
remove reference to regex.o in htlib/Makefile
#define HAVE_BROKEN_REGEX in include/htconfig.h

htdig: Start digging: Thu Nov 29 22:25:33 PST 2001
htmerge: Start merging: Thu Nov 29 22:27:12 PST 2001 99 seconds
htmerge: Total word count: 13160
htmerge: Total documents: 163
htmerge: Total size of documents (in K): 1904
____________________________________________________________________
____________________ 092301 + Armstrong + ssl.4 ____________________
htdig: Start digging: Fri Nov 30 00:18:06 PST 2001
htmerge: Start merging: Fri Nov 30 00:18:44 PST 2001 38 seconds
htmerge: Total word count: 13160
htmerge: Total documents: 163
htmerge: Total doc db size (in K): 1904
____________________________________________________________________

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Gilles D. <gr...@sc...> - 2001-11-29 18:11:18
|
According to Iosif Fettich:
> Just an addition to make aware the problem to whoever would attack the
> translations pitfalls:
>
> it's rather long since I worked out a patch to make htdig fit in our
> needs. I never got the time to really get involved and put some serious
> work in this; however, the problem is still there and if anyone will get
> involved, maybe it would be worth being aware of it.
>
> I'm speaking for Romanian as language used in indexed documents.
>
> Since there always are problems with ISO-8859-2 chars, many users actually
> choose not to use them at all. In consequence, spellings with accented
> chars or with their unaccented counterparts are used in mixed fashion.
>
> The approach we took was to simply transform _all_ accented chars into their
> unaccented counterparts before indexing; the same of course before
> searching.
>
> Without the ability to do that, I'm afraid that our indexing wouldn't
> have been as successful as it proved to be.
>
> It's true that, in this way, we cannot search for example only for the
> accented words, not showing the others - but users proved to be much
> more resilient in getting some more (slightly missed) hits, than not
> getting the relevant ones...
>
> Even if kept only as an option, the possibility to work like that
> definitely should be present in future versions of htdig.

The problem is that transforming accented characters to unaccented is an encoding-specific task, so to do even this much, htdig would need to be aware of which encoding is used, and transform characters appropriately. This takes us back to the idea of an accent_map attribute that would allow users to specify the transformations they need, but that's a big job that wouldn't integrate that neatly into the current code. (Sigh!)

It would also be desirable to have a configurable list of entities to decode to specific characters, rather than the current hard-coded set of iso-8859-1 entities. That would allow you to set htdig up to properly use entities for iso-8859-2 or other encodings, converting the entities to the 8-bit encoding of your choice.

At a bare minimum, we need an attribute for enabling or disabling the decoding of SGML entities for iso-8859-1 characters, in both 3.1.6 and 3.2.0b4.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
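To make the encoding-specific nature of accent folding concrete, here is a minimal sketch of a byte-to-byte map for the Romanian letters in ISO-8859-2. It is not htdig code and does not implement the proposed accent_map attribute. The function names are hypothetical, and the table covers only the ten Romanian letters, which is an assumption made to keep the example short. The same byte values mean different characters in iso-8859-1, which is why the map has to be chosen per encoding.

```cpp
#include <array>
#include <string>

// Build a 256-entry table: identity by default, with the Romanian
// ISO-8859-2 letters folded to their unaccented ASCII counterparts.
std::array<unsigned char, 256> make_latin2_romanian_map() {
    std::array<unsigned char, 256> map{};
    for (int i = 0; i < 256; ++i) map[i] = static_cast<unsigned char>(i);
    map[0xC3] = 'A'; map[0xE3] = 'a';   // A/a with breve
    map[0xC2] = 'A'; map[0xE2] = 'a';   // A/a with circumflex
    map[0xCE] = 'I'; map[0xEE] = 'i';   // I/i with circumflex
    map[0xAA] = 'S'; map[0xBA] = 's';   // S/s with cedilla
    map[0xDE] = 'T'; map[0xFE] = 't';   // T/t with cedilla
    return map;
}

// Fold every byte of an ISO-8859-2 string through the table.
std::string fold_accents(const std::string& in) {
    static const std::array<unsigned char, 256> map = make_latin2_romanian_map();
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) out.push_back(static_cast<char>(map[c]));
    return out;
}
```

Applying the same function to words at index time and to the query at search time is what makes accented and unaccented spellings match each other.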
From: Gilles D. <gr...@sc...> - 2001-11-29 17:55:24
|
According to Joe R. Jah:
> On Tue, 27 Nov 2001, Gilles Detillieux wrote:
> > OK, so for either 3.2.0b4 or 3.1.6, you need to manually override
> > things to get rid of the bundled regex code, right?
>
> That's right. There are two files left in 3.1.6 that use system's
> regex.h:
>
> HtRegex.h
> EndingsDB.cc
>
> I suspect HtRegex.h to be the cause of tripling dig time in 3.1.6;-/

Wait a minute. I'm almost positive that the problem on BSDi was not the use of the system's regex.h, but rather the use of the regex code that's bundled with htdig! Can you do me a favour and have a look at what the value of HAVE_BROKEN_REGEX is in include/htconfig.h? If it's #define'd, please try to #undef it and recompile/reinstall htdig, and let us know how that impacts digging time. The selection of the regex code has to be an all or nothing thing. If you use the bundled code, all source files that use regex.h should use the bundled one, but if you use the C library regex code, then all source files should use the system's regex.h. If you mix and match the two, you're likely to run into problems.

I think we need to fix htfuzzy/EndingsDB.cc to check the setting of HAVE_BROKEN_REGEX and use the appropriate header file. Come to think of it, I think there's a problem with how HAVE_BROKEN_REGEX is handled in htlib/HtRegex.h too, because simply using #include <regex.h> doesn't guarantee that the compiler won't use the bundled one instead, as the Makefile.config file puts a -I../htlib in the compiler flags. I think to make this all work, we need to rename the bundled regex.h to something like htregex.h to avoid conflicts, as well as put some hooks in the bundled regex.c code to disable it all if you need to use the C library code instead. What do you think, Geoff?

However, Joe, if you did remove both htlib/regex.c and htlib/regex.h as you said you did, then you should be safely using all the C library code, and not the bundled code, so it should be good.

I also had a look at the HAVE_BROKEN_REGEX on my Red Hat 4.2 system, and surprisingly it was defined. I say surprisingly because previously I had tried to manually force it to use the C library regex, as Joe does by removing the bundled code and removing the reference to regex.o in htlib/Makefile, and that had caused htfuzzy and htsearch to crash. That was on 3.1.4 or 2.1.5, I think. Anyway, 3.1.6 isn't crashing on this system, so I'd see this as further evidence that the HAVE_BROKEN_REGEX handling is not working. So, we need to fix the usage of the flag. We also need to fix the test for this situation, because it should not be defining this flag on my RH 4.2 system. Finally, I think we need a better name for it, as it implies that the C library regex is broken, when in fact the problem on BSDi systems is that the bundled regex code conflicts with the libraries in some way.

In the end, it might make sense to have a configure option to override the automatic test for this, because I'm not convinced it will work in all cases. (However, to the best of my recollection, it is only BSDi systems that have a problem with the bundled regex code.)

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
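The "all or nothing" header selection described above might look roughly like this in a source file. This is a sketch under assumptions, not the fix that went into htdig: the macro name `USE_BUNDLED_REGEX` is hypothetical (the message only says the flag needs a better name), and `htregex.h` is the renamed bundled header as proposed in the message. With the macro undefined, the angle-bracket include can no longer be shadowed by a bundled `regex.h` on the `-I../htlib` path, because no file of that name would exist there.

```cpp
// Hypothetical switch, e.g. set by configure; every file that needs
// regex support would use this same block so the choice is consistent.
#if defined(USE_BUNDLED_REGEX)
#  include "htregex.h"   // renamed bundled header (name proposed in the message)
#else
#  include <regex.h>     // C library header
#endif
#include <cassert>

// Returns true if the POSIX extended regex `pattern` matches `text`.
bool regex_matches(const char* pattern, const char* text) {
    assert(pattern != nullptr && text != nullptr);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;    // pattern failed to compile
    const bool matched = (regexec(&re, text, 0, nullptr, 0) == 0);
    regfree(&re);
    return matched;
}
```

Because the declarations and the object code then always come from the same implementation, the mixed case (bundled header with C library `regex.o`) that the timings hint at cannot arise.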
From: Joe R. J. <jj...@cl...> - 2001-11-29 17:24:15
|
On Thu, 29 Nov 2001, Gilles Detillieux wrote:

> Date: Thu, 29 Nov 2001 11:15:48 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: Gilles Detillieux <gr...@sc...>, "ht://Dig developers list" <htd...@li...>
> Subject: Re: [htdig-dev] workaround for mktime/strptime/LC_TIME problems
>
> > > Like this... Use "patch -p0 < this-message" in the htdig-3.1.6 source
> > > directory from the latest snapshot to use the new date parsing code.
> > > I'll probably post it to CVS today or tomorrow.
> >
> > It's in the patch archives:
> >
> > ftp://ftp.ccsf.org/htdig-patches/3.1.6/mktime-strptime-LC_TIME.0
>
> I'd suggest a more meaningful name for the patch, though, like parsedate.0

Done; it is now:

ftp://ftp.ccsf.org/htdig-patches/3.1.6/parsedate.0

Regards,

Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Gilles D. <gr...@sc...> - 2001-11-29 17:15:55
|
According to Joe R. Jah:
> On Tue, 27 Nov 2001, Gilles Detillieux wrote:
> > Date: Tue, 27 Nov 2001 15:09:07 -0600 (CST)
> > From: Gilles Detillieux <gr...@sc...>
> > To: Joe R. Jah <jj...@cl...>
> > Cc: Gilles Detillieux <gr...@sc...>, "ht://Dig developers list" <htd...@li...>
> > Subject: Re: [htdig-dev] workaround for mktime/strptime/LC_TIME problems
> >
> > Like this... Use "patch -p0 < this-message" in the htdig-3.1.6 source
> > directory from the latest snapshot to use the new date parsing code.
> > I'll probably post it to CVS today or tomorrow.
>
> It's in the patch archives:
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.6/mktime-strptime-LC_TIME.0
>
> This patch causes ssl.5 to fail. I modified ssl.5 by removing the bit:
> __________________________htdig/Document.cc_____________________________
> @@ -220,6 +220,7 @@
>  tm.tm_year += 1900;
>  tm.tm_yday = 0; // clear these to prevent problems in strftime()
>  tm.tm_wday = 0;
> + tm.tm_isdst = -1;
>
>  if (debug > 2)
>  {
> ________________________________________________________________________
>
> It seems to be irrelevant after your patch. The resulting ssl patch:
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.6

Thanks, Joe. Yes, all the tm structure manipulation is gone from Document::getdate() now, after my patch, so it needs to be removed from any other patch that is applied to Document.cc.

I was mistaken earlier, though, about my patch making it possible to remove the 'setlocale(LC_TIME, "C");' from htlib/Configuration.cc. That call is still needed, because this locale setting will still affect the strftime() call that generates the If-Modified-Since header, so we can't allow LC_TIME to be set to another locale. So, the Configuration.cc locale handling has to stay as it is, but we do still solve the problems we had with mktime and strptime.

I'd suggest a more meaningful name for the patch, though, like parsedate.0

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
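The reason LC_TIME has to stay at "C" can be shown with a small sketch of building an HTTP date such as the one sent in If-Modified-Since. This is not the htdig code: the function name `http_date` is hypothetical, and `gmtime_r` is assumed to be available (it is a POSIX call). The `%a` and `%b` conversions in `strftime()` produce day and month names from the current LC_TIME locale, so under any other locale the header could contain non-English names that servers would not parse.

```cpp
#include <clocale>
#include <ctime>
#include <string>

// Format a timestamp as an HTTP date, e.g. for an If-Modified-Since header.
std::string http_date(std::time_t when) {
    // Keep day and month abbreviations in English regardless of environment.
    std::setlocale(LC_TIME, "C");

    std::tm tm_utc{};
    gmtime_r(&when, &tm_utc);   // POSIX, thread-safe variant of gmtime()

    char buf[64];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", &tm_utc);
    return std::string(buf, n);
}
```

For example, `http_date(0)` yields `Thu, 01 Jan 1970 00:00:00 GMT`.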
From: Marco N. <mn...@pr...> - 2001-11-29 07:33:32
|
Hi all,

I am the Italian mirror maintainer, and I want to report to you that your ftp site doesn't work at all.

My logs say that your ftp site has been down since 5 Nov 2001

Best Regards

--
---------------------------------------------------------------------
| Marco Nenciarini | Debian/GNU Linux Developer - Plug Member |
| mn...@pr... | http://www.prato.linux.it/~mnencia |
---------------------------------------------------------------------
Key fingerprint = FED9 69C7 9E67 21F5 7D95 5270 6864 730D F095 E5E4 |
From: Michael C. <Mic...@ir...> - 2001-11-28 21:03:16
|
You commented that an error message would be more useful than just 'dies'. These error messages had been dealt with before by this mailing list, so I felt no need to reinvent the wheel; it was more so that people could 'remember' my previous problems and hopefully get up to speed with my current situation.

The recent problem I had, where the server returned an unrecognized character problem, was to do with how I handled cgi-bin files (as perl scripts). I found the workaround for this on a very useful site that suggested that I set up a secondary cgi-bin (i.e. htdig-bin), give it no handlers, and set AllowOverride None.

Worked perfectly, on all 3 development / production servers.

Am away laughing for the time being, thanks again for the help.

Michael Clarke
IRD Open Systems Team
Level 4, Telecom House
13-27 Manners Street
Wellington
Phone: +64 (04) 8031423
Mobile: +64 021 455 218
email: mic...@ir...
email: ma...@ir... |
From: Gilles D. <gr...@sc...> - 2001-11-28 20:47:22
|
According to Michael Clarke:
> Firstly I grabbed the latest beta of htdig 3.20b3 (or
> something-a-rather), compiled and ran make, and it dies with string.cc
> errors. Was referred to an older snapshot version, and this dies during
> configure (3.16). Grabbed 3.15 and that says you need autoconf. Get
> autoconf, configure and make and it says you need m4. Get m4, compile
> and make, dies.

Generally, exact error messages tend to be more useful than "dies".

> Then I grab a 2.7 package, untar and unzip, and place where I need
> items to be placed. Change the conf file, and run htdig and htmerge,
> they seem to run fine.

Which package would this be? I hope you're not running that ancient 3.1.3 binary package for sparc solaris on htdig.org/files/binaries/.

> SO at this stage I am digging http://infoweb/ and it does that fine,
> htmerge does what it does fine, so now I go to search:
>
> http://myhost.htdig/search.html
>
> and it returns an internal server error, logs state:
>
> [date time...][error] Out of memory during "large" request for 1052672 bytes, total sbrk() is 6617464 bytes at /usr/local/lib/perl5/site_perl/5.6.1/sun4-solaris/Apache/Registry.pm line 103.
>
> Whats wrong?

Well, given that htsearch is a compiled C++ program, and not a Perl script, the presence of perl-related errors in the server log should suggest that the server is not doing the right thing. See http://www.htdig.org/FAQ.html#q5.23

> What I realllllly neeeeedd is for someone on a Solaris 2.6 or solaris
> 2.7 system to make me a package for it.
>
> I need to dig on http://infoweb/
> htdocs are in /webdocs/prodn/
> apache is in /opt/apache/
> cgi-bin is /opt/apache/cgi-bin/
> htdig is in /opt/apache/htdig/
> search is in /webdocs/prodn/htdig/search.html
> htsearch is in /opt/apache/cgi-bin/htsearch
> db is in /opt/apache/htdig/db
> common is in /opt/apache/htdig/common/
> conf is in /opt/apache/htdig/conf/
> bin is in /opt/apache/htdig/bin/
>
> can any one make this for me or tell me a quick fix for my memory problem???????
>
> Thanks anyone - can they email a package or a tentative reply to
>
> ma...@ir...

Well, I don't have a Solaris system to play with, so I can't help you there. In any case, no package you get will fix the server configuration problem above. You'll need to tweak your Apache configuration.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930 |
From: Ionut N. <io...@ef...> - 2001-11-28 18:47:00
|
> The "right thing" to do would be to either not decode SGML entities > at all, but somehow compensate for that in the word matching, or to > decode all standard or proposed entities UNAMBIGUOUSLY so that you can > map them back correctly in htsearch. This means not being limited to > 256 characters in a single byte. htsearch would then have to be aware > of the encoding used on output, and map the characters to the correct > single character or SGML encoding as appropriate. Thanks a lot Gilles, I'll keep monitoring the list for news on that. Ionut Nistor io...@ef... |
From: Iosif F. <ife...@ne...> - 2001-11-28 08:55:40
|
Just an addition to make aware the problem to whoever would attack the translations pitfalls:

it's rather long since I worked out a patch to make htdig fit in our needs. I never got the time to really get involved and put some serious work in this; however, the problem is still there and if anyone will get involved, maybe it would be worth being aware of it.

I'm speaking for Romanian as language used in indexed documents.

Since there always are problems with ISO-8859-2 chars, many users actually choose not to use them at all. In consequence, spellings with accented chars or with their unaccented counterparts are used in mixed fashion.

The approach we took was to simply transform _all_ accented chars into their unaccented counterparts before indexing; the same of course before searching.

Without the ability to do that, I'm afraid that our indexing wouldn't have been as successful as it proved to be.

It's true that, in this way, we cannot search for example only for the accented words, not showing the others - but users proved to be much more resilient in getting some more (slightly missed) hits, than not getting the relevant ones...

Even if kept only as an option, the possibility to work like that definitely should be present in future versions of htdig.

Thank you.

Iosif Fettich |
From: Joe R. J. <jj...@cl...> - 2001-11-28 07:40:33
|
On Tue, 27 Nov 2001, Gilles Detillieux wrote: > Date: Tue, 27 Nov 2001 16:14:28 -0600 (CST) > From: Gilles Detillieux <gr...@sc...> > To: Joe R. Jah <jj...@cl...> > Cc: Geoff Hutchison <ghu...@ws...>, "ht://Dig developers list" <htd...@li...> > Subject: Re: [htdig-dev] to-do list for 3.1.6 > > OK, so for either 3.2.0b4 or 3.1.6, you need to manually override > things to get rid of the bundled regex code, right? That's right. There are two files left in 3.1.6 that use system's regex.h: HtRegex.h EndingsDB.cc I suspect HtRegex.h to be the cause of tripling dig time in 3.1.6;-/ Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Joe R. J. <jj...@cl...> - 2001-11-28 06:59:48
|
On Tue, 27 Nov 2001, Gilles Detillieux wrote: > Date: Tue, 27 Nov 2001 17:07:38 -0600 (CST) > From: Gilles Detillieux <gr...@sc...> > To: Joe R. Jah <jj...@cl...> > Cc: Geoff Hutchison <ghu...@us...>, htd...@li... > Subject: Re: [htdig-dev] Current Status as of snapshot 3.1.6-112501 > > Arrggh! Something has gone wrong with the snapshot script, obviously. > I suspected something was up last week when we got a few complaints > about the 3.1.6 snapshot needing autoconf to build, so I knew there was > a problem with some file times. It seems now that it's not getting > its CVS updates correctly. The patch below will get you up to date. > (Use patch -p1 for this one.) Thank you. It is in the patch archives for redundancy:) ftp://ftp.ccsf.org/htdig-patches/3.1.6/111101-112501 Regards, Joe -- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Joe R. J. <jj...@cl...> - 2001-11-28 06:43:09
|
On Tue, 27 Nov 2001, Gilles Detillieux wrote:

> Date: Tue, 27 Nov 2001 15:09:07 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: Gilles Detillieux <gr...@sc...>, "ht://Dig developers list" <htd...@li...>
> Subject: Re: [htdig-dev] workaround for mktime/strptime/LC_TIME problems
>
> Like this... Use "patch -p0 < this-message" in the htdig-3.1.6 source
> directory from the latest snapshot to use the new date parsing code.
> I'll probably post it to CVS today or tomorrow.

It's in the patch archives:

ftp://ftp.ccsf.org/htdig-patches/3.1.6/mktime-strptime-LC_TIME.0

This patch causes ssl.5 to fail. I modified ssl.5 by removing the bit:

__________________________htdig/Document.cc_____________________________
@@ -220,6 +220,7 @@
     tm.tm_year += 1900;
     tm.tm_yday = 0; // clear these to prevent problems in strftime()
     tm.tm_wday = 0;
+    tm.tm_isdst = -1;

     if (debug > 2)
     {
________________________________________________________________________

It seems to be irrelevant after your patch. The resulting ssl patch:

ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.6

Regards,

Joe
-- _/ _/_/_/ _/ ____________ __o _/ _/ _/ _/ ______________ _-\<,_ _/ _/ _/_/_/ _/ _/ ......(_)/ (_) _/_/ oe _/ _/. _/_/ ah jj...@cl... |
From: Gilles D. <gr...@sc...> - 2001-11-28 04:00:04
|
According to Michael Clarke: > Hello, I'm running the htdig 2.7 binary on a 2.6 system. htdig and > htmerge seem to run perfectly fine, yet when I run htsearch from the > web page I get an error in my apache log (shows as a 500 error) > > [date time] [error] Unrecognized character \x7F at /opt/apache/cgi-bin/htsarch line 1, <IN> line 107 > > #------------------------------------------------# > > Taking a stab in the dark here it looks like apache is trying to run > htsearch as a cgi-script, which it isn't. Any way to get this to run???? Sounds an awful lot like what http://www.htdig.org/FAQ.html#q5.7 deals with. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-11-28 03:48:39
|
According to Martin Resch: > In my index.html (contemporary start-url) i link all files like this: > > <a href="index2.php?h_sNextSite=something.php"></a> > > This has to be in that form because I get on this way some variables in > the output-url like > > http://server/index2.php?h_sNextSite=something.php&variable1=something&v > ariable2=somthing_else and so on. > > I get also a session-id as variable like ....&session=12345, but I don't > want to have it. Can I exclude this piece of the url? It is always in > form of $session=[the ID]. You'd need to grab the latest 3.1.6 or 3.2.0b4 snapshot from http://www.htdig.org/files/snapshots/ and use the url_rewrite_rules attribute in either of those. See htdoc/attrs.html in the source for information on how to use this attribute. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
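As a rough illustration of the kind of rewrite Martin is asking for, the sketch below strips a "&session=..." parameter from a URL with a regular expression. It is not ht://Dig code and does not show the url_rewrite_rules syntax (htdoc/attrs.html remains the reference for that); strip_session is a hypothetical name, and the pattern assumes the session parameter is never the first one in the query string, as in the URLs quoted above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of ht://Dig): drop a "&session=<id>" query
// parameter so that URLs differing only in session ID collapse to one URL.
// Assumption: the parameter always follows another one, so it starts with '&'.
std::string strip_session(const std::string &url)
{
    // Match "&session=" and everything up to the next '&' or '#'.
    static const std::regex session_param("&session=[^&#]*");
    return std::regex_replace(url, session_param, "");
}
```

For example, strip_session("http://server/index2.php?h_sNextSite=something.php&session=12345") yields "http://server/index2.php?h_sNextSite=something.php". A rule with the same regular expression idea would be what goes into url_rewrite_rules.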
From: Geoff H. <ghu...@ws...> - 2001-11-27 23:32:04
|
On Tue, 27 Nov 2001, Gilles Detillieux wrote: > > According to the ChangeLog file this snapshot was last changed on November > > 3, but Gilles indicated last week that he had committed several fixes and > > features to the CVS tree. Any ideas? > > Arrggh! Something has gone wrong with the snapshot script, obviously. > I suspected something was up last week when we got a few complaints > about the 3.1.6 snapshot needing autoconf to build, so I knew there was These are two issues. 1) Snapshot problem--turns out that SF decided to invalidate my SSH key recently. So the scripts were failing because it was asking for my password... Should be fixed now. 2) Autoconf: The snapshots come from CVS, which doesn't necessarily preserve dates. So often configure is older than configure.in from the CVS copy. Not much we can do about this, except make a real "pre-release" or "release" version where everything is checked. -- -Geoff Hutchison Williams Students Online http://wso.williams.edu/ |
From: Geoff H. <ghu...@ws...> - 2001-11-27 23:24:18
|
On Tue, 27 Nov 2001, Gilles Detillieux wrote: > However, the gcc 3.0 fixes should be in there already, according to > what Geoff said. See http://www.htdig.org/files/snapshots/ Yes, this is correct. I put them in on 11/3. -- -Geoff Hutchison Williams Students Online http://wso.williams.edu/ |
From: Gilles D. <gr...@sc...> - 2001-11-27 23:11:35
|
According to Michael Clarke: > Was there to be a new release of ht-dig today or yeserday (26-11-2001- > NZ time) that deals with gcc3.0 compilation problems (amongst others), > Anyone know when this will be available? There wasn't a new release scheduled for this weekend, but there should have been a new snapshot. Unfortunately, the snapshot script didn't seem to grab the last week's CVS updates, so it's still a week behind. However, the gcc 3.0 fixes should be in there already, according to what Geoff said. See http://www.htdig.org/files/snapshots/ -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-11-27 23:07:50
|
According to Joe R. Jah: > According to the ChangeLog file this snapshot was last changed on November > 3, but Gilles indicated last week that he had committed several fixes and > features to the CVS tree. Any ideas? Arrggh! Something has gone wrong with the snapshot script, obviously. I suspected something was up last week when we got a few complaints about the 3.1.6 snapshot needing autoconf to build, so I knew there was a problem with some file times. It seems now that it's not getting its CVS updates correctly. The patch below will get you up to date. (Use patch -p1 for this one.) diff -rup htdig-3.1.6-112501/ChangeLog htdig-3.1.6/ChangeLog --- htdig-3.1.6-112501/ChangeLog Sun Nov 11 03:17:22 2001 +++ htdig-3.1.6/ChangeLog Wed Nov 21 12:55:12 2001 @@ -1,3 +1,27 @@ +Wed Nov 21 12:54:42 2001 Gilles Detillieux <gr...@sc...> + + * htdoc/rundig.html: Added note about effect of changing database_base. + + * htmerge/docs.cc (convertDocs): Changed confusing message about + total doc db size in stats. + +Wed Nov 21 11:37:52 2001 Gilles Detillieux <gr...@sc...> + + * htsearch/TemplateList.cc (createFromString), htdoc/attrs.html: + Treat template_map as a _quoted_ string list. Change <i> tags to + the HTML-4.0 compliant <em> tags in builtin-long template. + +Tue Nov 20 17:13:27 2001 Gilles Detillieux <gr...@sc...> + + * htlib/String.cc (String, append, sub): Added checks for negative + lengths or start position to make code more fault-tolerant. + +Tue Nov 20 16:37:26 2001 Gilles Detillieux <gr...@sc...> + + * htfuzzy/Synonym.cc (createDB): Check for lines with less than + 2 words, to avoid segfault caused by calling Database::Put() with + negative length for data field. 
+ Sat Nov 3 23:55:00 2001 Geoff Hutchison <ghu...@ws...> * htlib/htString.h: Add #include for ostream.h to solve compile diff -rup htdig-3.1.6-112501/htdoc/attrs.html htdig-3.1.6/htdoc/attrs.html --- htdig-3.1.6-112501/htdoc/attrs.html Sun Nov 4 03:17:19 2001 +++ htdig-3.1.6/htdoc/attrs.html Wed Nov 21 11:33:21 2001 @@ -7624,7 +7624,7 @@ <em>type:</em> </dt> <dd> - string list + quoted string list </dd> <dt> <em>used by:</em> @@ -8800,7 +8800,7 @@ </dl> <hr size="4" noshade> <!-- hhmts start --> -Last modified: $Date: 2001/11/02 18:29:55 $ +Last modified: $Date: 2001/11/21 17:33:20 $ <!-- hhmts end --> </body> </html> diff -rup htdig-3.1.6-112501/htdoc/rundig.html htdig-3.1.6/htdoc/rundig.html --- htdig-3.1.6-112501/htdoc/rundig.html Tue Sep 18 10:53:21 2001 +++ htdig-3.1.6/htdoc/rundig.html Wed Nov 21 12:55:25 2001 @@ -155,7 +155,10 @@ <a href="attrs.html#database_dir">database_dir</a> or <a href="attrs.html#common_dir">common_dir</a> attributes (you'll need to make the corresponding changes to the DBDIR - and COMMONDIR variables in the script), if you decide to + and COMMONDIR variables in the script), if you change the + <a href="attrs.html#database_base">database_base</a> + attribute (there's and embedded "db." filename in the + script), if you decide to use other fuzzy algorithms that need their own databases rebuilt, or if you change the names of the endings or synonyms databases or source files. 
Before customizing the @@ -181,7 +184,7 @@ </dl> <hr size="4" noshade> - Last modified: $Date: 2001/09/18 15:53:21 $ + Last modified: $Date: 2001/11/21 18:55:25 $ <br> <a href="http://sourceforge.net/"> <img src="http://sourceforge.net/sflogo.php?group_id=4593&type=1" width="88" height="31" border="0" alt="SourceForge Logo"></a> diff -rup htdig-3.1.6-112501/htfuzzy/Synonym.cc htdig-3.1.6/htfuzzy/Synonym.cc --- htdig-3.1.6-112501/htfuzzy/Synonym.cc Wed Mar 31 15:25:12 1999 +++ htdig-3.1.6/htfuzzy/Synonym.cc Tue Nov 20 16:42:24 2001 @@ -5,7 +5,7 @@ // // #if RELEASE -static char RCSid[] = "$Id: Synonym.cc,v 1.3.2.2 1999/03/31 21:25:12 grdetil Exp $"; +static char RCSid[] = "$Id: Synonym.cc,v 1.3.2.3 2001/11/20 22:42:25 grdetil Exp $"; #endif #include "Synonym.h" @@ -74,6 +74,16 @@ Synonym::createDB(Configuration &config) while (fgets(input, sizeof(input), fl)) { StringList sl(input, " \t\r\n"); + if (sl.Count() < 2) + { + if (debug) + { + cout << "htfuzzy/synonyms: Rejected line with less than 2 words: " + << input << endl; + cout.flush(); + } + continue; + } for (int i = 0; i < sl.Count(); i++) { data = 0; diff -rup htdig-3.1.6-112501/htlib/String.cc htdig-3.1.6/htlib/String.cc --- htdig-3.1.6-112501/htlib/String.cc Thu Jul 5 11:26:35 2001 +++ htdig-3.1.6/htlib/String.cc Tue Nov 20 17:15:31 2001 @@ -1,7 +1,7 @@ // // Implementation of String class // -// $Id: String.cc,v 1.16.2.5 2001/07/05 16:26:35 ghutchis Exp $ +// $Id: String.cc,v 1.16.2.6 2001/11/20 23:15:32 grdetil Exp $ // // Part of the ht://Dig package <http://www.htdig.org/> // Copyright (c) 1995-2001 The ht://Dig Group @@ -10,7 +10,7 @@ // <http://www.gnu.org/copyleft/gpl.html> // #if RELEASE -static char RCSid[] = "$Id: String.cc,v 1.16.2.5 2001/07/05 16:26:35 ghutchis Exp $"; +static char RCSid[] = "$Id: String.cc,v 1.16.2.6 2001/11/20 23:15:32 grdetil Exp $"; #endif @@ -61,7 +61,7 @@ String::String(char *s, int len) { Allocated = 0; Length = 0; - if (s && len != 0) + if (s && len > 0) copy(s, len, 
len); } @@ -143,7 +143,7 @@ void String::append(char *s) void String::append(char *s, int slen) { - if (!s || !slen) + if (!s || slen <= 0) return; // if ( slen == 1 ) @@ -258,7 +258,7 @@ int String::as_integer(int def) String String::sub(int start, int len) const { - if (start > Length) + if (start > Length || start < 0 || len < 0) return 0; if (len > Length - start) diff -rup htdig-3.1.6-112501/htmerge/docs.cc htdig-3.1.6/htmerge/docs.cc --- htdig-3.1.6-112501/htmerge/docs.cc Mon Mar 22 17:39:30 1999 +++ htdig-3.1.6/htmerge/docs.cc Wed Nov 21 12:50:50 2001 @@ -3,7 +3,7 @@ // // Indexing the "doc_db" database by id-number in "doc_index". // -// $Id: docs.cc,v 1.14.2.2 1999/03/22 23:39:30 grdetil Exp $ +// $Id: docs.cc,v 1.14.2.3 2001/11/21 18:50:50 grdetil Exp $ // // @@ -106,7 +106,7 @@ convertDocs(char *doc_db, char *doc_inde if (stats) { cout << "htmerge: Total documents: " << document_count << endl; - cout << "htmerge: Total doc db size (in K): "; + cout << "htmerge: Total size of documents (in K): "; cout << docdb_size / 1024 << endl; } diff -rup htdig-3.1.6-112501/htsearch/TemplateList.cc htdig-3.1.6/htsearch/TemplateList.cc --- htdig-3.1.6-112501/htsearch/TemplateList.cc Thu Feb 17 14:46:13 2000 +++ htdig-3.1.6/htsearch/TemplateList.cc Wed Nov 21 11:40:44 2001 @@ -1,50 +1,23 @@ // // TemplateList.cc // -// Implementation of TemplateList -// -// $Log: TemplateList.cc,v $ -// Revision 1.4.2.3 2000/02/17 20:46:13 grdetil -// * installdir/htdig.conf: quote all HTML tag parameters. -// * htsearch/TemplateList.cc (createFromString), installdir/long.html, -// installdir/short.html: Use $&(URL) in templates. -// -// Revision 1.4.2.2 2000/02/17 16:49:48 grdetil -// silly little typo. -// -// Revision 1.4.2.1 2000/02/17 16:46:26 grdetil -// [ Improve htsearch's HTML 4.0 compliance ] -// * htsearch/TemplateList.cc (createFromString): Use file name rather -// than internal name to select builtin-* templates, use $&(TITLE) in -// templates and quote HTML tag parameters. 
-// * installdir/long.html, installdir/short.html: Use $&(TITLE) in -// templates and quote HTML tag parameters. -// * htsearch/Display.cc (setVariables): quote all HTML tag parameters -// in generated select lists. -// * installdir/footer.html, installdir/header.html, -// installdir/nomatch.html, installdir/search.html, -// installdir/syntax.html, installdir/wrapper.html: -// Use $&(var) where appropriate, and quote HTML tag parameters. -// -// Revision 1.4 1999/01/17 20:29:37 ghutchis -// Ensure template_map config has three members for each template we add, -// contributed by <tl...@mb...>. -// -// Revision 1.3 1998/09/10 04:16:26 ghutchis -// -// More bug fixes. -// -// Revision 1.1 1997/02/03 17:11:05 turtle -// Initial revision -// +// TemplateList: As it sounds--a list of search result templates. Reads the +// configuration and any template files from disk, then retrieves +// the relevant template for display. +// +// Part of the ht://Dig package <http://www.htdig.org/> +// Copyright (c) 1995-2001 The ht://Dig Group +// For copyright details, see the file COPYING in your distribution +// or the GNU Public License version 2 or later +// <http://www.gnu.org/copyleft/gpl.html> // #if RELEASE -static char RCSid[] = "$Id: TemplateList.cc,v 1.4.2.3 2000/02/17 20:46:13 grdetil Exp $"; +static char RCSid[] = "$Id: TemplateList.cc,v 1.4.2.5 2001/11/21 17:40:45 grdetil Exp $"; #endif #include "TemplateList.h" -#include <URL.h> -#include <StringList.h> +#include "URL.h" +#include "QuotedStringList.h" //***************************************************************************** TemplateList::TemplateList() @@ -86,7 +59,7 @@ TemplateList::get(char *internalName) int TemplateList::createFromString(char *str) { - StringList sl(str, "\t \r\n"); + QuotedStringList sl(str, "\t \r\n"); String display, internal, file; Template *t; @@ -109,7 +82,7 @@ TemplateList::createFromString(char *str s << "<dl><dt><strong><a href=\"$&(URL)\">$&(TITLE)</a></strong>"; s << 
"$(STARSLEFT)\n"; s << "</dt><dd>$(EXCERPT)<br>\n"; - s << "<i><a href=\"$&(URL)\">$&(URL)</a></i>\n"; + s << "<em><a href=\"$&(URL)\">$&(URL)</a></em>\n"; s << " <font size=\"-1\">$(MODIFIED), $(SIZE) bytes</font>\n"; s << "</dd></dl>\n"; t->setMatchTemplate(s); -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
From: Gilles D. <gr...@sc...> - 2001-11-27 22:47:37
|
According to Ionut Nistor:
> On Sat, 2001-11-24 at 06:44, Geoff Hutchison wrote:
> > The point should be made here that the attributes are no longer as
> > significant (and indeed obsolete in 3.2.0bX and later) because htsearch is
> > now doing The Right Thing (TM) and decoding/encoding *all* SGML entities
> > as appropriate.
> Ah, great ! So from 3.2 no more translations in htdig, right ? Only
> escapings in htsearch.

Sorry, you're both wrong about this. Ionut, what Geoff said is htdig decodes and encodes all SGML entities as appropriate. That's something quite different than saying "no more translations in htdig". The fact is htdig 3.2 decodes all the same SGML entities as 3.1.x does, with the exception of &trade;. That doesn't solve the problem you had originally reported.

Geoff, what 3.2 does is not "The Right Thing" either, as there are a few remaining problems:

1) htdig only handles the 4 basic character entity references (ASCII characters) &lt;, &gt;, &amp;, and &quot;, as well as the ISO-8859-1 characters. Other entities, such as Greek, math and other symbols, as well as other accented characters (e.g. &scaron;) are not converted, but the "&" in them is converted to "&amp;" by htsearch. This is a problem in 3.1 and 3.2.

2) Because other accented characters aren't converted, dealing with non-ISO-8859-1 accents is a problem for word matches. Even if the indexing system has working locales, and the source documents use the appropriate encoding, only encoded accented characters will be matched in the word search. SGML-encoded characters in the source documents won't be treated as equivalent to their single character encodings. Again, this is a problem with both 3.1 and 3.2.

3) When using non-Latin-1 encodings, e.g. ISO-8859-2, htdig still translates entities like &eacute; to the ISO-8859-1 8-bit character, and it goes in the database that way. So, if displayed using a different encoding in htsearch (3.1.x) it won't display as an e with acute accent, but as whatever character has that same encoding in Latin 2. In 3.2, htsearch maps the upper half of the character set back to SGML entities, so this problem won't occur, but a much worse problem does occur: all properly encoded Latin 2 characters are mapped to Latin 1 SGML entities. This is still a big, unresolved bug in 3.2.

The "right thing" to do would be to either not decode SGML entities at all, but somehow compensate for that in the word matching, or to decode all standard or proposed entities UNAMBIGUOUSLY so that you can map them back correctly in htsearch. This means not being limited to 256 characters in a single byte. htsearch would then have to be aware of the encoding used on output, and map the characters to the correct single character or SGML encoding as appropriate.

-- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |
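The "decode unambiguously, then map back according to the output encoding" idea in this message could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not how any ht://Dig version works: the names entity_to_codepoint, Charset and encode_for_output are hypothetical, the entity table and the Latin 2 table are deliberately tiny, and Unicode code points are used as the unambiguous internal form, which is an assumption of this sketch rather than something the thread specifies.

```cpp
#include <map>
#include <string>

// Hypothetical: decode a named entity to a Unicode code point (0 if unknown).
// Storing code points avoids squeezing everything into one 8-bit charset.
unsigned long entity_to_codepoint(const std::string &name)
{
    static const std::map<std::string, unsigned long> table = {
        {"lt", 0x3C}, {"gt", 0x3E}, {"amp", 0x26}, {"quot", 0x22},
        {"eacute", 0xE9}, {"atilde", 0xE3},
        {"scaron", 0x161}, {"trade", 0x2122},
    };
    auto it = table.find(name);
    return it == table.end() ? 0 : it->second;
}

enum class Charset { Latin1, Latin2 };

// Hypothetical: encode one code point for HTML output in the given charset.
// A character the charset can represent becomes a single byte; anything else
// becomes a numeric character reference, so nothing is silently remapped.
std::string encode_for_output(unsigned long cp, Charset cs)
{
    switch (cp) {                       // markup characters stay escaped
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '"': return "&quot;";
    }
    if (cp < 0x80)                      // plain ASCII
        return std::string(1, static_cast<char>(cp));
    if (cs == Charset::Latin1 && cp <= 0xFF)
        return std::string(1, static_cast<char>(cp)); // byte equals code point
    if (cs == Charset::Latin2) {
        switch (cp) {                   // partial ISO-8859-2 table, for illustration
        case 0xE9:  return "\xE9";      // e with acute
        case 0x103: return "\xE3";      // a with breve
        case 0x161: return "\xB9";      // s with caron
        }
    }
    return "&#" + std::to_string(cp) + ";";
}
```

With this scheme &scaron; comes out as the byte 0xB9 on a Latin 2 page and as "&#353;" on a Latin 1 page, while &atilde; comes out as the byte 0xE3 on a Latin 1 page and as "&#227;" on a Latin 2 page, where byte 0xE3 means a different letter.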
From: Gilles D. <gr...@sc...> - 2001-11-27 22:14:41
|
According to Joe R. Jah: > On Fri, 23 Nov 2001, Geoff Hutchison wrote: > > On Fri, 23 Nov 2001, Gilles Detillieux wrote: > > > Geoff, I hope you can do something about the configure tests, because I'm > > > > I think based on what Joe has said about the new 3.2 configure tests, I > > feel OK about trying to backport this. The new Solaris issue is pretty > > minor, but aggrivating. > > I was not precise in that message; Steps I took were slightly different > from FAQ#5.14; here's what I did: > > . Removed htlib/regex.c > . Removed htlib/regex.h > . Removed references to regex.o in htlib/Makefile (in 3.1.6-1111-1) > . Removed references to regex.h in htlib/Makefile (in 3.2.0b4-1111-1) > > Also attached is a list of error and warning lines in the config.log of > 3.1.6-111101, just in case;) OK, so for either 3.2.0b4 or 3.1.6, you need to manually override things to get rid of the bundled regex code, right? Geoff, I believe that you had already backported the 3.2 tests you had made. At least, as far as I can recall, you hadn't revised the 3.2 tests since that backport. The problem, as we had discussed back in early October, is that these tests are too simple to catch the rather subtle conflict on BSDI systems. On Oct. 4, you suggested checking the system type for "*-*-bsdi*", to make an explicit exception to the test for these systems. As far as I know, this hasn't been done in either 3.2.x or 3.1.x. -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |