From: Gilles D. <gr...@sc...> - 2002-09-06 15:24:33
|
According to Stephan Hartmann:
> Hi developers,
>
> when i give a start_url with port 8080 (tomcat) and the webapp's servlet
> sends a redirect, htdig does not get any further. The reason seems to be that
> htdig does not include the port in the Host header of the first HTTP-Request.
> Example:
>
> the start_url is http://localhost:8080/mywebapp/myservlet/
>
> htdig sends this request:
>
> GET /robots.txt HTTP/1.0
> User-Agent: htdig/3.1.5 (myemail)
> Host: localhost
>
> I think Host should be localhost:8080 instead. At least Mozilla does this.
>
> Now if the servlet sends a redirect, it sends it without the port, which
> leads to a wrong redirect.
>
> Can anybody confirm this behavior?
Yes. It's fixed in 3.1.6:
Fri Sep 14 09:18:38 2001 Gilles Detillieux <gr...@sc...>
* htdig/Document.cc (RetrieveHTTP): Add port to Host: header when
port is not default, as per RFC2616(14.23). Fixes bug #459969.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
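The rule from RFC 2616 (14.23) that the ChangeLog entry describes — append the port to the Host: value only when it is not the scheme's default — can be sketched in a few lines. The function name and signature below are illustrative, not htdig's actual Document.cc code.

```cpp
#include <string>

// Build the Host: header value for an HTTP request, appending the port
// only when it differs from the default for the scheme (RFC 2616, 14.23).
// Hypothetical helper -- htdig's real logic lives in Document.cc.
std::string hostHeader(const std::string &host, int port, bool https = false)
{
    const int defaultPort = https ? 443 : 80;
    if (port == defaultPort)
        return host;                               // "Host: localhost"
    return host + ":" + std::to_string(port);      // "Host: localhost:8080"
}
```

With this, the request for http://localhost:8080/robots.txt would carry `Host: localhost:8080`, so a servlet that echoes the Host value into a redirect keeps the port.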
|
From: Gilles D. <gr...@sc...> - 2002-09-06 14:59:05
|
According to Brian White:
> At 09:10 6/09/2002, Gilles Detillieux wrote:
> >Well, while going with POSIX-compliant locking would help with
> >portability, I'm not sure all systems currently supported by 3.1.x
> >are fully POSIX-compliant either, so it may be that some only support
> >flock(), or even perhaps no locking at all. Some configure tests for
> >various locking schemes should be implemented, so the code uses what
> >the system provides, or no locking at all if nothing appropriate is found.
>
> I already have "locked" and "unlocked" versions of the code, managed by
> an #ifdef - I would just have to add a -D__NO_FILE_LOCKING__ or something
> like that.

Yes. Ideally, though, it would be automated via configure tests. For example, you test for the flock() call and define HAVE_FLOCK in htconfig.h. Then the code uses #ifdef HAVE_FLOCK. Similarly, you define something like HAVE_FCNTL_LOCK if that capability exists. That test is a bit more complex, as it's not just testing for the existence of a library function.

> I assume this means I would need to create a patch for the configure
> script - any tips on how to do that? Is that monster *really* maintained
> solely by hand or are there some tools for it?

It's generated from configure.in by the autoconf program. So, configure.in is the monster we maintain by hand, which isn't quite as big and scary as the configure script itself. Still, you need to learn enough about autoconf to get by, which is more than I know at this point.

> > > 3) It should be simple enough to create a patch that works with 3.2.x,
> > > judging by a quick look at the latest Display.cc in the CVS repository.
> > >
> > > I *would* like to get it rolled into 3.1.x if I can. I am
> > > more than willing to make any changes required to make this
> > > happen.
> >
> > I think it would be good to see this in the 3.2 CVS tree, with the
> > appropriate configure tests. I'm still a bit lukewarm on the addition
> > of the "init" input parameter to htsearch. It seems the absence of a
> > "page" parameter would mean the same thing, wouldn't it?
>
> You know, I hadn't even thought of that. The only disadvantage to it
> is that it isn't explicit - I can see someone setting "Page=1" for their
> initial search and wondering why their logging doesn't work. The only
> way around this would be documentation, with notes
>
> 1) Where the "page" parameter is discussed
> 2) Where the logging attributes are discussed
> 3) In the FAQs
>
> Otherwise - Yes! Perfect!

Not to mention writing attrs.html entries (and links in cf_by????.html) for all the new attributes. This is of course easier in 3.2.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
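The configure-test approach Gilles outlines can be sketched as follows. This is a hypothetical illustration, not htdig code: HAVE_FCNTL_LOCK and HAVE_FLOCK stand in for macros that configure tests would define in htconfig.h, and here HAVE_FCNTL_LOCK is simply hard-defined so the sketch compiles on its own.

```cpp
#include <fcntl.h>
#include <unistd.h>

// For this sketch, pretend configure detected fcntl() locking; in the
// real build this would come from a configure test via htconfig.h.
#define HAVE_FCNTL_LOCK 1

// Lock a file with whatever mechanism the platform provides, falling
// back to no locking at all when nothing appropriate was found.
static int lockFile(int fd)
{
#if defined(HAVE_FCNTL_LOCK)
    struct flock fl = {};
    fl.l_type = F_WRLCK;              // exclusive write lock
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                     // 0 means "to end of file"
    return fcntl(fd, F_SETLKW, &fl);  // blocks until the lock is granted
#elif defined(HAVE_FLOCK)
    return flock(fd, LOCK_EX);        // BSD-style whole-file lock
#else
    (void)fd;                         // no locking support: do nothing
    return 0;
#endif
}
```

The fcntl() branch is the POSIX-portable one Brian's FAQ link recommends; the #else branch is the "no locking at all" fallback Gilles mentions for systems where neither test succeeds.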
|
From: Stephan H. <be...@be...> - 2002-09-06 14:49:00
|
Hi developers,

when I give a start_url with port 8080 (Tomcat) and the webapp's servlet sends a redirect, htdig does not get any further. The reason seems to be that htdig does not include the port in the Host header of the first HTTP request. Example:

the start_url is http://localhost:8080/mywebapp/myservlet/

htdig sends this request:

GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.5 (myemail)
Host: localhost

I think Host should be localhost:8080 instead. At least Mozilla does this. Now if the servlet sends a redirect, it sends it without the port, which leads to a wrong redirect. Can anybody confirm this behavior?

Bye,
Stephan
|
|
From: Gilles D. <gr...@sc...> - 2002-09-06 14:48:44
|
According to Joe R. Jah:
> Would you please list the patches you have already committed to CVS, and
> those you may, so that we can carry over the rest as patches to 3.1.7
> folder.

Actually, I haven't begun committing changes to CVS yet. As for 3.1.6, I'll probably wait until I have a sufficiently large and complete to-do list, and a good chunk of time I can devote to the task (which I don't have now), and then get a flurry of commits happening. All I have right now is a to-do list of 23 bug fixes, some of which exist in patches and some of which still need to be written. I also need to go through the set of existing patches to see what's ready to use as-is, what needs tweaking/configure tests/documentation, and what I'll exclude.

> Here is the list as of: Thu Sep 5 17:38:47 PDT 2002:
>
> Patch                          # of downloads
> -----                          --------------
> ssl.9                          193
> timet_enddate.1                182
> Makefile.0                     78
> documentation.1                68
> metadate.0                     65
> redirect.0                     51
> documentation.2                50
> NUL.0                          47
> AdjustableLoggingPatch.tar.gz  44
> fileSpace.1                    42
> titleSpace.0                   36
> multiple-noindex.1             34
> Date-viewing.0                 32
> time_t.0                       27
> gcc-3.1.0                      22
> ExecutionTime.0                9
> ExternalParser-max_doc_size.0  7
> htnotifyNull.0                 6

Offhand, I'd break them down thus...

Include as-is:
- Date-viewing.0
- ExternalParser-max_doc_size.0
- Makefile.0
- documentation.1
- documentation.2
- gcc-3.1.0
- metadate.0
- redirect.0
- time_t.0
- timet_enddate.1

Leave out:
- AdjustableLoggingPatch.tar.gz
- ExecutionTime.0
- multiple-noindex.1
- ssl.9
- titleSpace.0

Unsure/needs work:
- fileSpace.1 (new feature, needs docs, but simple/clean/portable & in demand)
- NUL.0 (needs config attribute & docs, adds overhead)
- htnotifyNull.0 (still has problems with in.bad() handling)

For those who are interested, my current, sketchy to-do list for 3.1.7 is...

- back out Gabriele's CVS changes of Aug 13
- fix "not HTML" error message to something like "unknown Content-type"
+ server_wait_time is currently misspelled in cf_byname.html
+ string list description explains quoted string list in cf_types.html
+ htmerge -m is unclear (fixed in maindocs)
+ Marchand's patch to htsearch/Display.cc (fix enddate bug)
+ Marchand's patch to Makefile.config.in (use DEFS)
+ fix parsedcdate() in Retriever.cc to allow '-' after year
- fix parsedcdate() in Retriever.cc to handle server's local timezone
+ "dc.date.modified" handling patch (May 17)
- handle -ve scores and/or locations in WordList::Word()
- fix parsers not to overflow location calc (find e-mail about this)
- handle location_factor attr. in WordList::Word(), check bounds
- checks for -ve scores in Display.cc
- better handling of multimatch_factor, using a new count field in DocMatch
- keep docdb records for noindex docs, just not words, so updates check these
- don't delete ANCHOR just because it's not in excerpt
- better handling of sup & sub tags in HTML.cc, optionally treat as punctuation
- new catdoc link: http://www.ice.ru/~vitus/catdoc/ in contrib/* parsers
- less verbose output from htnotify -v, require 2 or more v's for that
- Martin Vorlaender's VMS patches
+ patch #548448 dealing with unsigned time_t (Apr 25)
- handle nulls in text/* files (convert to space)

where + means fixed in maindocs or a complete patch, and - means needs work. As you can see, my to-do list and the list of patches I want aren't even mutually complete, though most patches I want are mentioned in the to-do list. I've no doubt missed some things on my list that have been discussed as important/urgent before, but never got around to noting them. If anyone wants to help complete the list, or better yet knock off (i.e. implement) some items on the list, more power to you.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-06 14:45:50
|
On Fri, 6 Sep 2002, Brian White wrote:
> I was looking at defaults.cc and I was wondering if
> it might be better managing the info as an XML file
> and then using that as a basis for generating
> defaults.cc and the HTML docs.

Yes, this would actually be quite wonderful. Currently, it's hard to "validate" changes you make to defaults.cc. It's also a minor pain to insert and format HTML, since it has to be properly escaped. (No, it's not a big deal, but XML would obviously be easier.)

> Disadvantages
> * Part of the build process for the executable would require
> perl to exist

No, not really. We have lots of "autogenerated" files in 3.2. You'd only need Perl if you modified defaults.xml and needed to generate the new defaults.cc.

> * After rabbiting on like this I now have to decide
> if I'm willing to put my money where my mouth is.....

Yes, now that's the question. :-) Formatting defaults.cc into defaults.xml isn't hard and I'd be glad to do that with some emacs macros. But I'd be glad to accept this change if you (or someone) will write the defaults_generate.pl script. Ideally the script would have some nice error-checking to tell you if you've left out a field, etc.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-06 14:35:26
|
On Fri, 6 Sep 2002, Brian White wrote:
> Ok - my desire to get it into 3.1.x is based around the fact that
> we have installed 3.1.6 at a large client site, with the AdjustableLogging
> patch installed. In fact, it was written for their installation.
> It makes long term support slightly easier if the product is *fully* off the
> shelf.

No offense, but there are a variety of packaging mechanisms which will also add a patch (.rpm, .deb, etc.) for various local modifications. I also would agree with Gilles that your patch seems like a rather large feature to be adding when we really want to "finish" 3.1.x releases.

> However, that said - getting it into the 3.2 CVS tree means by the
> time it ever becomes a genuine issue, a stable 3.2 release should
> be available for use. I would be happy with that.

OK, then let's talk about getting your patch into the 3.2.0b4 snapshots. If we get it in shortly, we'll probably have a beta release or two to catch any portability problems.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: <no...@so...> - 2002-09-06 11:01:30
|
Patches item #605517, was opened at 2002-09-06 13:01
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=304593&aid=605517&group_id=4593

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Martin Vorlaender (martinv2)
Assigned to: Nobody/Anonymous (nobody)
Summary: fix for SSL patch to 3.1.6

Initial Comment:
I applied the SSL patch from ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.9 to the VMS port, and hit the following showstopper:

On platforms without a /dev/u?random device or an EGD daemon (e.g. VMS ;-), the SSL PRNG is seeded from a file. For this to work, the application must call RAND_load_file() or else a connect fails with a "PRNG not seeded" error message (new behaviour since OpenSSL 0.9.5). When I insert this call into htlib/Connection.cc's Connection::initSSL, SSL connections do work.
|
|
From: Jim C. <gre...@yg...> - 2002-09-06 03:51:31
|
Gilles Detillieux's bits of Thu, 5 Sep 2002 translated to:
> According to Jim Cole:
> > I think there is a bug in htnotify's readPreAndPostamble(). Both
> > htnotify_prefix_file and htnotify_suffix_file have a default
> > value of "", but the code only checks for NULL when examining the
> > values of prefixfile and suffixfile. The code then proceeds to
> > create ifstream objects using the default values. Finally, the
> > streams are checked with 'if (! in.bad())'; however the ifstream
> > constructor sets failbit, rather than badbit, when it is unable
> > to open the specified file. The result is that the code drops
> > into a while loop and starts extracting from an undefined stream
> > object. ...
>
> Yes, this was reported just a few weeks ago, and a patch was provided.
> See ftp://ftp.ccsf.org/htdig-patches/3.1.6/

Sorry. I somehow missed that post. However, even with the patch I believe the code in both 3.1.6 and 3.2.x is incorrect. If the name of a non-existent file was provided, the same problem with infinite looping could occur. As I understand the standard, the check of in.bad() is of no use with regard to whether the file was opened successfully. If on the other hand in.good() is checked, then it would ensure that neither badbit nor failbit is set.

Jim
|
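Jim's point about failbit vs. badbit is easy to demonstrate: the ifstream constructor sets failbit (not badbit) when it cannot open the file, so a check of `!in.bad()` lets a failed open slip through. A minimal illustration of the two checks (not the htnotify code itself):

```cpp
#include <fstream>

// When an ifstream fails to open a file, the stream sets failbit, not
// badbit, so a check like `if (!in.bad())` wrongly concludes the stream
// is usable -- the bug Jim describes in readPreAndPostamble().
bool openCheckedBadly(const char *path)
{
    std::ifstream in(path);
    return !in.bad();   // true even for a nonexistent file
}

// Checking in.good() (or in.is_open(), or !in.fail()) catches the
// open failure, since good() requires that no state bit is set.
bool openCheckedProperly(const char *path)
{
    std::ifstream in(path);
    return in.good();   // false when the open failed
}
```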
|
From: Brian W. <bw...@st...> - 2002-09-06 02:50:47
|
I was looking at defaults.cc and I was wondering if
it might be better managing the info as an XML file
and then using that as a basis for generating
defaults.cc and the HTML docs.
The fields are
struct ConfigDefaults
{
char *name; // Name of the attribute
char *value; // Default value
char *type; // Type of the value (string, integer, boolean)
char *programs; // Whitespace-separated list of programs/modules using this attribute
char *block; // Configuration block this can be used in (can be blank)
char *version; // Version that introduced the attribute
char *category; // Attribute category (to split documentation)
char *example; // Example usage of the attribute (HTML)
char *description; // Long description of the attribute (HTML)
};
I can see programming uses for name, value, type and maybe programs.
I assume all the rest is just for documentation.
It would be simple enough to write a perl script that
extracted the necessary fields to create a defaults.cc
that only had what was actually needed for the program,
and then something a bit cleverer written to create the
HTML pages.
( I just noticed the perl script that uses
defaults.cc to generate the doc pages )
Advantages
* It would put all the default info into a
much easier to edit and documentable format
* It would make it much clearer which values were
required in the code and which were there
for documentation.
* It would reduce the size of the executable by
about 80000 characters ( 80K or maybe 160 K)
Disadvantages
* Part of the build process for the executable would require
perl to exist
* The current system, be it a bit clunky to my eyes,
does work, and does solve the problem of trying
to maintain concurrently the code version and
the documentation version of the attributes.
* After rabbiting on like this I now have to decide
if I'm willing to put my money where my mouth is.....
Regs
Brian
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
|
From: Brian W. <bw...@st...> - 2002-09-06 01:33:41
|
At 09:10 6/09/2002, Gilles Detillieux wrote:
> > 2) If the issue is the portability of flock, would it be
> > acceptable if I changed it over to using fcntl?
> >
> > (Mr Google threw up the following page which says that "fcntl() is the
> > only POSIX-compliant locking mechanism, and is therefore the only
> > truly portable lock"
> >
> > http://www.erlenstar.demon.co.uk/unix/faq_3.html
> > )
>
> Well, while going with POSIX-compliant locking would help with
> portability, I'm not sure all systems currently supported by 3.1.x
> are fully POSIX-compliant either, so it may be that some only support
> flock(), or even perhaps no locking at all. Some configure tests for
> various locking schemes should be implemented, so the code uses what
> the system provides, or no locking at all if nothing appropriate is found.

I already have "locked" and "unlocked" versions of the code, managed by an #ifdef - I would just have to add a -D__NO_FILE_LOCKING__ or something like that.

I assume this means I would need to create a patch for the configure script - any tips on how to do that? Is that monster *really* maintained solely by hand or are there some tools for it?

> > 3) It should be simple enough to create a patch that works with 3.2.x,
> > judging by a quick look at the latest Display.cc in the CVS repository.
> >
> > I *would* like to get it rolled into 3.1.x if I can. I am
> > more than willing to make any changes required to make this
> > happen.
>
> I think it would be good to see this in the 3.2 CVS tree, with the
> appropriate configure tests. I'm still a bit lukewarm on the addition
> of the "init" input parameter to htsearch. It seems the absence of a
> "page" parameter would mean the same thing, wouldn't it?

You know, I hadn't even thought of that. The only disadvantage to it is that it isn't explicit - I can see someone setting "Page=1" for their initial search and wondering why their logging doesn't work. The only way around this would be documentation, with notes

1) Where the "page" parameter is discussed
2) Where the logging attributes are discussed
3) In the FAQs

Otherwise - Yes! Perfect!

> As for 3.1.x, though, here are my thoughts. I'm quite adamant about
> not wanting to put out a 3.1.8 release. So, that means I have to be
> very adamant about getting 3.1.7 right, with no new bugs or portability
> problems. To do that, I think I'm going to need to put my foot down as
> far as the feature freeze, and insist that only bug fixes go into 3.1.7,
> and no new features. The only discussed new feature for 3.1.7 that I
> haven't completely ruled out yet is location_factor, because it's tied
> to some bug fixes in WordList::Word() anyway, and had been planned for
> 3.1.6 but fell through the cracks. I may drop this attribute anyway,
> and stick to just bug fixes.

Ok - my desire to get it into 3.1.x is based around the fact that we have installed 3.1.6 at a large client site, with the AdjustableLogging patch installed. In fact, it was written for their installation. It makes long term support slightly easier if the product is *fully* off the shelf.

However, that said - getting it into the 3.2 CVS tree means by the time it ever becomes a genuine issue, a stable 3.2 release should be available for use. I would be happy with that.

-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
|
From: Joe R. J. <jj...@cl...> - 2002-09-06 00:53:30
|
On Thu, 5 Sep 2002, Gilles Detillieux wrote:
> Date: Thu, 5 Sep 2002 18:10:53 -0500 (CDT)
> From: Gilles Detillieux <gr...@sc...>
> To: Brian White <bw...@st...>
> Cc: htd...@li...
> Subject: Re: [htdig-dev] Adjustable logging patch.
>
> As for 3.1.x, though, here are my thoughts. I'm quite adament about
> not wanting to put out a 3.1.8 release. So, that means I have to be
> very adament about getting 3.1.7 right, with no new bugs or portability
> problems. To do that, I think I'm going to need to put my foot down as
> far as the feature freeze, and insist that only bug fixes go into 3.1.7,
> and no new features. The only discussed new feature for 3.1.7 that I
> haven't completely ruled out yet is location_factor, because it's tied
> to some bug fixes in WordList::Word() anyway, and had been planned for
> 3.1.6 but fell through the cracks. I may drop this attribute anyway,
> and stick to just bug fixes.
Would you please list the patches you have already committed to CVS, and
those you may, so that we can carry over the rest as patches to 3.1.7
folder. Here is the list as of: Thu Sep 5 17:38:47 PDT 2002:
Patch # of downloads
----- --------------
ssl.9 193
timet_enddate.1 182
Makefile.0 78
documentation.1 68
metadate.0 65
redirect.0 51
documentation.2 50
NUL.0 47
AdjustableLoggingPatch.tar.gz 44
fileSpace.1 42
titleSpace.0 36
multiple-noindex.1 34
Date-viewing.0 32
time_t.0 27
gcc-3.1.0 22
ExecutionTime.0 9
ExternalParser-max_doc_size.0 7
htnotifyNull.0 6
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Gilles D. <gr...@sc...> - 2002-09-05 23:11:19
|
According to Brian White:
> >According to J. op den Brouw:
> > > It's a nice patch for those who cannot use syslog facilities, but
> > > the patch removes the syslog logging feature. It would be nice
> > > to select one of them (or have them both) on a compile or run time
> > > basis.
> > >
> > > It's also a patch against 3.1.6. It would be nice if there's a
> > > patch for 3.2.0b4-xxxx too.
> > >
> > > Furthermore, I see a flock() call somewhere. AFAIK, different
> > > OS-es use different names and parameter lists. Example
> > >
> > > HP-UX: int lockf(int fildes, int function, off_t size);
> > > Linux 2.2: int flock(int fd, int operation);
> >
> >I hadn't noticed when I looked at the patch that it completely removed
> >the ability to log to syslog(). That's one more reason to reject
> >it for 3.1.x. I rejected it over concerns about portability, as you
> >pointed out. I don't think it's appropriate for inclusion in 3.1.7
> >either for that reason.
>
> Ok.
>
> 1) The patch does not remove the ability to do syslog. In my notes
> that go with the patch it says:
>
> > * logging_file ( Default: none )
> >
> > If this is set to "none", then it will log using syslog, otherwise
> > this will be assumed to be the path to the log file
>
> The whole way it is set up, it uses the existing default
> behaviour if it isn't explicitly activated.

Good. I didn't recall seeing any red flags go up in regards to this last time I looked at your patch, but that was a while ago. I didn't review your patch when Jesse made this statement, so I took his word for it.

> 2) If the issue is the portability of flock, would it be
> acceptable if I changed it over to using fcntl?
>
> (Mr Google threw up the following page which says that "fcntl() is the
> only POSIX-compliant locking mechanism, and is therefore the only
> truly portable lock"
>
> http://www.erlenstar.demon.co.uk/unix/faq_3.html
> )

Well, while going with POSIX-compliant locking would help with portability, I'm not sure all systems currently supported by 3.1.x are fully POSIX-compliant either, so it may be that some only support flock(), or even perhaps no locking at all. Some configure tests for various locking schemes should be implemented, so the code uses what the system provides, or no locking at all if nothing appropriate is found.

> 3) It should be simple enough to create a patch that works with 3.2.x,
> judging by a quick look at the latest Display.cc in the CVS repository.
>
> I *would* like to get it rolled into 3.1.x if I can. I am
> more than willing to make any changes required to make this
> happen.

I think it would be good to see this in the 3.2 CVS tree, with the appropriate configure tests. I'm still a bit lukewarm on the addition of the "init" input parameter to htsearch. It seems the absence of a "page" parameter would mean the same thing, wouldn't it?

As for 3.1.x, though, here are my thoughts. I'm quite adamant about not wanting to put out a 3.1.8 release. So, that means I have to be very adamant about getting 3.1.7 right, with no new bugs or portability problems. To do that, I think I'm going to need to put my foot down as far as the feature freeze, and insist that only bug fixes go into 3.1.7, and no new features. The only discussed new feature for 3.1.7 that I haven't completely ruled out yet is location_factor, because it's tied to some bug fixes in WordList::Word() anyway, and had been planned for 3.1.6 but fell through the cracks. I may drop this attribute anyway, and stick to just bug fixes.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gilles D. <gr...@sc...> - 2002-09-05 21:30:02
|
According to Geoff Hutchison:
> I had a brief brainstorm on my run today as far as profiling the
> indexing. Obviously htword/mifluz performance still needs to improve
> significantly. But another slowdown relative to 3.1 is from the way 3.2
> treats hopcounts. To ensure that restricting indexes by hopcount works
> correctly, the "queue" for URLs is really a priority queue. URLs with
> lower hopcounts move up the heap. Of course this requires some sorting
> and some overhead.
>
> Right now, I don't think this needs to happen *unless* we're restricting
> indexing based on hopcount. So the proposal is that when we're not
> restricting by hopcount, the Server objects would switch back to the
> previous system (i.e. no sorting).
>
> I think this should shave a few percent off of indexing. Does this seem
> like an OK idea? Can anyone come up with an example where this would be
> a Bad Idea(tm)?

I can't think of a problem offhand. Sounds reasonable to me. Of course, you probably understand this aspect of the code better than any of us.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
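Geoff's proposal can be sketched as a URL queue that only pays the heap cost when a hopcount restriction is actually in effect, and otherwise behaves as a plain FIFO. The class and names below are illustrative, not htdig's actual Server code:

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical URL entry; in htdig the hopcount is carried on the URLRef.
struct Url {
    std::string url;
    int hopcount;
};

// Orders the priority queue so that lower hopcounts come out first.
struct DeeperHop {
    bool operator()(const Url &a, const Url &b) const {
        return a.hopcount > b.hopcount;
    }
};

// When restrictByHopcount is false, pushes and pops go through a plain
// FIFO and skip the O(log n) heap maintenance entirely.
class UrlQueue {
public:
    explicit UrlQueue(bool restrictByHopcount)
        : restrict_(restrictByHopcount) {}

    void push(const Url &u) {
        if (restrict_) heap_.push(u); else fifo_.push(u);
    }

    Url pop() {
        Url u;
        if (restrict_) { u = heap_.top(); heap_.pop(); }
        else           { u = fifo_.front(); fifo_.pop(); }
        return u;
    }

    bool empty() const { return restrict_ ? heap_.empty() : fifo_.empty(); }

private:
    bool restrict_;
    std::priority_queue<Url, std::vector<Url>, DeeperHop> heap_;
    std::queue<Url> fifo_;
};
```

The switch is decided once, at construction, so the hot path stays branch-cheap; that matches the idea of Server objects "switching back to the previous system" when no hopcount limit is configured.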
|
From: Gilles D. <gr...@sc...> - 2002-09-05 20:54:46
|
According to Gabriele Bartolini:
> Ciao Romain,
>
> as far as I know, htdig doesn't support it yet, but you could
> easily hack the code to make it work. I have something to complain about
> this way of negotiating a request by the CMS, because HTTP says that when
> no Accept is given, every media type is accepted by the client, but ...
> it's ok.
>
> However, I think this is a good point to analyse for the 3.2 code. We
> should somehow let the Web server know what kind of media types htdig is
> able to understand, by listing all of them (default ones plus those
> managed through external parsers' help).
>
> What d'u think guys?

Well, I certainly don't have a problem with htdig 3.2 having support for the Accept header in its requests. In fact, it does sound like a good idea. However, Romain's web site is broken! htdig 3.1.5 is an HTTP/1.0 client, and in RFC 1945, which defines the HTTP 1.0 protocol, the Accept request header is only mentioned in an appendix, where it states that this "... header field can be used to indicate a list..." (Note: can be used, not MUST be used!) I.e. this is not to be treated as a required header, and many HTTP/1.0 clients will not put out this header. Any server that requires this of an HTTP/1.0 client is broken.

Even RFC 2068, which defines HTTP/1.1, says "... can be used ...", and also "If no Accept header field is present, then it is assumed that the client accepts all media types." If a web site cannot render content properly without the Accept header, it is not compliant with this standard. Fixing htdig to work around this bug may allow htdig to index the site, but it won't prevent problems with other standards-compliant web clients navigating this site, if they happen not to put out this header either. Workarounds for bugs like this should be a last resort, when it's impossible to fix the real problem, and not a first resort to avoid even attempting to get at the problem.

> Il mer, 2002-08-28 alle 15:14, rl...@bn... ha scritto:
> > I want to index my web site using htdig.
> >
> > However, my web site, using a CMS, needs the "Accept" HTTP Header, in
> > order to render the dynamic content properly.
> >
> > htdig does not send this Header.
> >
> > How can I define custom HTTP Headers for the robot:
> > using htdig.conf ?
> > modifying the source code ?
> >
> > PS:
> > I am using a compiled htdig v3.1.5 on an AIX v4.3 box
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
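For what it's worth, Gabriele's suggestion of advertising the media types htdig understands would amount to composing an Accept header from the built-in types plus those covered by configured external parsers. A hypothetical sketch; the built-in type list and function name are made up for illustration, not htdig's actual parser registry:

```cpp
#include <string>
#include <vector>

// Build an Accept: request header line from htdig's native types plus
// the media types handled by external parsers. Illustrative only --
// htdig 3.1.x does not actually send an Accept header.
std::string buildAcceptHeader(const std::vector<std::string> &externalTypes)
{
    std::string accept = "Accept: text/html, text/plain";
    for (const std::string &t : externalTypes)
        accept += ", " + t;
    return accept + "\r\n";
}
```

A request built this way would still be optional decoration as far as the RFCs are concerned, which is Gilles' point: a compliant server must treat a missing Accept header as "accepts all media types".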
|
From: Gilles D. <gr...@sc...> - 2002-09-05 16:43:57
|
According to Jim Cole:
> I think there is a bug in htnotify's readPreAndPostamble(). Both
> htnotify_prefix_file and htnotify_suffix_file have a default
> value of "", but the code only checks for NULL when examining the
> values of prefixfile and suffixfile. The code then proceeds to
> create ifstream objects using the default values. Finally, the
> streams are checked with 'if (! in.bad())'; however the ifstream
> constructor sets failbit, rather than badbit, when it is unable
> to open the specified file. The result is that the code drops
> into a while loop and starts extracting from an undefined stream
> object.
>
> The problem doesn't occur in the 3.2 branch because in addition
> to checking for NULL prefixfile/suffixfile, the code also checks
> the values of *prefixfile and *suffixfile.

Yes, this was reported just a few weeks ago, and a patch was provided. See ftp://ftp.ccsf.org/htdig-patches/3.1.6/
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gabriele B. <g.b...@co...> - 2002-09-05 14:57:43
|
Hi guys,

here is a patch for Cookies support in ht://Dig 3.1.6. I already sent it to the patch e-mail, but I thought it would have been useful to warn you too.

Ciao
-Gabriele
--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
|
|
From: Tony J. <tja...@mg...> - 2002-09-04 16:53:18
Hi Geoff,

I found it. I must put in the form:
<input type='hidden' name="restrict" value="http://NDD/">

But now when I search with a number (e.g. 2001), the result is "not found".
Have you got any ideas?

Best regards

At 12:28 PM 9/4/02 -0400, Geoff Hutchison wrote:
>Bonjour Tony,
>
>If I understand correctly, you want to index the whole site and then also
>have search forms which restrict the search somewhat. Depending on how
>complicated the subset is, you can do one of two things:
>
>1) Have two separate databases (and configuration files)
>See <http://www.htdig.org/FAQ.html#q4.4> for more.
>2) Use the restrict and exclude fields of the search form to filter search
>results by URL. This is best when you have something like:
>
>http://www.foo.com/
>http://www.foo.com/mail-archives/
>
>(and you know that you want a search all within the mail-archives
>directory).
>
>Is this what you're interested in?
>
>--
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/
>
>
>On Mon, 2 Sep 2002, Tony Jarriault wrote:
>
> > Hi,
> > I am a French developer, and I want to index my site much like
> > Microsoft's Index Server does.
> >
> > I would like to be able to index a site in its entirety, and then also
> > a directory on the same site, in order to have 2 search engines on the
> > same site, one being more precise on a given subject.
> >
> > Is this possible? If so, how do I do it?
> >
> > Thank you in advance
> >
> > Tony
> >
> > -----------------------------------------------------------------------
> > Service webmaster : mailto:web...@mg...
> > Tel : 01-34-49-06-69
> > MGN : http://www.mgn.fr
> > -----------------------------------------------------------------------
> >
> > Tony Jarriault
> > mailto:tj...@mg...
> > Tel : 01-34-49-06-43
> > MATRA GLOBAL NETSERVICES
> > Société du groupe PROSODIE
> > 8, rue Grange Dame Rose
> > 78140 Vélizy
> > -----------------------------------------------------------------------

-----------------------------------------------------------------------
Service webmaster : mailto:web...@mg...
Tel : 01-34-49-06-69
MGN : http://www.mgn.fr
-----------------------------------------------------------------------

Tony Jarriault
mailto:tj...@mg...
Tel : 01-34-49-06-43
MATRA GLOBAL NETSERVICES
Société du groupe PROSODIE
8, rue Grange Dame Rose
78140 Vélizy
From: Geoff H. <ghu...@ws...> - 2002-09-04 16:28:29
Bonjour Tony,

If I understand correctly, you want to index the whole site and then also
have search forms which restrict the search somewhat. Depending on how
complicated the subset is, you can do one of two things:

1) Have two separate databases (and configuration files).
   See <http://www.htdig.org/FAQ.html#q4.4> for more.
2) Use the restrict and exclude fields of the search form to filter search
   results by URL. This is best when you have something like:

   http://www.foo.com/
   http://www.foo.com/mail-archives/

   (and you know that you want a search all within the mail-archives
   directory).

Is this what you're interested in?

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


On Mon, 2 Sep 2002, Tony Jarriault wrote:

> Hi,
> I am a French developer, and I want to index my site much like
> Microsoft's Index Server does.
>
> I would like to be able to index a site in its entirety, and then also a
> directory on the same site, in order to have 2 search engines on the same
> site, one being more precise on a given subject.
>
> Is this possible? If so, how do I do it?
>
> Thank you in advance
>
> Tony
>
> -----------------------------------------------------------------------
> Service webmaster : mailto:web...@mg...
> Tel : 01-34-49-06-69
> MGN : http://www.mgn.fr
> -----------------------------------------------------------------------
>
> Tony Jarriault
> mailto:tj...@mg...
> Tel : 01-34-49-06-43
> MATRA GLOBAL NETSERVICES
> Société du groupe PROSODIE
> 8, rue Grange Dame Rose
> 78140 Vélizy
From: Geoff H. <ghu...@ws...> - 2002-09-04 16:24:38
On Wed, 4 Sep 2002, Walantis Giosis wrote:
> The ID bytes for length information (excerpt length, document size, URL
> length) vary. Say we have a document size of less than 100h bytes. Then
> the ID byte has the value 44h for that information; the size needs only
> one byte. If the size exceeds 100h bytes (it needs two or more bytes),
> then the ID byte has the value 84h. What's the logic behind this? Only
> to determine the byte count for the size? At the moment I've handled it
> using a switch/case statement.

Hans-Peter Nilsson rewrote the Serialize/Deserialize routines very
carefully, so I can't speak authoritatively. I think he was trying to save
as much space as possible. AFAICT, there's a marker indicating that the
next variable coming up is sizeof() whatever. Take a look at
htcommon/DocumentRef.cc::Serialize() to see the code.

> And why is the document size information stored twice in the database?

They should be different. See htcommon/DocumentRef.[cc,h], which deals
with the document DB records. In particular, there's the text size of the
document and, optionally, it can figure out the size of the document
including all images.

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
From: mosher <mo...@xr...> - 2002-09-04 13:44:07
Hello developers,

I've analyzed the binary document-database format so that I'm now able to
extract the information without using the textual database. But there's
one thing I couldn't figure out:

The ID bytes for length information (excerpt length, document size, URL
length) vary. Say we have a document size of less than 100h bytes. Then
the ID byte has the value 44h for that information; the size needs only
one byte. If the size exceeds 100h bytes (it needs two or more bytes),
then the ID byte has the value 84h. What's the logic behind this? Only to
determine the byte count for the size? At the moment I've handled it using
a switch/case statement.

And why is the document size information stored twice in the database?

Thanks in advance,
Walantis

-- 
l8r, Walantis
http://www.xraw.de
From: J. op d. B. <ht...@op...> - 2002-09-04 11:18:49
Hi all,

I've been away for some time, cruising with the family. My daughter is
5.5 months old now.

Also, my account ms...@st... has been disabled since Aug 12th, 2002. It
will not receive any mail, nor will it forward to my new account. My new
e-mail address for htdig matters is ht...@op...

Greetz from Holland
--Jesse
From: Adam B. <ad...@fr...> - 2002-09-03 23:14:43
Hi,

The site I am indexing uses cookies for authorisation. I am guessing I
will need to write a wrapper around htdig to log in to the site with the
appropriate user name and password, and then store the cookie data in
htdig somewhere so that it is authorised to browse the site.

How do I do this? Or could someone please point me to the appropriate
documentation?

thanks,
Adam
From: Tony J. <tja...@mg...> - 2002-09-02 16:32:29
Hi,

I am a French developer, and I want to index my site much like Microsoft's
Index Server does.

I would like to be able to index a site in its entirety, and then also a
directory on the same site, in order to have 2 search engines on the same
site, one being more precise on a given subject.

Is this possible? If so, how do I do it?

Thank you in advance

Tony

-----------------------------------------------------------------------
Service webmaster : mailto:web...@mg...
Tel : 01-34-49-06-69
MGN : http://www.mgn.fr
-----------------------------------------------------------------------

Tony Jarriault
mailto:tj...@mg...
Tel : 01-34-49-06-43
MATRA GLOBAL NETSERVICES
Société du groupe PROSODIE
8, rue Grange Dame Rose
78140 Vélizy
From: Geoff H. <ghu...@us...> - 2002-09-01 07:13:54
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.