From: Gilles D. <gr...@sc...> - 2002-09-06 15:24:33
|
According to Stephan Hartmann:
> Hi developers,
>
> when i give a start_url with port 8080 (tomcat) and the webapp's servlet
> sends a redirect, htdig does not get any further. The reason seems to be that
> htdig does not include the port in the Host header of the first HTTP-Request.
> Example:
>
> the start_url is http://localhost:8080/mywebapp/myservlet/
>
> htdig sends this request:
>
> GET /robots.txt HTTP/1.0
> User-Agent: htdig/3.1.5 (myemail)
> Host: localhost
>
> I think Host should be localhost:8080 instead. At least Mozilla does this.
>
> Now if the servlet sends a redirect, it sends it without the port, which
> leads to a wrong redirect.
>
> Can anybody confirm this behavior?
Yes. It's fixed in 3.1.6:
Fri Sep 14 09:18:38 2001 Gilles Detillieux <gr...@sc...>
* htdig/Document.cc (RetrieveHTTP): Add port to Host: header when
port is not default, as per RFC2616(14.23). Fixes bug #459969.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
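The rule from RFC 2616 (14.23) that the ChangeLog entry describes — append the port to the Host: value only when it is not the scheme's default — can be sketched in a few lines. The function name and signature below are illustrative, not htdig's actual Document.cc code.

```cpp
#include <string>

// Build the Host: header value for an HTTP request, appending the port
// only when it differs from the default for the scheme (RFC 2616, 14.23).
// Hypothetical helper -- htdig's real logic lives in Document.cc.
std::string hostHeader(const std::string &host, int port, bool https = false)
{
    const int defaultPort = https ? 443 : 80;
    if (port == defaultPort)
        return host;                               // "Host: localhost"
    return host + ":" + std::to_string(port);      // "Host: localhost:8080"
}
```

With this, the request for http://localhost:8080/robots.txt would carry `Host: localhost:8080`, so a servlet that echoes the Host value into a redirect keeps the port.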
|
From: Gilles D. <gr...@sc...> - 2002-09-06 14:59:05
|
According to Brian White:
> At 09:10 6/09/2002, Gilles Detillieux wrote:
> >Well, while going with POSIX-compliant locking would help with
> >portability, I'm not sure all systems currently supported by 3.1.x
> >are fully POSIX-compliant either, so it may be that some only support
> >flock(), or even perhaps no locking at all. Some configure tests for
> >various locking schemes should be implemented, so the code uses what
> >the system provides, or no locking at all if nothing appropriate is found.
>
> I already have "locked" and "unlocked" versions of the code, managed by
> an #ifdef - I would just have to add a -D__NO_FILE_LOCKING__ or something
> like that.

Yes. Ideally, though, it would be automated via configure tests. For example, you test for the flock() call and define HAVE_FLOCK in htconfig.h. Then the code uses #ifdef HAVE_FLOCK. Similarly, you define something like HAVE_FCNTL_LOCK if that capability exists. That test is a bit more complex, as it's not just testing for the existence of a library function.

> I assume this means I would need to create a patch for the configure
> script - any tips on how to do that? Is that monster *really* maintained
> solely by hand or are there some tools for it?

It's generated from configure.in by the autoconf program. So, configure.in is the monster we maintain by hand, which isn't quite as big and scary as the configure script itself. Still, you need to learn enough about autoconf to get by, which is more than I know at this point.

> > > 3) It should be simple enough to create a patch that works with 3.2.x,
> > > judging by a quick look at the latest Display.cc in the CVS repository.
> > >
> > > I *would* like to get it rolled into 3.1.x if I can. I am
> > > more than willing to make any changes required to make this
> > > happen.
> >
> > I think it would be good to see this in the 3.2 CVS tree, with the
> > appropriate configure tests. I'm still a bit lukewarm on the addition
> > of the "init" input parameter to htsearch. It seems the absence of a
> > "page" parameter would mean the same thing, wouldn't it?
>
> You know, I hadn't even thought of that. The only disadvantage to it
> is that it isn't explicit - I can see someone setting "Page=1" for their
> initial search and wondering why their logging doesn't work. The only
> way around this would be documentation, with notes
>
> 1) Where the "page" parameter is discussed
> 2) Where the logging attributes are discussed
> 3) In the FAQs
>
> Otherwise - Yes! Perfect!

Not to mention writing attrs.html entries (and links in cf_by????.html) for all the new attributes. This is of course easier in 3.2.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
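The configure-test approach Gilles outlines can be sketched as follows. This is a hypothetical illustration, not htdig code: HAVE_FCNTL_LOCK and HAVE_FLOCK stand in for macros that configure tests would define in htconfig.h, and here HAVE_FCNTL_LOCK is simply hard-defined so the sketch compiles on its own.

```cpp
#include <fcntl.h>
#include <unistd.h>

// For this sketch, pretend configure detected fcntl() locking; in the
// real build this would come from a configure test via htconfig.h.
#define HAVE_FCNTL_LOCK 1

// Lock a file with whatever mechanism the platform provides, falling
// back to no locking at all when nothing appropriate was found.
static int lockFile(int fd)
{
#if defined(HAVE_FCNTL_LOCK)
    struct flock fl = {};
    fl.l_type = F_WRLCK;              // exclusive write lock
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                     // 0 means "to end of file"
    return fcntl(fd, F_SETLKW, &fl);  // blocks until the lock is granted
#elif defined(HAVE_FLOCK)
    return flock(fd, LOCK_EX);        // BSD-style whole-file lock
#else
    (void)fd;                         // no locking support: do nothing
    return 0;
#endif
}
```

The fcntl() branch is the POSIX-portable one Brian's FAQ link recommends; the #else branch is the "no locking at all" fallback Gilles mentions for systems where neither test succeeds.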
|
From: Stephan H. <be...@be...> - 2002-09-06 14:49:00
|
Hi developers,

when I give a start_url with port 8080 (Tomcat) and the webapp's servlet sends a redirect, htdig does not get any further. The reason seems to be that htdig does not include the port in the Host header of the first HTTP request. Example:

the start_url is http://localhost:8080/mywebapp/myservlet/

htdig sends this request:

GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.5 (myemail)
Host: localhost

I think Host should be localhost:8080 instead. At least Mozilla does this. Now if the servlet sends a redirect, it sends it without the port, which leads to a wrong redirect. Can anybody confirm this behavior?

Bye,
Stephan
|
|
From: Gilles D. <gr...@sc...> - 2002-09-06 14:48:44
|
According to Joe R. Jah:
> Would you please list the patches you have already committed to CVS, and
> those you may, so that we can carry over the rest as patches to 3.1.7
> folder.

Actually, I haven't begun committing changes to CVS yet. As for 3.1.6, I'll probably wait until I have a sufficiently large and complete to-do list, and a good chunk of time I can devote to the task (which I don't have now), and then get a flurry of commits happening. All I have right now is a to-do list of 23 bug fixes, some of which exist in patches and some of which still need to be written. I also need to go through the set of existing patches to see what's ready to use as-is, what needs tweaking/configure tests/documentation, and what I'll exclude.

> Here is the list as of: Thu Sep 5 17:38:47 PDT 2002:
>
> Patch                          # of downloads
> -----                          --------------
> ssl.9                          193
> timet_enddate.1                182
> Makefile.0                     78
> documentation.1                68
> metadate.0                     65
> redirect.0                     51
> documentation.2                50
> NUL.0                          47
> AdjustableLoggingPatch.tar.gz  44
> fileSpace.1                    42
> titleSpace.0                   36
> multiple-noindex.1             34
> Date-viewing.0                 32
> time_t.0                       27
> gcc-3.1.0                      22
> ExecutionTime.0                9
> ExternalParser-max_doc_size.0  7
> htnotifyNull.0                 6

Offhand, I'd break them down thus...

Include as-is:
- Date-viewing.0
- ExternalParser-max_doc_size.0
- Makefile.0
- documentation.1
- documentation.2
- gcc-3.1.0
- metadate.0
- redirect.0
- time_t.0
- timet_enddate.1

Leave out:
- AdjustableLoggingPatch.tar.gz
- ExecutionTime.0
- multiple-noindex.1
- ssl.9
- titleSpace.0

Unsure/needs work:
- fileSpace.1 (new feature, needs docs, but simple/clean/portable & in demand)
- NUL.0 (needs config attribute & docs, adds overhead)
- htnotifyNull.0 (still has problems with in.bad() handling)

For those who are interested, my current, sketchy to-do list for 3.1.7 is...

- back out Gabriele's CVS changes of Aug 13
- fix "not HTML" error message to something like "unknown Content-type"
+ server_wait_time is currently misspelled in cf_byname.html
+ string list description explains quoted string list in cf_types.html
+ htmerge -m is unclear (fixed in maindocs)
+ Marchand's patch to htsearch/Display.cc (fix enddate bug)
+ Marchand's patch to Makefile.config.in (use DEFS)
+ fix parsedcdate() in Retriever.cc to allow '-' after year
- fix parsedcdate() in Retriever.cc to handle server's local timezone
+ "dc.date.modified" handling patch (May 17)
- handle -ve scores and/or locations in WordList::Word()
- fix parsers not to overflow location calc (find e-mail about this)
- handle location_factor attr. in WordList::Word(), check bounds
- checks for -ve scores in Display.cc
- better handling of multimatch_factor, using a new count field in DocMatch
- keep docdb records for noindex docs, just not words, so updates check these
- don't delete ANCHOR just because it's not in excerpt
- better handling of sup & sub tags in HTML.cc, optionally treat as punctuation
- new catdoc link: http://www.ice.ru/~vitus/catdoc/ in contrib/* parsers
- less verbose output from htnotify -v, require 2 or more v's for that
- Martin Vorlaender's VMS patches
+ patch #548448 dealing with unsigned time_t (Apr 25)
- handle nulls in text/* files (convert to space)

where + means fixed in maindocs or a complete patch, and - means needs work. As you can see, my to-do list and the list of patches I want aren't even mutually complete, though most patches I want are mentioned in the to-do list. I've no doubt missed some things on my list that have been discussed as important/urgent before, but never got around to noting them. If anyone wants to help complete the list, or better yet knock off (i.e. implement) some items on the list, more power to you.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-06 14:45:50
|
On Fri, 6 Sep 2002, Brian White wrote:
> I was looking at defaults.cc and I was wondering if
> it might be better managing the info as an XML file
> and then using that as a basis for generating
> defaults.cc and the HTML docs.

Yes, this would actually be quite wonderful. Currently, it's hard to "validate" changes you make to defaults.cc. It's also a minor pain to insert and format HTML, since it has to be properly escaped. (No, it's not a big deal, but XML would obviously be easier.)

> Disadvantages
> * Part of the build process for the executable would require
> perl to exist

No, not really. We have lots of "autogenerated" files in 3.2. You'd only need Perl if you modified defaults.xml and needed to generate the new defaults.cc.

> * After rabbiting on like this I now have to decide
> if I'm willing to put my money where my mouth is.....

Yes, now that's the question. :-) Formatting defaults.cc into defaults.xml isn't hard and I'd be glad to do that with some emacs macros. But I'd be glad to accept this change if you (or someone) will write the defaults_generate.pl script. Ideally the script would have some nice error-checking to tell you if you've left out a field, etc.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-06 14:35:26
|
On Fri, 6 Sep 2002, Brian White wrote:
> Ok - my desire to get it into 3.1.x is based around the fact that
> we have installed 3.1.6 at a large client site, with the AdjustableLogging
> patch installed. In fact, it was written for their installation.
> It makes long term support slightly easier if the product is *fully* off the
> shelf.

No offense, but there are a variety of packaging mechanisms which will also add a patch (.rpm, .deb, etc.) for various local modifications. I also would agree with Gilles that your patch seems like a rather large feature to be adding when we really want to "finish" 3.1.x releases.

> However, that said - getting it into the 3.2 CVS tree means by the
> time it ever becomes a genuine issue, a stable 3.2 release should
> be available for use. I would be happy with that.

OK, then let's talk about getting your patch into the 3.2.0b4 snapshots. If we get it in shortly, we'll probably have a beta release or two to catch any portability problems.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: <no...@so...> - 2002-09-06 11:01:30
|
Patches item #605517, was opened at 2002-09-06 13:01
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=304593&aid=605517&group_id=4593

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Martin Vorlaender (martinv2)
Assigned to: Nobody/Anonymous (nobody)
Summary: fix for SSL patch to 3.1.6

Initial Comment:
I applied the SSL patch from ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.9 to the VMS port, and hit the following showstopper:

On platforms without a /dev/u?random device or an EGD daemon (e.g. VMS ;-), the SSL PRNG is seeded from a file. For this to work, the application must call RAND_load_file() or else a connect fails with a "PRNG not seeded" error message (new behaviour since OpenSSL 0.9.5). When I insert this call into htlib/Connection.cc's Connection::initSSL, SSL connections do work.
|
|
From: Jim C. <gre...@yg...> - 2002-09-06 03:51:31
|
Gilles Detillieux's bits of Thu, 5 Sep 2002 translated to:
> According to Jim Cole:
> > I think there is a bug in htnotify's readPreAndPostamble(). Both
> > htnotify_prefix_file and htnotify_suffix_file have a default
> > value of "", but the code only checks for NULL when examining the
> > values of prefixfile and suffixfile. The code then proceeds to
> > create ifstream objects using the default values. Finally, the
> > streams are checked with 'if (! in.bad())'; however the ifstream
> > constructor sets failbit, rather than badbit, when it is unable
> > to open the specified file. The result is that the code drops
> > into a while loop and starts extracting from an undefined stream
> > object. ...
>
> Yes, this was reported just a few weeks ago, and a patch was provided.
> See ftp://ftp.ccsf.org/htdig-patches/3.1.6/

Sorry. I somehow missed that post. However, even with the patch I believe the code in both 3.1.6 and 3.2.x is incorrect. If the name of a non-existent file was provided, the same problem with infinite looping could occur. As I understand the standard, the check of in.bad() is of no use with regard to whether the file was opened successfully. If on the other hand in.good() is checked, then it would ensure that neither badbit nor failbit is set.

Jim
|
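Jim's point about failbit vs. badbit is easy to demonstrate: the ifstream constructor sets failbit (not badbit) when it cannot open the file, so a check of `!in.bad()` lets a failed open slip through. A minimal illustration of the two checks (not the htnotify code itself):

```cpp
#include <fstream>

// When an ifstream fails to open a file, the stream sets failbit, not
// badbit, so a check like `if (!in.bad())` wrongly concludes the stream
// is usable -- the bug Jim describes in readPreAndPostamble().
bool openCheckedBadly(const char *path)
{
    std::ifstream in(path);
    return !in.bad();   // true even for a nonexistent file
}

// Checking in.good() (or in.is_open(), or !in.fail()) catches the
// open failure, since good() requires that no state bit is set.
bool openCheckedProperly(const char *path)
{
    std::ifstream in(path);
    return in.good();   // false when the open failed
}
```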
|
From: Brian W. <bw...@st...> - 2002-09-06 02:50:47
|
I was looking at defaults.cc and I was wondering if
it might be better managing the info as an XML file
and then using that as a basis for generating
defaults.cc and the HTML docs.
The fields are
struct ConfigDefaults
{
char *name; // Name of the attribute
char *value; // Default value
char *type; // Type of the value (string, integer, boolean)
char *programs; // Whitespace-separated list of programs/modules using this attribute
char *block; // Configuration block this can be used in (can be blank)
char *version; // Version that introduced the attribute
char *category; // Attribute category (to split documentation)
char *example; // Example usage of the attribute (HTML)
char *description; // Long description of the attribute (HTML)
};
I can see programming uses for name, value, type and maybe programs.
I assume all the rest is just for documentation.
It would be simple enough to write a perl script that
extracted the necessary fields to create a defaults.cc
that only had what was actually needed for the program,
and then something a bit cleverer written to create the
HTML pages.
( I just noticed the perl script that uses
defaults.cc to generate the doc pages )
Advantages
* It would put all the default info into a
much easier to edit and documentable format
* It would make it much clearer which values were
required in the code and which were there
for documentation.
* It would reduce the size of the executable by
about 80000 characters ( 80K or maybe 160 K)
Disadvantages
* Part of the build process for the executable would require
perl to exist
* The current system, be it a bit clunky to my eyes,
does work, and does solve the problem of trying
to maintain concurrently the code version and
the documentation version of the attributes.
* After rabbiting on like this I now have to decide
if I'm willing to put my money where my mouth is.....
Regs
Brian
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
|
From: Brian W. <bw...@st...> - 2002-09-06 01:33:41
|
At 09:10 6/09/2002, Gilles Detillieux wrote:
> > 2) If the issue is the portability of flock, would it be
> > acceptable if I changed it over to using fcntl?
> >
> > (Mr Google threw up the following page which says that "fcntl() is the
> > only POSIX-compliant locking mechanism, and is therefore the only
> > truly portable lock"
> >
> > http://www.erlenstar.demon.co.uk/unix/faq_3.html
> > )
>
> Well, while going with POSIX-compliant locking would help with
> portability, I'm not sure all systems currently supported by 3.1.x
> are fully POSIX-compliant either, so it may be that some only support
> flock(), or even perhaps no locking at all. Some configure tests for
> various locking schemes should be implemented, so the code uses what
> the system provides, or no locking at all if nothing appropriate is found.

I already have "locked" and "unlocked" versions of the code, managed by an #ifdef - I would just have to add a -D__NO_FILE_LOCKING__ or something like that.

I assume this means I would need to create a patch for the configure script - any tips on how to do that? Is that monster *really* maintained solely by hand or are there some tools for it?

> > 3) It should be simple enough to create a patch that works with 3.2.x,
> > judging by a quick look at the latest Display.cc in the CVS repository.
> >
> > I *would* like to get it rolled into 3.1.x if I can. I am
> > more than willing to make any changes required to make this
> > happen.
>
> I think it would be good to see this in the 3.2 CVS tree, with the
> appropriate configure tests. I'm still a bit lukewarm on the addition
> of the "init" input parameter to htsearch. It seems the absence of a
> "page" parameter would mean the same thing, wouldn't it?

You know, I hadn't even thought of that. The only disadvantage to it is that it isn't explicit - I can see someone setting "Page=1" for their initial search and wondering why their logging doesn't work. The only way around this would be documentation, with notes

1) Where the "page" parameter is discussed
2) Where the logging attributes are discussed
3) In the FAQs

Otherwise - Yes! Perfect!

> As for 3.1.x, though, here are my thoughts. I'm quite adamant about
> not wanting to put out a 3.1.8 release. So, that means I have to be
> very adamant about getting 3.1.7 right, with no new bugs or portability
> problems. To do that, I think I'm going to need to put my foot down as
> far as the feature freeze, and insist that only bug fixes go into 3.1.7,
> and no new features. The only discussed new feature for 3.1.7 that I
> haven't completely ruled out yet is location_factor, because it's tied
> to some bug fixes in WordList::Word() anyway, and had been planned for
> 3.1.6 but fell through the cracks. I may drop this attribute anyway,
> and stick to just bug fixes.

Ok - my desire to get it into 3.1.x is based around the fact that we have installed 3.1.6 at a large client site, with the AdjustableLogging patch installed. In fact, it was written for their installation. It makes long term support slightly easier if the product is *fully* off the shelf.

However, that said - getting it into the 3.2 CVS tree means by the time it ever becomes a genuine issue, a stable 3.2 release should be available for use. I would be happy with that.

-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
|
From: Joe R. J. <jj...@cl...> - 2002-09-06 00:53:30
|
On Thu, 5 Sep 2002, Gilles Detillieux wrote:
> Date: Thu, 5 Sep 2002 18:10:53 -0500 (CDT)
> From: Gilles Detillieux <gr...@sc...>
> To: Brian White <bw...@st...>
> Cc: htd...@li...
> Subject: Re: [htdig-dev] Adjustable logging patch.
>
> As for 3.1.x, though, here are my thoughts. I'm quite adament about
> not wanting to put out a 3.1.8 release. So, that means I have to be
> very adament about getting 3.1.7 right, with no new bugs or portability
> problems. To do that, I think I'm going to need to put my foot down as
> far as the feature freeze, and insist that only bug fixes go into 3.1.7,
> and no new features. The only discussed new feature for 3.1.7 that I
> haven't completely ruled out yet is location_factor, because it's tied
> to some bug fixes in WordList::Word() anyway, and had been planned for
> 3.1.6 but fell through the cracks. I may drop this attribute anyway,
> and stick to just bug fixes.
Would you please list the patches you have already committed to CVS, and
those you may, so that we can carry over the rest as patches to 3.1.7
folder. Here is the list as of: Thu Sep 5 17:38:47 PDT 2002:
Patch # of downloads
----- --------------
ssl.9 193
timet_enddate.1 182
Makefile.0 78
documentation.1 68
metadate.0 65
redirect.0 51
documentation.2 50
NUL.0 47
AdjustableLoggingPatch.tar.gz 44
fileSpace.1 42
titleSpace.0 36
multiple-noindex.1 34
Date-viewing.0 32
time_t.0 27
gcc-3.1.0 22
ExecutionTime.0 9
ExternalParser-max_doc_size.0 7
htnotifyNull.0 6
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Gilles D. <gr...@sc...> - 2002-09-05 23:11:19
|
According to Brian White:
> >According to J. op den Brouw:
> > > It's a nice patch for those who cannot use syslog facilities, but
> > > the patch removes the syslog logging feature. It would be nice
> > > to select one of them (or have them both) on a compile or run time
> > > basis.
> > >
> > > It's also a patch against 3.1.6. It would be nice if there's a
> > > patch for 3.2.0b4-xxxx too.
> > >
> > > Furthermore, I see a flock() call somewhere. AFAIK, different
> > > OS-es use different names and parameter lists. Example
> > >
> > > HP-UX: int lockf(int fildes, int function, off_t size);
> > > Linux 2.2: int flock(int fd, int operation);
> >
> >I hadn't noticed when I looked at the patch that it completely removed
> >the ability to log to syslog(). That's one more reason to reject
> >it for 3.1.x. I rejected it over concerns about portability, as you
> >pointed out. I don't think it's appropriate for inclusion in 3.1.7
> >either for that reason.
>
> Ok.
>
> 1) The patch does not remove the ability to do syslog. In my notes
> that go with the patch it says:
>
> > * logging_file ( Default: none )
> >
> > If this is set to "none", then it will log using syslog, otherwise
> > this will be assumed to be the path to the log file
>
> The whole way it is set up, it uses the existing default
> behaviour if it isn't explicitly activated.

Good. I didn't recall seeing any red flags go up in regards to this last time I looked at your patch, but that was a while ago. I didn't review your patch when Jesse made this statement, so I took his word for it.

> 2) If the issue is the portability of flock, would it be
> acceptable if I changed it over to using fcntl?
>
> (Mr Google threw up the following page which says that "fcntl() is the
> only POSIX-compliant locking mechanism, and is therefore the only
> truly portable lock"
>
> http://www.erlenstar.demon.co.uk/unix/faq_3.html
> )

Well, while going with POSIX-compliant locking would help with portability, I'm not sure all systems currently supported by 3.1.x are fully POSIX-compliant either, so it may be that some only support flock(), or even perhaps no locking at all. Some configure tests for various locking schemes should be implemented, so the code uses what the system provides, or no locking at all if nothing appropriate is found.

> 3) It should be simple enough to create a patch that works with 3.2.x,
> judging by a quick look at the latest Display.cc in the CVS repository.
>
> I *would* like to get it rolled into 3.1.x if I can. I am
> more than willing to make any changes required to make this
> happen.

I think it would be good to see this in the 3.2 CVS tree, with the appropriate configure tests. I'm still a bit lukewarm on the addition of the "init" input parameter to htsearch. It seems the absence of a "page" parameter would mean the same thing, wouldn't it?

As for 3.1.x, though, here are my thoughts. I'm quite adamant about not wanting to put out a 3.1.8 release. So, that means I have to be very adamant about getting 3.1.7 right, with no new bugs or portability problems. To do that, I think I'm going to need to put my foot down as far as the feature freeze, and insist that only bug fixes go into 3.1.7, and no new features. The only discussed new feature for 3.1.7 that I haven't completely ruled out yet is location_factor, because it's tied to some bug fixes in WordList::Word() anyway, and had been planned for 3.1.6 but fell through the cracks. I may drop this attribute anyway, and stick to just bug fixes.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gilles D. <gr...@sc...> - 2002-09-05 21:30:02
|
According to Geoff Hutchison:
> I had a brief brainstorm on my run today as far as profiling the
> indexing. Obviously htword/mifluz performance still needs to improve
> significantly. But another slowdown relative to 3.1 is from the way 3.2
> treats hopcounts. To ensure that restricting indexes by hopcount works
> correctly, the "queue" for URLs is really a priority queue. URLs with
> lower hopcounts move up the heap. Of course this requires some sorting
> and some overhead.
>
> Right now, I don't think this needs to happen *unless* we're restricting
> indexing based on hopcount. So the proposal is that when we're not
> restricting by hopcount, the Server objects would switch back to the
> previous system (i.e. no sorting).
>
> I think this should shave a few percent off of indexing. Does this seem
> like an OK idea? Can anyone come up with an example where this would be
> a Bad Idea(tm)?

I can't think of a problem offhand. Sounds reasonable to me. Of course, you probably understand this aspect of the code better than any of us.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
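Geoff's proposal can be sketched as a URL queue that only pays the heap cost when a hopcount restriction is actually in effect, and otherwise behaves as a plain FIFO. The class and names below are illustrative, not htdig's actual Server code:

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical URL entry; in htdig the hopcount is carried on the URLRef.
struct Url {
    std::string url;
    int hopcount;
};

// Orders the priority queue so that lower hopcounts come out first.
struct DeeperHop {
    bool operator()(const Url &a, const Url &b) const {
        return a.hopcount > b.hopcount;
    }
};

// When restrictByHopcount is false, pushes and pops go through a plain
// FIFO and skip the O(log n) heap maintenance entirely.
class UrlQueue {
public:
    explicit UrlQueue(bool restrictByHopcount)
        : restrict_(restrictByHopcount) {}

    void push(const Url &u) {
        if (restrict_) heap_.push(u); else fifo_.push(u);
    }

    Url pop() {
        Url u;
        if (restrict_) { u = heap_.top(); heap_.pop(); }
        else           { u = fifo_.front(); fifo_.pop(); }
        return u;
    }

    bool empty() const { return restrict_ ? heap_.empty() : fifo_.empty(); }

private:
    bool restrict_;
    std::priority_queue<Url, std::vector<Url>, DeeperHop> heap_;
    std::queue<Url> fifo_;
};
```

The switch is decided once, at construction, so the hot path stays branch-cheap; that matches the idea of Server objects "switching back to the previous system" when no hopcount limit is configured.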
|
From: Gilles D. <gr...@sc...> - 2002-09-05 20:54:46
|
According to Gabriele Bartolini:
> Ciao Romain,
>
> as far as I know, htdig doesn't support it yet, but you could
> easily hack the code to make it work. I have something to complain about
> this way of negotiating a request by the CMS, because HTTP says that when
> no Accept is given, every media type is accepted by the client, but ...
> it's ok.
>
> However, I think this is a good point to analyse for the 3.2 code. We
> should somehow let the Web server know what kind of media types htdig is
> able to understand, by listing all of them (default ones plus those
> managed through external parsers' help).
>
> What d'u think guys?

Well, I certainly don't have a problem with htdig 3.2 having support for the Accept header in its requests. In fact, it does sound like a good idea. However, Romain's web site is broken! htdig 3.1.5 is an HTTP/1.0 client, and in RFC 1945, which defines the HTTP 1.0 protocol, the Accept request header is only mentioned in an appendix, where it states that this "... header field can be used to indicate a list..." (Note: can be used, not MUST be used!) I.e. this is not to be treated as a required header, and many HTTP/1.0 clients will not put out this header. Any server that requires this of an HTTP/1.0 client is broken.

Even RFC 2068, which defines HTTP/1.1, says "... can be used ...", and also "If no Accept header field is present, then it is assumed that the client accepts all media types." If a web site cannot render content properly without the Accept header, it is not compliant with this standard. Fixing htdig to work around this bug may allow htdig to index the site, but it won't prevent problems with other standards-compliant web clients navigating this site, if they happen not to put out this header either. Workarounds for bugs like this should be a last resort, when it's impossible to fix the real problem, and not a first resort to avoid even attempting to get at the problem.

> Il mer, 2002-08-28 alle 15:14, rl...@bn... ha scritto:
> > I want to index my web site using htdig.
> >
> > However, my web site, using a CMS, needs the "Accept" HTTP Header, in
> > order to render the dynamic content properly.
> >
> > htdig does not send this Header.
> >
> > How can I define custom HTTP Headers for the robot:
> > using htdig.conf ?
> > modifying the source code ?
> >
> > PS:
> > I am using a compiled htdig v3.1.5 on an AIX v4.3 box
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
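For what it's worth, Gabriele's suggestion of advertising the media types htdig understands would amount to composing an Accept header from the built-in types plus those covered by configured external parsers. A hypothetical sketch; the built-in type list and function name are made up for illustration, not htdig's actual parser registry:

```cpp
#include <string>
#include <vector>

// Build an Accept: request header line from htdig's native types plus
// the media types handled by external parsers. Illustrative only --
// htdig 3.1.x does not actually send an Accept header.
std::string buildAcceptHeader(const std::vector<std::string> &externalTypes)
{
    std::string accept = "Accept: text/html, text/plain";
    for (const std::string &t : externalTypes)
        accept += ", " + t;
    return accept + "\r\n";
}
```

A request built this way would still be optional decoration as far as the RFCs are concerned, which is Gilles' point: a compliant server must treat a missing Accept header as "accepts all media types".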
|
From: Gilles D. <gr...@sc...> - 2002-09-05 16:43:57
|
According to Jim Cole:
> I think there is a bug in htnotify's readPreAndPostamble(). Both
> htnotify_prefix_file and htnotify_suffix_file have a default
> value of "", but the code only checks for NULL when examining the
> values of prefixfile and suffixfile. The code then proceeds to
> create ifstream objects using the default values. Finally, the
> streams are checked with 'if (! in.bad())'; however the ifstream
> constructor sets failbit, rather than badbit, when it is unable
> to open the specified file. The result is that the code drops
> into a while loop and starts extracting from an undefined stream
> object.
>
> The problem doesn't occur in the 3.2 branch because in addition
> to checking for NULL prefixfile/suffixfile, the code also checks
> the values of *prefixfile and *suffixfile.

Yes, this was reported just a few weeks ago, and a patch was provided. See ftp://ftp.ccsf.org/htdig-patches/3.1.6/
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gabriele B. <g.b...@co...> - 2002-09-05 14:57:43
|
Hi guys,

here is a patch for Cookies support in ht://Dig 3.1.6. I already sent it to the patch e-mail, but I thought it would have been useful to warn you too.

Ciao
-Gabriele
--
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
|
|
From: Tony J. <tja...@mg...> - 2002-09-04 16:53:18
Hi Geoff,

I found it. I must put in the form:
<input type='hidden' name="restrict" value="http://NDD/">

But now when I search with a number (e.g. 2001), the result is "not found".
Have you got any ideas?

Best regards

At 12:28 PM 9/4/02 -0400, Geoff Hutchison wrote:
>Bonjour Tony,
>
>If I understand correctly, you want to index the whole site and then also
>have search forms which restrict the search somewhat. Depending on how
>complicated the subset is, you can do one of two things:
>
>1) Have two separate databases (and configuration files)
>See <http://www.htdig.org/FAQ.html#q4.4> for more.
>2) Use the restrict and exclude fields of the search form to filter search
>results by URL. This is best when you have something like:
>
>http://www.foo.com/
>http://www.foo.com/mail-archives/
>
>(and you know that you want a search all within the mail-archives
>directory).
>
>Is this what you're interested in?
>
>--
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/
>
>
>On Mon, 2 Sep 2002, Tony Jarriault wrote:
>
> > Hi,
> > I am a French developer, and I want to index my site much like
> > Microsoft's Index Server does.
> >
> > I would like to be able to index a site in its entirety, and then also
> > a directory on the same site, in order to have 2 search engines on the
> > same site, one being more precise on a given subject.
> >
> > Is this possible? If so, how do I do it?
> >
> > Thank you in advance
> >
> > Tony
> >
> > -----------------------------------------------------------------------
> > Service webmaster : mailto:web...@mg...
> > Tel : 01-34-49-06-69
> > MGN : http://www.mgn.fr
> > -----------------------------------------------------------------------
> >
> > Tony Jarriault
> > mailto:tj...@mg...
> > Tel : 01-34-49-06-43
> > MATRA GLOBAL NETSERVICES
> > Société du groupe PROSODIE
> > 8, rue Grange Dame Rose
> > 78140 Vélizy
> > -----------------------------------------------------------------------

-----------------------------------------------------------------------
Service webmaster : mailto:web...@mg...
Tel : 01-34-49-06-69
MGN : http://www.mgn.fr
-----------------------------------------------------------------------

Tony Jarriault
mailto:tj...@mg...
Tel : 01-34-49-06-43
MATRA GLOBAL NETSERVICES
Société du groupe PROSODIE
8, rue Grange Dame Rose
78140 Vélizy
From: Geoff H. <ghu...@ws...> - 2002-09-04 16:28:29
Bonjour Tony,

If I understand correctly, you want to index the whole site and then also
have search forms which restrict the search somewhat. Depending on how
complicated the subset is, you can do one of two things:

1) Have two separate databases (and configuration files).
   See <http://www.htdig.org/FAQ.html#q4.4> for more.
2) Use the restrict and exclude fields of the search form to filter search
   results by URL. This is best when you have something like:

   http://www.foo.com/
   http://www.foo.com/mail-archives/

   (and you know that you want a search all within the mail-archives
   directory).

Is this what you're interested in?

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


On Mon, 2 Sep 2002, Tony Jarriault wrote:

> Hi,
> I am a French developer, and I want to index my site much like
> Microsoft's Index Server does.
>
> I would like to be able to index a site in its entirety, and then also a
> directory on the same site, in order to have 2 search engines on the same
> site, one being more precise on a given subject.
>
> Is this possible? If so, how do I do it?
>
> Thank you in advance
>
> Tony
>
> -----------------------------------------------------------------------
> Service webmaster : mailto:web...@mg...
> Tel : 01-34-49-06-69
> MGN : http://www.mgn.fr
> -----------------------------------------------------------------------
>
> Tony Jarriault
> mailto:tj...@mg...
> Tel : 01-34-49-06-43
> MATRA GLOBAL NETSERVICES
> Société du groupe PROSODIE
> 8, rue Grange Dame Rose
> 78140 Vélizy
From: Geoff H. <ghu...@ws...> - 2002-09-04 16:24:38
On Wed, 4 Sep 2002, Walantis Giosis wrote:
> The ID bytes for length information (excerpt length, document size, URL
> length) vary. Say we have a document size of less than 100h bytes. Then
> the ID byte has the value 44h for that information; the size needs only
> one byte. If the size exceeds 100h bytes (it needs two or more bytes),
> then the ID byte has the value 84h. What's the logic behind this? Only
> to determine the byte count for the size? At the moment I've handled it
> using a switch/case statement.

Hans-Peter Nilsson rewrote the Serialize/Deserialize routines very
carefully, so I can't speak authoritatively. I think he was trying to save
as much space as possible. AFAICT, there's a marker indicating that the
next variable coming up is sizeof() whatever. Take a look at
htcommon/DocumentRef.cc::Serialize() to see the code.

> And why is the document size information stored twice in the database?

They should be different. See htcommon/DocumentRef.[cc,h], which deals
with the document DB records. In particular, there's the text size of the
document and, optionally, it can figure out the size of the document
including all images.

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
From: mosher <mo...@xr...> - 2002-09-04 13:44:07
Hello developers,

I've analyzed the binary document-database format so that I'm now able to
extract the information without using the textual database. But there's
one thing I couldn't figure out:

The ID bytes for length information (excerpt length, document size, URL
length) vary. Say we have a document size of less than 100h bytes. Then
the ID byte has the value 44h for that information; the size needs only
one byte. If the size exceeds 100h bytes (it needs two or more bytes),
then the ID byte has the value 84h. What's the logic behind this? Only to
determine the byte count for the size? At the moment I've handled it using
a switch/case statement.

And why is the document size information stored twice in the database?

Thanks in advance,
Walantis

-- 
l8r, Walantis
http://www.xraw.de
From: J. op d. B. <ht...@op...> - 2002-09-04 11:18:49
Hi all,

I've been away for some time, cruising with the family. My daughter is
5.5 months old now.

Also, my account ms...@st... has been disabled since Aug 12th, 2002. It
will not receive any mail, nor will it forward to my new account. My new
e-mail address for htdig matters is ht...@op...

Greetz from Holland
--Jesse
From: Adam B. <ad...@fr...> - 2002-09-03 23:14:43
Hi,

The site I am indexing uses cookies for authorisation. I am guessing I
will need to write a wrapper around htdig to log in to the site with the
appropriate user name and password, and then store the cookie data in
htdig somewhere so that it is authorised to browse the site.

How do I do this? Or could someone please point me to the appropriate
documentation?

thanks,
Adam
From: Tony J. <tja...@mg...> - 2002-09-02 16:32:29
Hi,

I am a French developer, and I want to index my site much like Microsoft's
Index Server does.

I would like to be able to index a site in its entirety, and then also a
directory on the same site, in order to have 2 search engines on the same
site, one being more precise on a given subject.

Is this possible? If so, how do I do it?

Thank you in advance

Tony

-----------------------------------------------------------------------
Service webmaster : mailto:web...@mg...
Tel : 01-34-49-06-69
MGN : http://www.mgn.fr
-----------------------------------------------------------------------

Tony Jarriault
mailto:tj...@mg...
Tel : 01-34-49-06-43
MATRA GLOBAL NETSERVICES
Société du groupe PROSODIE
8, rue Grange Dame Rose
78140 Vélizy
From: Geoff H. <ghu...@us...> - 2002-09-01 07:13:54
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.