htdig-dev Mailing List for ht://Dig (Page 97)

Brought to you by: angusgb, grdetil, lha, nealr, scherpbier

htdig-dev — Developer Discussion for the ht://Dig project

Re: [htdig-dev] Retriever/Parser

From: Neal R. <ne...@ri...> - 2002-02-07 04:02:09

> Can we come up with other types of Retriever classes beyond the 
> "here's a document in memory, index it" and "here's a URL, fetch it, 
> check status, index and spider" approaches?

	Not sure.  Anyone?  Could the ability to fetch documents over
Samba connections be useful?

	I've got working BasicDocument & TextCollecter classes I'll post
soon.


> What do you present in the search results? How does a user select a 
> particular document--is it a link to fetch the document based on the 
> DocID? This may help for the people who've asked if htdig could not 
> only fetch the document but leave a local copy, a la the Google 
> "Cached Results" feature.

	The search results will be fetched via another set of classes. I'm
adapting the current htsearch query & display classes to have a
per-document API.

	As each result is fetched, the 'URL' is in effect a pointer to an
XML document, which is parsed and displayed with PHP and XSLT.

	The 'URL' as it stands is not usable as a separate entity, at
least for this application.

	One idea worth considering along the lines of the
"Google cached" document feature would be to offload all spidering duties
to a tool like 'httrack', then index the saved files into the database.
With a log file produced during httrack spidering, a second CACHED_URL
field could be filled with the location of the local copy, while the
source URL is preserved.

	httrack is built to spider and save web pages so that everything a
page needs and links to is accessible locally through relative links.

	It's pretty well maintained and well thought of; maybe you'd rather
leave the maintenance of spidering code to that project instead.
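To make the CACHED_URL idea concrete, here is a minimal C++ sketch. It assumes a simplified mirror layout of `<mirror root>/<host>/<path>`, with `index.html` standing in for directory URLs; httrack's real naming rules (query strings, renamed files, its log format) are more involved and are not modeled here. `DocRecord`, `cachedPathFor` and `makeRecord` are hypothetical names, not part of htdig or httrack.

```cpp
#include <string>

// Hypothetical record: the original URL is kept for display and links,
// while cached_url points at the local mirrored copy.
struct DocRecord {
    std::string url;
    std::string cached_url;
};

// Map a URL onto an ASSUMED mirror layout: <root>/<host>/<path>,
// using "index.html" for directory URLs. Returns "" for non-HTTP(S) URLs.
std::string cachedPathFor(const std::string& url, const std::string& mirrorRoot) {
    const std::string http = "http://", https = "https://";
    std::string rest;
    if (url.compare(0, http.size(), http) == 0)
        rest = url.substr(http.size());
    else if (url.compare(0, https.size(), https) == 0)
        rest = url.substr(https.size());
    else
        return "";

    // Drop any query string or fragment; a real mirror tool encodes these.
    std::string::size_type cut = rest.find_first_of("?#");
    if (cut != std::string::npos)
        rest.erase(cut);

    if (rest.find('/') == std::string::npos)
        rest += '/';                 // "host" -> "host/"
    if (rest[rest.size() - 1] == '/')
        rest += "index.html";        // directory URL -> its index page

    return mirrorRoot + "/" + rest;
}

// Build a record that preserves the source URL alongside the cached path.
DocRecord makeRecord(const std::string& url, const std::string& mirrorRoot) {
    DocRecord rec;
    rec.url = url;
    rec.cached_url = cachedPathFor(url, mirrorRoot);
    return rec;
}
```

Keeping both fields lets a results page link to the live URL as usual and offer the cached copy as a secondary link.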

-- 
Neal Richter 
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site

Re: [htdig-dev] Retriever/Parser

From: Geoff H. <ghu...@ws...> - 2002-02-07 03:39:22

I'm going to take your points in a slightly different order, so I 
apologize to those following the thread.

At 6:54 PM -0700 1/29/02, Neal Richter wrote:
>	One could make an argument that mifluz could be used directly for
>this.  Very true, but mifluz is a bucket of nice parts.  Htdig is a
>working tool with the wrappers that make mifluz useful quickly.

I'm not sure Loic ever saw it that way, but that's another story. He 
certainly was looking for very similar things in terms of using 
ht://Dig in other contexts. But we also haven't heard how his designs 
have changed over the years.

>	At some point the Retriever-as-swiss-army-knife approach can be
>overly complex.  A more basic class for optional use can be good for a
>narrow set of uses.

Fair enough, but again, this sounds more like a refactoring of htdig. 
Michael Haggerty worked on some things that may or may not be of 
interest (I haven't seen everything he did to htdig/ myself). If you 
have the CVS repository around, you should check the 
mrh-refactor-htdig branch. If not, let me know and I'll pull together 
a .tar.gz of that.

Can we come up with other types of Retriever classes beyond the 
"here's a document in memory, index it" and "here's a URL, fetch it, 
check status, index and spider" approaches?
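One way to frame the question is as a small interface that every retriever implements. In the C++ sketch below, which uses hypothetical class names (`DocumentSource`, `MemorySource`, `FileListSource`) rather than existing htdig classes, the in-memory case hands over documents it was given, and a file-list variant indexes a fixed set of local files with no HTTP, status checks or link following, as one example of a type sitting between the two approaches above.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A document ready for parsing: an identifier plus its raw contents.
struct Document {
    std::string id;        // URL, file path, or any opaque key
    std::string contents;
};

// Hypothetical common interface: each call yields the next document,
// returning false once the source is exhausted.
class DocumentSource {
public:
    virtual ~DocumentSource() {}
    virtual bool next(Document& out) = 0;
};

// "Here's a document in memory, index it."
class MemorySource : public DocumentSource {
public:
    void add(const std::string& id, const std::string& text) {
        docs_.push_back(Document{id, text});
    }
    bool next(Document& out) override {
        if (pos_ >= docs_.size()) return false;
        out = docs_[pos_++];
        return true;
    }
private:
    std::vector<Document> docs_;
    std::size_t pos_ = 0;
};

// A fixed list of local files: no HTTP, no status checks, no spidering.
// Files that cannot be opened are silently skipped.
class FileListSource : public DocumentSource {
public:
    explicit FileListSource(std::vector<std::string> paths) : paths_(std::move(paths)) {}
    bool next(Document& out) override {
        while (pos_ < paths_.size()) {
            const std::string& path = paths_[pos_++];
            std::ifstream in(path.c_str(), std::ios::binary);
            if (!in) continue;
            std::ostringstream buf;
            buf << in.rdbuf();       // slurp the whole file
            out.id = path;
            out.contents = buf.str();
            return true;
        }
        return false;
    }
private:
    std::vector<std::string> paths_;
    std::size_t pos_ = 0;
};
```

A spidering retriever could implement the same `next()` call while also queueing discovered links, so the indexing code would not need to know which kind of source it is reading from.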

>	For this project, all I really store as a 'URL' is part of the
>path to an XML file, so by itself the URL is useless to any transport
>object.  For that matter you could use URL simply as a document-id in
>another separate system.
...
>	Similarly the query process is integrated inside another
>UI.  A Query is received via user input, passed to htdig search APIs and
>the results are repackaged within the existing UI.

I'm curious about the URL from the search results aspect of it. 
Indeed the 3.2 code relies very little on the URL for indexing as 
everything is keyed by DocID. This is in contrast to 3.1 and prior 
where the URL was the key to the document database.
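The contrast between the two layouts can be sketched as follows. In this hypothetical `DocStore` (not htdig's actual document database API), records are keyed by an integer DocID and the URL is just an opaque attribute, so it could be a real URL, a partial path to an XML file, or an ID from another system; a secondary URL-to-DocID map covers lookups that still start from a URL, such as deciding whether a page was already indexed.

```cpp
#include <map>
#include <string>

struct DocInfo {
    std::string url;     // opaque to the store: URL, XML path, foreign ID...
    std::string title;
};

class DocStore {
public:
    // Returns the existing DocID if this URL is known, else assigns a new one.
    int add(const std::string& url, const std::string& title) {
        auto it = byUrl_.find(url);
        if (it != byUrl_.end()) {
            docs_[it->second].title = title;   // refresh the stored record
            return it->second;
        }
        int id = nextId_++;
        DocInfo info;
        info.url = url;
        info.title = title;
        docs_[id] = info;
        byUrl_[url] = id;
        return id;
    }

    // Primary access path: by DocID, the way word entries refer to documents.
    const DocInfo* get(int docId) const {
        auto it = docs_.find(docId);
        return it == docs_.end() ? nullptr : &it->second;
    }

    // Secondary access path: by URL; returns -1 if unknown.
    int idForUrl(const std::string& url) const {
        auto it = byUrl_.find(url);
        return it == byUrl_.end() ? -1 : it->second;
    }

private:
    std::map<int, DocInfo> docs_;
    std::map<std::string, int> byUrl_;
    int nextId_ = 1;
};
```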

What do you present in the search results? How does a user select a 
particular document--is it a link to fetch the document based on the 
DocID? This may help for the people who've asked if htdig could not 
only fetch the document but leave a local copy, a la the Google 
"Cached Results" feature.

-Geoff

[htdig-dev] Re: mifluz merge

From: Geoff H. <ghu...@ws...> - 2002-02-07 03:39:19

Some of you are probably wondering why the mifluz merge doesn't seem 
to be progressing. There are a few reasons for this. First off, after 
some experimentation, it seems like the current mifluz CVS code 
doesn't build...

There's also the small problem that the latest released version isn't 
particularly cross-platform.

Here's how you can help the effort.

Download mifluz from <ftp://ftp.gnu.org/gnu/mifluz/mifluz-0.23.0.tar.gz>

Try running ./configure; make

Let me know if that's successful. In particular, I'm looking to find 
out what platforms have problems because ./configure dies with 
complaints about iconv.h. (These are platforms where we'll have to 
build a replacement.)

Failures:
* Mac OS X 10.1.2

Success:
* RedHat 7.2
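For anyone checking their platform, the capability configure is probing for is `<iconv.h>` plus working `iconv_open()`/`iconv()` calls. The C++ sketch below is a rough stand-alone check of that, not mifluz's actual configure test, and `latin1ToUtf8` is a hypothetical helper. If it fails to compile, the platform likely lacks the header; note that on some systems iconv lives in a separate libiconv and needs `-liconv` at link time.

```cpp
#include <cstddef>
#include <iconv.h>
#include <string>
#include <vector>

// Convert an ISO-8859-1 string to UTF-8 with the POSIX iconv API.
// Returns false if the converter cannot be opened or conversion fails.
bool latin1ToUtf8(const std::string& in, std::string& out) {
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)           // iconv_open's documented failure value
        return false;

    std::vector<char> inBuf(in.begin(), in.end());
    std::vector<char> outBuf(in.size() * 2 + 1);   // Latin-1 -> UTF-8 at most doubles
    char* inPtr = inBuf.empty() ? nullptr : &inBuf[0];
    std::size_t inLeft = inBuf.size();
    char* outPtr = &outBuf[0];
    std::size_t outLeft = outBuf.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        return false;

    out.assign(&outBuf[0], outBuf.size() - outLeft);
    return true;
}
```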

-Geoff

[htdig-dev] htfuzzy efficiency question

From: Neal R. <ne...@ri...> - 2002-02-07 03:23:07

Hey,
	What are your thoughts on htfuzzy efficiency?

	Let's say I've got a htdig database that is getting incrementally
added to, say 1-2 megabytes per day in the worst case.

	I want to add new documents, which works fine, then update the
stemming databases.
	Right now it seems to be recalculating all of the stems for the
entire search database each time it is run.

	Reprocessing all the data like this would get more expensive each
day...

	Can htmerge merge two stemming databases? (Looks like no?)

	Am I wrong here?
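To illustrate the idea, here is a hypothetical C++ sketch of incremental updating and merging; it is not htfuzzy's code. Two in-memory maps stand in for the root2word and word2root endings databases, only words not seen before are stemmed, and merging two databases is a union of the maps. `crudeRoot` is a deliberately naive placeholder for the real endings rules, and the merge assumes both databases were built with the same rules.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// In-memory stand-in for the root2word / word2root endings databases.
struct EndingsDB {
    std::map<std::string, std::set<std::string> > root2word;
    std::map<std::string, std::string> word2root;
};

// Placeholder stemmer: strips a few English suffixes. NOT htfuzzy's algorithm.
std::string crudeRoot(const std::string& word) {
    static const char* suffixes[] = {"ing", "ed", "es", "s"};
    for (const char* suf : suffixes) {
        std::string s(suf);
        if (word.size() > s.size() + 2 &&
            word.compare(word.size() - s.size(), s.size(), s) == 0)
            return word.substr(0, word.size() - s.size());
    }
    return word;
}

// Incremental update: only words missing from word2root are stemmed, so the
// cost tracks the day's new vocabulary rather than the whole index.
// Returns how many words were actually processed.
int addWords(EndingsDB& db, const std::vector<std::string>& words) {
    int processed = 0;
    for (const std::string& w : words) {
        if (db.word2root.count(w)) continue;
        std::string root = crudeRoot(w);
        db.word2root[w] = root;
        db.root2word[root].insert(w);
        ++processed;
    }
    return processed;
}

// Merge src into dst: existing word2root entries in dst are kept,
// and the root2word sets are united.
void mergeEndings(EndingsDB& dst, const EndingsDB& src) {
    for (const auto& kv : src.word2root)
        dst.word2root.insert(kv);            // no overwrite if already present
    for (const auto& kv : src.root2word)
        dst.root2word[kv.first].insert(kv.second.begin(), kv.second.end());
}
```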

-- 
Neal Richter 
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site

[htdig-dev] system(mv) vs rename();

From: Neal R. <ne...@ri...> - 2002-02-07 00:59:38

Hello,

	Here's a humble patch.

	Instead of calling system(mv file1 file2) in htfuzzy/EndingsDB.cc 
& htfuzzy/Synonym.cc

	use rename(file1, file2);

	See man 2 rename 
	man 2 unlink [unlink() = rm]

	There's no need for anything system-specific here; rename() is in libc.
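A minimal C++ sketch of the same change with error reporting added; the wrapper name `replaceFile` is hypothetical, while `std::rename()` is standard C/C++ (`<cstdio>`). One difference from `mv` is worth keeping in mind: `rename()` cannot move a file between filesystems (POSIX reports `EXDEV`), so this assumes the temporary database is written on the same filesystem as its final location. On POSIX systems an existing target is replaced atomically.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Move tmpPath over finalPath without spawning a shell.
// Returns true on success; on failure prints the reason and returns false.
bool replaceFile(const std::string& tmpPath, const std::string& finalPath) {
    if (std::rename(tmpPath.c_str(), finalPath.c_str()) == 0)
        return true;
    int err = errno;   // save it before any other call can change it
    std::fprintf(stderr, "rename %s -> %s failed: %s%s\n",
                 tmpPath.c_str(), finalPath.c_str(), std::strerror(err),
                 err == EXDEV ? " (different filesystems; rename() cannot copy)" : "");
    return false;
}
```

In EndingsDB.cc this would be called as `replaceFile(root2word.get(), config["endings_root2word_db"].get())`, matching the rename() calls in the patch.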

Thanks

htfuzzy/EndingsDB.cc

82d81
< /*
90,93d88
< */
<
<     rename(root2word.get(), config["endings_root2word_db"].get());
<     rename(word2root.get(), config["endings_word2root_db"].get());

htfuzzy/Synonym.cc

121d120
< /*
128,130d126
< */
<
<     rename(dbFile.get(), config["synonym_db"].get());

-- 
Neal Richter 
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site

[htdig-dev] [Announce] ht://Dig 3.1.6 RPMs for Red Hat

From: Gilles D. <gr...@sc...> - 2002-02-05 23:53:44

I've just uploaded source and binary rpms for the ht://Dig
3.1.6 web site search engine to the htdig.org site, in
http://www.htdig.org/files/binaries/.  They can also be downloaded from
the SCRC web site, at http://www.scrc.umanitoba.ca/htdig/rpms/.

This is the latest stable release and is recommended for all production
servers.

This version in particular fixes a nasty security hole in htsearch 
that is present in all previous versions, including 3.1.5 and 
3.2.0b3. Because of this, it is *strongly* recommended that all users 
update to this version.

The following RPMs were built on Red Hat Linux 4.2, 6.2 and 7.2:

htdig-3.1.6-0rh42.i386.rpm       (for old libc5-based Red Hat 4.2)
htdig-3.1.6-0rh62.i386.rpm       (for glibc-2.1-based Red Hat 6.2)
htdig-3.1.6-0.src.rpm            (built on 4.2, but OK for 4.x, 5.x & 6.x)
htdig-3.1.6-0.rh72.i386.rpm      * (for glibc-2.2-based Red Hat 7.x, see note)
htdig-web-3.1.6-0.rh72.i386.rpm  * (ditto, see note below)
htdig-3.1.6-0.rh72.src.rpm       * (ditto)

After installing, verify /etc/htdig/htdig.conf, then run /usr/sbin/rundig
to (re)build all your databases.

		----

* Note to Red Hat 7.1 & 7.2 users:

The KDE 2.1 package shipped with Red Hat 7.1 uses htdig and htsearch
to index and search its documentation.  For some reason, the version
Red Hat shipped is the buggy old 3.2.0b3 beta release, which was later
upgraded to a late-October 2001 snapshot of 3.2.0b4 in the errata update
release packages htdig-3.2.0-1.b4.0.71 and htdig-web-3.2.0-1.b4.0.71.
While less buggy than 3.2.0b3, this is still not exactly stable code.
The "rh72" packages above are meant to be drop-in replacements for the
3.2.0 betas, but because 3.1.6 has a lower version number (even though it
is a more recent release), you have to use the --oldpackage option on the
rpm command to update htdig to this release.  You should also find and
remove databases made by the 3.2.0 version and rebuild your indexes,
as 3.1.6 uses a different database version and format than 3.2.0 betas.
The binary packages are split in two because you only need the htdig-web
package for allowing searches from your web site, while the htdig package
is sufficient for KDE's khelpcenter search tool.  All this may be academic
because htsearch support was dropped from khelpcenter in KDE 2.2, which
shipped with Red Hat 7.2.

		----

Name        : htdig                       Distribution: (none)
Version     : 3.1.6                             Vendor: (none)
Release     : 0                             Build Date: Fri Feb 01 10:09:57 2002
Install date: Fri Feb 01 10:12:02 2002   Build Host: cliff.scrc.umanitoba.ca
Group       : Networking/Utilities          Source RPM: htdig-3.1.6-0.src.rpm
Size        : 3809910
Packager    : Gilles Detillieux <gr...@sc...>
URL         : http://www.htdig.org/
Summary     : A web indexing and searching system for a small domain or intranet
Description :
The ht://Dig system is a complete world wide web indexing and searching
system for a small domain or intranet. This system is not meant to replace
the need for powerful internet-wide search systems like Lycos, Infoseek,
Webcrawler and AltaVista. Instead it is meant to cover the search needs for
a single company, campus, or even a particular sub section of a web site.

As opposed to some WAIS-based or web-server based search engines, ht://Dig
can span several web servers at a site. The type of these different web
servers doesn't matter as long as they understand the HTTP 1.0 protocol.

		----

 Release notes for htdig-3.1.6 1 Feb 2002

As with previous releases, this version cleans up some remaining bugs and
adds a few heavily-requested features. As the latest stable release, it is
recommended for all production servers.

    * Fixed another nasty security hole in htsearch, which could allow a
denial-of-service attack or force htsearch to read config files outside
of the configuration directory.
    * Fixed some problems with htmerge, including problems with words
beginning with special characters and merging multiple databases.
    * Fixed a bug in handling hopcounts.
    * Fixed problems in handling non-standard relative HTTP redirects.
    * Fixed bugs in external parser support, including confusion caused by
charset information in the Content-Type header and handling of binary
output from external converters.
    * Fixed bugs in the default English endings database. (Under ispell,
it wasn't quite intended for the accuracy needed for our usage.)
    * Fixed additional bugs in the endings fuzzy algorithm.
    * Fixed bugs with compiling with gcc-3.0 and later.
    * Fixed bugs compiling and running on Mac OS X.
    * Fixed problems with servers not returning a Last-Modified date--now
assumes the indexing time as the modification time.
    * Fixed a variety of bugs in the HTML parser to more flexibly handle
non-standard HTML.
    * Fixed problems in the TCP connection code, which now times out more
reliably when a connection hangs and retries bad connections several
times before giving up.
    * Added the -m "minimal" flag to htdig for only indexing a set list of
URLs and made the -l (log) flag the default behavior so that htdig will
stop and restart automatically.
    * Added htdump and htload programs for dumping ASCII representations
of the databases and reloading the same.
    * Added support for htnotify to collect multiple URLs and allow easy
customization of notification messages, including the new attributes
htnotify_replyto, htnotify_webmaster, htnotify_prefix_file, and
htnotify_suffix_file.
    * Added a new "accents" fuzzy algorithm to morph accents, including
the new accents_db attribute.
    * Added a 'list all' feature to htsearch with a query of '*' or the
current prefix_match_character.
    * Added date restricted searching to htsearch including relative
dates.
    * Added documentation on running ht://Dig and the rundig script.
    * Added METADESCRIPTION and NSTARS variables to the htsearch templates
as well as support for $=(var) template variable references.
    * Added new config attributes to htsearch for restrict and exclude
which work like the normal htsearch form variables if the form variables
are not set.
    * Added many new attributes, including ignore_dead_servers,
description_meta_tag_names, max_keywords, translate_latin1,
url_rewrite_rules, search_rewrite_rules, anchor_target, ignore_alt_text,
search_results_contenttype, boolean_keywords, boolean_syntax_errors,
multimatch_method, maximum_page_buttons, max_excerpts, plural_suffix,
any_keywords and use_doc_date.
    * Extended the build_select_lists attribute to support select
multiple, radio boxes and checkboxes.
    * Revised the documentation to make it clearer in parts, including the
url_part_aliases attribute.
    * Updated various contributed utilities including doc2html, xmlsearch,
rundig.sh, htparsedoc, acroconv.pl, multidig, etc.
    * A variety of other bug fixes, and many documentation updates. See
the ChangeLog for details.
    * Once again, thanks to everyone who reported bugs and bug fixes.
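Two of these notes interact: with the indexing time used as a fallback modification time, every document has a date that date-restricted searching can test. A hypothetical C++ sketch of that logic follows; `DocDate`, `effectiveModTime` and `inDateRange` are illustrative names, and real htsearch takes year/month/day form fields rather than raw `time_t` bounds.

```cpp
#include <ctime>
#include <vector>

struct DocDate {
    std::time_t lastModified;  // from the Last-Modified header; 0 if absent
    std::time_t indexedAt;     // when the document was fetched for indexing
};

// Fallback from the release notes: no Last-Modified date means the
// indexing time is used as the modification time.
std::time_t effectiveModTime(const DocDate& d) {
    return d.lastModified != 0 ? d.lastModified : d.indexedAt;
}

// Keep documents whose effective date falls within [start, end].
std::vector<DocDate> inDateRange(const std::vector<DocDate>& docs,
                                 std::time_t start, std::time_t end) {
    std::vector<DocDate> kept;
    for (const DocDate& d : docs) {
        std::time_t t = effectiveModTime(d);
        if (t >= start && t <= end)
            kept.push_back(d);
    }
    return kept;
}
```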

   The full ChangeLog for this release is available from:
   http://www.htdig.org/ChangeLog

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

Re: [htdig-dev] GCC warnings for htdig-3.2.0b4-012702

From: Gilles D. <gr...@sc...> - 2002-02-05 15:58:36

According to Alex Rousskov:
> 	Attached is error log, config.cache, and a patch in case you
> care to fix warnings that GCC is producing while compiling htdig in my
> environment. The patch does not fix warnings in conf_lexer.cxx because
> I am not sure what the dependency is there.

Thanks for the suggestions.  Your patches look good to me.  We'll have to
see how many of them are still appropriate/applicable after the mifluz
code merge, but we'll keep them in mind.  The conf_lexer.cxx code is a
bit of a problem because it's automatically generated by "flex", so we
don't have complete control over it.

> 	Environment:
> 		gcc version 2.7.2.3
> 		FreeBSD 3.5-STABLE
> 		htdig-3.2.0b4-012702
> 
> Please let me know if you need more information. 
> 
> Thank you for htdig.
> 
> Alex.
> 
> P.S. Apologies for not using the bug reporting form on SF. It
>      was too cumbersome to upload files compared to e-mail.

Yes, you're not the only one to find the SF bug reporting too cumbersome!

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

[htdig-dev] ANNOUNCE: xpdf 1.00 - a PDF viewer for X

From: Gilles D. <gr...@sc...> - 2002-02-04 22:31:00

I haven't tried this new xpdf release out with htdig and doc2html yet, but
I thought that some htdig users might be interested in this announcement,
so I'm passing it on...

--- begin forwarded message ---
From: de...@fo...
Date: 2 Feb 2002 00:51:15 -0000
Subject: ANNOUNCE: xpdf 1.00 - a PDF viewer for X

I've just released a new version of Xpdf, my Portable Document Format
(PDF) viewer for X.

Xpdf runs under the X Window System on Unix, VMS, and OS/2.  The non-X
components of the package (pdftops, pdftotext, etc.) also run on Win32
systems.

WARNING: Xpdf 1.x uses a completely different config file setup than
Xpdf 0.9x.  Please see the "Upgrading from Xpdf 0.9x" section in the
README file.

Noticeable changes:
* Completely rewrote the code that handles font encodings:
  - everything is Unicode-based
  - 16-bit fonts are handled much more cleanly
  - text output encoding can be set more flexibly
* New .xpdfrc config files.
* Implemented the sh (shaded fill) operator for the axial shading
  type.
* Added a duplex option to PSOutputDev and a -duplex switch to
  pdftops.
* Added key bindings for forward ('v') and backward ('b').
* Added the pdffonts program which lists the fonts used in a PDF
  file.
* Fixed several problems in the TrueType font embedding code (for
  PostScript output).
* Accept named destination on command line.
* Added several new items to pdfinfo: file size, PDF version, tagged
  (yes or no), XML metadata (with the -meta option).

See the `CHANGES' file for a complete list.

Source (C++ and C) is available, and it should be fairly easy to
compile for UNIX, VMS, OS/2, and Win32.

More information, source code, and precompiled binaries are on the
xpdf web page and ftp site:

    http://www.foolabs.com/xpdf/
    ftp://ftp.foolabs.com/pub/xpdf/

Source and Linux binaries are on sunsite.unc.edu, currently in
the incoming directory, but they will be moved to:

    ftp://ftp.ibiblio.org/pub/Linux/apps/graphics/viewers/X

--- end forwarded message ---

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

Re: [htdig-dev] Re: Clobbering CVS mainline

From: Gilles D. <gr...@sc...> - 2002-02-01 22:30:11

According to Geoff Hutchison:
> On Fri, 1 Feb 2002, Gilles Detillieux wrote:
> > how to deal with the patch to configure, as the patched code is probably
> > something internal to autoconf.  Maybe an autoconf expert can assist
> > with that, or maybe a newer autoconf version addresses this issue.
> 
> I'll take a look at the configure patch, but we probably will need to do
> some revising to update the configure stuff for autoconf 2.50 and newer,
> which do add some nice additions and clean up some bugs.
> 
> I'm not sure when the htsearch parser rewrites will happen and certainly
> some of the 3.1.6 changes don't impact either the mifluz import or the
> htsearch code. But then, you might have other work to do too. :-)

I do have other work to do.  (Of course, that hasn't often stopped
me before.  :-P )  At the very least, I'll hold off on htsearch changes
until the breakdown and reorganization of Display.cc is done.  Many of
the 3.1.6 enhancements have been in there.

I think I'll also leave the htsearch "*" handling up to you, because I
think you have a clearer idea than I do as to how best to implement this
in 3.2, and how it'll affect the parser code as it stands or as it will be.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

Re: [htdig-dev] Re: Clobbering CVS mainline

From: Geoff H. <ghu...@ws...> - 2002-02-01 21:50:36

On Fri, 1 Feb 2002, Gilles Detillieux wrote:

> how to deal with the patch to configure, as the patched code is probably
> something internal to autoconf.  Maybe an autoconf expert can assist
> with that, or maybe a newer autoconf version addresses this issue.

I'll take a look at the configure patch, but we probably will need to do
some revising to update the configure stuff for autoconf 2.50 and newer,
which do add some nice additions and clean up some bugs.

I'm not sure when the htsearch parser rewrites will happen and certainly
some of the 3.1.6 changes don't impact either the mifluz import or the
htsearch code. But then, you might have other work to do too. :-)

-Geoff

Re: [htdig-dev] Re: Clobbering CVS mainline

From: Gilles D. <gr...@sc...> - 2002-02-01 21:43:04

According to Geoff Hutchison:
> As has been discussed a few times, I'm going to kill the current
> (broken) mainline and import the htdig-3-2-x branch in its place. This is
> a first step towards merging the new mifluz code and getting 3.2
> development moving again.
> 
> I'll make another announcement when this is done, as people may wish to
> switch their CVS trees.

Great.  I saved a copy of php-wrapper from the mainline, which is probably
the only thing worth salvaging, so I say clobber away.

I'll probably wait until the mifluz merge is done, and the htsearch
reorganization and parser rewrite are done, before I take time to port
3.1.6 enhancements and doc fixes over to 3.2.  (That'll give me time
for some other pressing matters here.)

Just to add some more stuff to the 3.2 to-do list, I thought I'd pass on
some patches that I found in the htdig-3.2.0-2.011302.src.rpm from Red Hat
rawhide distribution.  I left out the patches to rundig and htdig.conf,
which I think are rather asinine, but the other patches are noteworthy.
The %changelog in the .spec file says this:

* Wed Jan 30 2002 Phil Knirsch <pkn...@re...>
- Fixed newer autoconf and gcc problems.
- Fixed a few compilation bugs.
- Disabled optimization as the new gcc still seems to have problems with it.

... so I thought it might be relevant to our efforts at gcc 3 compliance.

Of course, if we add includes for values.h and such, we should probably
test for them in ./configure.  I think we should also be consistent with
the whole string.h vs. strings.h issue and test for both.  I'm not sure
how to deal with the patch to configure, as the patched code is probably
something internal to autoconf.  Maybe an autoconf expert can assist
with that, or maybe a newer autoconf version addresses this issue.
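On the values.h point, one option is to drop it entirely: `DBL_MAX` is already guaranteed by the standard `<float.h>` (`<cfloat>` in C++), so the `MAXFLOAT` fallback only matters on pre-standard systems. Likewise `<string.h>`/`<cstring>` is standard, and the BSD `<strings.h>` can be included only when configure found it, using the `HAVE_STRINGS_H` macro that `AC_CHECK_HEADERS` defines. A sketch, assuming a generated `config.h` guarded by `HAVE_CONFIG_H` and a hypothetical `worstScore()` helper:

```cpp
#ifdef HAVE_CONFIG_H
#include "config.h"    // generated by ./configure; defines HAVE_*_H macros
#endif

#include <cfloat>      // DBL_MAX is guaranteed here; no need for <values.h>
#include <cstring>     // standard string functions (strlen, memcpy, ...)
#ifdef HAVE_STRINGS_H
#include <strings.h>   // BSD extras such as strcasecmp, only if configure found it
#endif

// Fallback kept only for pre-standard platforms that lack DBL_MAX.
#if !defined(DBL_MAX) && defined(MAXFLOAT)
# define DBL_MAX MAXFLOAT
#endif

// Hypothetical helper: a sentinel score lower than any real score.
double worstScore() {
    return -DBL_MAX;
}
```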

::::::::::::::
htdig-3.2.0b3-glibc222.patch
::::::::::::::
--- htdig-3.2.0b3/htsearch/Display.cc.glibc222	Mon Mar  5 11:38:31 2001
+++ htdig-3.2.0b3/htsearch/Display.cc	Mon Mar  5 11:38:57 2001
@@ -36,6 +36,7 @@
 #include <locale.h>
 #include <math.h>
 #include <float.h>
+#include <values.h>
 
 #if !defined(DBL_MAX) && defined(MAXFLOAT)
 # define DBL_MAX MAXFLOAT
::::::::::::::
htdig-3.2.0b4-011302-configure.patch
::::::::::::::
--- htdig-3.2.0b4-011302/configure.configure	Sun Jan 13 09:13:22 2002
+++ htdig-3.2.0b4-011302/configure	Wed Jan 30 20:34:58 2002
@@ -3202,7 +3202,7 @@
 #include "confdefs.h"
 #include <zlib.h>
 EOF
-ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
+ac_try="$ac_cpp -w conftest.$ac_ext >/dev/null 2>conftest.out"
 { (eval echo configure:3207: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
@@ -3505,7 +3505,7 @@
 #include "confdefs.h"
 #include <$ac_hdr>
 EOF
-ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
+ac_try="$ac_cpp -w conftest.$ac_ext >/dev/null 2>conftest.out"
 { (eval echo configure:3510: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
@@ -3583,7 +3583,7 @@
 #include "confdefs.h"
 #include <fstream.h>
 EOF
-ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
+ac_try="$ac_cpp -w conftest.$ac_ext >/dev/null 2>conftest.out"
 { (eval echo configure:3588: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
 ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
 if test -z "$ac_err"; then
::::::::::::::
htdig-3.2.0b4-011302-md5.patch
::::::::::::::
--- htdig-3.2.0b4-011302/htlib/md5.cc.md5	Wed Jan 30 19:26:23 2002
+++ htdig-3.2.0b4-011302/htlib/md5.cc	Wed Jan 30 19:26:08 2002
@@ -1,4 +1,6 @@
 #include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
 #include <time.h>
 extern "C" {
 #include "mhash_md5.h"
::::::::::::::
htdig-3.2.0b4-h_hash.patch
::::::::::::::
--- htdig-3.2.0b4-011302/db/hash.c.h_hash	Tue Oct 10 05:15:27 2000
+++ htdig-3.2.0b4-011302/db/hash.c	Fri Jan 25 15:56:09 2002
@@ -245,6 +245,11 @@
 		need_sync = 1;
 	}
 
+        /* Make sure we always have a valid hashp->h_hash function. */
+	if (hashp->h_hash == NULL)
+		hashp->h_hash = hcp->hdr->dbmeta.version < 5
+		? CDB___ham_func4 : CDB___ham_func5;
+
 err2:	/* Release the meta data page */
 	if ((t_ret = CDB___ham_release_meta(dbc)) != 0 && ret == 0)
 		ret = t_ret;
::::::::::::::
htdig-3.2.0b4-xopen.patch
::::::::::::::
--- htdig-3.2.0b4-011302/db/os_rw.c.xopen	Tue Oct 10 05:15:29 2000
+++ htdig-3.2.0b4-011302/db/os_rw.c	Fri Jan 25 14:30:45 2002
@@ -5,6 +5,14 @@
  *	Sleepycat Software.  All rights reserved.
  */
 
+
+#define _XOPEN_SOURCE 500
+#include <sys/types.h>
+#include <unistd.h>
+#ifndef u_long
+typedef __u_long u_long;
+#endif
+
 #include "db_config.h"
 
 #ifndef lint

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

Re: [htdig-dev] [Announce] Release of ht://Dig 3.1.6

From: Joe R. J. <jj...@cl...> - 2002-02-01 19:46:33

On Fri, 1 Feb 2002, Geoff Hutchison wrote:

> Date: Fri, 01 Feb 2002 10:49:28 -0500 (EST)
> From: Geoff Hutchison <ghu...@ws...>
> To: htd...@li..., htd...@li...,
>      htd...@li...
> Subject: [htdig-dev] [Announce] Release of ht://Dig 3.1.6
> 
> 
> At long last, I am quite pleased to announce the release of ht://Dig
> version 3.1.6. Thanks to the many people who contributed to this release
> in the form of code, feedback and bug reports!

How sweet it is;)  Thank you Geoff, thank you Gilles, thank you all.

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        jj...@cl...

[htdig-dev] Re: Clobbering CVS mainline

From: Geoff H. <ghu...@ws...> - 2002-02-01 17:25:21

As has been discussed a few times, I'm going to kill the current
(broken) mainline and import the htdig-3-2-x branch in its place. This is
a first step towards merging the new mifluz code and getting 3.2
development moving again.

I'll make another announcement when this is done, as people may wish to
switch their CVS trees.

-Geoff

[htdig-dev] [Announce] Release of ht://Dig 3.1.6

From: Geoff H. <ghu...@ws...> - 2002-02-01 15:59:38

At long last, I am quite pleased to announce the release of ht://Dig
version 3.1.6. Thanks to the many people who contributed to this release
in the form of code, feedback and bug reports!

This version is the latest production version and fixes a large number of
bugs, including all known security problems in previous versions. It is
*highly* recommended that all users update to this version. In addition,
version 3.1.6 offers additional features and improved documentation.

To download 3.1.6 or patches to previous versions, see
<http://www.htdig.org/files/where.html>
For the Release notes, see <http://www.htdig.org/RELEASE.html>
For the ChangeLog, see <http://www.htdig.org/ChangeLog>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

Re: [htdig-dev] Last call 3.1.6

From: Gilles D. <gr...@sc...> - 2002-02-01 03:55:54

According to Geoff Hutchison:
> So if you have any last-second gotchas, please speak now.

All systems go on my end!  Thanks, Geoff.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

[htdig-dev] Last call 3.1.6

From: Geoff H. <ghu...@ws...> - 2002-02-01 01:08:56

OK, this is it. I have a tar.gz, a diff and all the updates to the
maindocs ready to go. I'll upload the .tar.gz and the diff after dinner
and check to make sure it downloads, the permissions are right, it
compiles, etc.

Technically, the updates will happen late tonight (~11:30 Chicago U.S.
time) for the mirrors, but I won't send out the release notices until
tomorrow morning. This also means that the mainpage of the news.txt file
that's included by SHTML in the main webpage won't mention 3.1.6 until
tomorrow morning either.

So if you have any last-second gotchas, please speak now.

-Geoff

[htdig-dev] cron job

From: CVR <cee...@ya...> - 2002-01-31 21:53:47

Hi Group,

I have installed htdig on Mandrake Linux 8.0 with
apache web serv. The search works fine but I get an
error: "DB2 problem...: /u01/htdig/db/db.docdb: No
such file or directory
htdig: Unable to open/create document database
'/u01/htdig/db/db.docdb'".

I did follow the thread (sometime in 2000) on this, but
that tapered off without an ending.

Any help would be greatly appreciated.

Tx

=====
-------------------------------------
God Saves...but ctrl+S is faster ;-)
-------------------------------------


Re: [htdig-dev] latest to-do list for 3.1.6

From: Geoff H. <ghu...@ws...> - 2002-01-31 14:18:33

At 11:23 AM +0100 1/31/02, J. op den Brouw wrote:
>The Mirrors page holds links to 'latest production release' (3.1.5),

Actually this is the where.html page and it is updated on the 3-1-x 
branch. Obviously much of the website will be updated once the 
release happens and it may take some time for the mirrors to update.

-Geoff

Re: [htdig-dev] latest to-do list for 3.1.6

From: J. op d. B. <MSQ...@st...> - 2002-01-31 10:24:02

On Wed, 30 Jan 2002, Geoff Hutchison wrote:

> On Wed, 30 Jan 2002, Gilles Detillieux wrote:

> The FAQ needs to mention 3.1.6 as the latest version--I think there are
> one or two places where that happens. (Q. 2.1 in particular)

The Mirrors page holds links to 'latest production release' (3.1.5),
so these links have to be updated. Also, all mirrors must update
when 3.1.6 is released.

--jesse
--------------------------------------------------------------------
J. op den Brouw                           Johanna Westerdijkplein 75
Haagse Hogeschool                                  2521 EN  DEN HAAG
Faculty of Engineering                                   Netherlands
Electrical Engineering                                +31 70 4458936
-------------------- J.E...@st... --------------------

Linux - because reboots are for hardware changes

Re: [htdig-dev] latest to-do list for 3.1.6

From: Joe R. J. <jj...@cl...> - 2002-01-30 20:03:55

On Wed, 30 Jan 2002, Gilles Detillieux wrote:

> Date: Wed, 30 Jan 2002 13:18:51 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: "ht://Dig developers list" <htd...@li...>
> Subject: [htdig-dev] latest to-do list for 3.1.6
> 
> Here's my latest to-do list for 3.1.6.  Am I missing anything?
...
> 5.  better disclaimers about parse_doc.pl's obsolete status, in parse_doc.pl
>     and in FAQ (any other FAQ updates needed for new version?)

I think FAQ#5.14 needs to be changed (configure --with-rx).

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        jj...@cl...

Re: [htdig-dev] latest to-do list for 3.1.6

From: Geoff H. <ghu...@ws...> - 2002-01-30 19:25:48

On Wed, 30 Jan 2002, Gilles Detillieux wrote:

> 6.  merge ChangeLog updates into htdoc

7. Prepare maindocs updates and release notice

I have the Connection.cc code finished and will commit to CVS
momentarily. I'd do these pretty much in that order. I'll take care of 6
and 7 on Thursday evening and spin the tar file and diffs. As usual, these
will be uploaded well before the release notice and website changes hit.

The FAQ needs to mention 3.1.6 as the latest version--I think there are
one or two places where that happens. (Q. 2.1 in particular)

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

[htdig-dev] latest to-do list for 3.1.6

From: Gilles D. <gr...@sc...> - 2002-01-30 19:18:57

Here's my latest to-do list for 3.1.6.  Am I missing anything?

1.  fix Connection.cc error handling
2.  fix handling of install-strip in Makefile.in to work with relative path
    in INSTALL_PROGRAM (as per Jesse's e-mail)
3.  update english.0 with Alexander's submissions
4.  update synonyms files with David's submission
5.  better disclaimers about parse_doc.pl's obsolete status, in parse_doc.pl
    and in FAQ (any other FAQ updates needed for new version?)
6.  merge ChangeLog updates into htdoc

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

[htdig-dev] GCC warnings for htdig-3.2.0b4-012702

From: Alex R. <rou...@me...> - 2002-01-30 17:39:19

Attachments: t config.cache htdig-3.2.0b4-012702.patch

Hi there,

	Attached is error log, config.cache, and a patch in case you
care to fix warnings that GCC is producing while compiling htdig in my
environment. The patch does not fix warnings in conf_lexer.cxx because
I am not sure what the dependency is there.

	Environment:
		gcc version 2.7.2.3
		FreeBSD 3.5-STABLE
		htdig-3.2.0b4-012702

Please let me know if you need more information. 

Thank you for htdig.

Alex.

P.S. Apologies for not using the bug reporting form on SF. It
     was too cumbersome to upload files compared to e-mail.

[htdig-dev] Re: Mirroring Mailing list?

From: Geoff H. <ghu...@ws...> - 2002-01-30 17:07:34

Hi Jesse,

Do you think it would help to have a separate mailing list for ht://Dig
mirrors? It would obviously be low-volume, but could discuss issues like
ftp.htdig.org shutting down, changes in protocol, etc.

Obviously the mirrors would also get some advance notice of releases since
they're likely to see a load spike.

-Geoff

Re: [htdig-dev] Retriever/Parser

From: Neal R. <ne...@ri...> - 2002-01-30 01:58:05

On Tue, 29 Jan 2002, Geoff Hutchison wrote:

> The Retriever class isn't really built around much of anything IMHO. It
> requires that documents have a URL and that the URLs can be grouped into
> Server objects. 

	True, but the main Start function assumes a spidering
approach.  What if you just want to index a list of documents
already in memory (fetched from another source)?  The Start function is
cumbersome and there is no clear function that seems to say "here is
some data, please index it".  At least in my current reading it looks like
the core fetching + parsing + indexing + write-to-db process is shared
between Start and RetrievedDocument.  Correct?

	Also, there are a few features of Retriever that are not useful
in other contexts... max_hop_count for instance.  Definitely a usable
class, but it's overkill for a very basic document whose source is outside
the Transport context.

> you're talking more about Transport-type concepts. 

	Yes, I was speaking in generalities.  I am basically thinking of
how HtDig can be used as a general-purpose Information Retrieval tool.  I
probably switched topics a bit there.

	It's the difference between telling htdig "go over here, fetch the
data and index it all by yourself" vs "please index this data as I provide
it to you".  Using htdig as an 'application' vs 'a text indexing & query
component of another system'.

> Again, I think it's the URL that's the critical point. Otherwise how are
> the search results useful? How do you "jump to" a particular result from
> the output?

	For this project, all I really store as a 'URL' is part of the
path to an XML file, so by itself the URL is useless to any transport
object.  For that matter, you could use the URL simply as a document-id in
another separate system.

	Integrating the necessary external code to find/fetch,
transcode (character set switch), parse via XSLT, etc. would require as
much coding of a new Transport class (and integration of many external
libraries) as it would to:

Define a BasicDocument class with no bells and whistles other than a
Parser binding.

Define a TextCollecter (cousin of Retriever) whose sole job is to
facilitate parsing of documents and update the index.  No need to make
network connections, look at server codes, examine the document for links
to other documents, etc.  No 'document fetch' loop anywhere.   The
'index_doc' routine is called as needed, once per document by an outside
piece of code.
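	Roughly, the two classes could look something like this.  It's a
hypothetical, self-contained sketch, not libhtdig code: only the names
BasicDocument, TextCollecter and index_doc come from the description above.
ParserFn, simple_words, lookup and the std::map word index are assumptions
standing in for ht://Dig's real parsers and word database.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parser binding: turns raw document text into indexable words.
// A real build would bind one of ht://Dig's parsers here instead.
using ParserFn = std::function<std::vector<std::string>(const std::string&)>;

// Minimal stand-in parser: runs of alphanumerics become lowercased words.
std::vector<std::string> simple_words(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            words.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) words.push_back(cur);
    return words;
}

// BasicDocument: no transport, no server status codes, no link extraction.
// The "url" is only an opaque document id (e.g. part of an XML file path).
struct BasicDocument {
    std::string url;
    std::string contents;
    ParserFn parser;
};

// TextCollecter: indexes documents handed to it by outside code.
// There is no fetch loop; the caller invokes index_doc once per document.
class TextCollecter {
public:
    void index_doc(const BasicDocument& doc) {
        assert(!doc.url.empty() && "every document needs an id");
        for (const std::string& w : doc.parser(doc.contents)) {
            index_[w].insert(doc.url);
        }
        ++calls_;
    }

    // Query side: which document ids contain this (lowercased) word?
    std::set<std::string> lookup(const std::string& word) const {
        auto it = index_.find(word);
        return it == index_.end() ? std::set<std::string>{} : it->second;
    }

    // Number of index_doc calls made so far.
    std::size_t documents_indexed() const { return calls_; }

private:
    // Stand-in for the word database: word -> set of document ids.
    std::map<std::string, std::set<std::string>> index_;
    std::size_t calls_ = 0;
};
```

The outside code owns fetching, transcoding and XSLT; it just builds a
BasicDocument for each item it already holds and calls index_doc, then
maps the ids returned by lookup back into its own UI.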

	The file is viewed via another piece of code that loads an XML
file, reads the fields contained via a specialized parser and does a kind
of Rendering/Formatting to present the information in a specific UI
complete with other bells and whistles.

	Similarly, the query process is integrated inside another
UI.  A query is received via user input, passed to the htdig search APIs, and
the results are repackaged within the existing UI.


> Your work is appreciated. I'm just trying to point out a few things as
> someone who's been around for a while.

	Great, it's good input, and definitely helpful in understanding
htdig and the project team's conceptualization of it.

> 2) It's better not to reinvent the wheel. The less code that needs to be
> maintained, generally the better. Do we really need new Retriever classes,
> or do we need to refactor what we have?

	It's very powerful now, and very useful in a network centric
document environment.

	At some point the Retriever-as-swiss-army-knife approach can be
overly complex.  A more basic class for optional use can be good for a
narrow set of uses.

	What it comes down to is that I'm suggesting
libhtdig.so could use two additional classes that are very basic.  These
classes aren't really useful to anyone not using htdig as a separate
Information Retrieval component of another app.

	One could make an argument that mifluz could be used directly for
this.  Very true, but mifluz is a bucket of nice parts.  Htdig is a
working tool with the wrappers that make mifluz useful quickly.

Thanks.
-- 
Neal Richter 
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site

9 messages have been excluded from this view by a project administrator.
