From: Neal R. <ne...@ri...> - 2002-02-07 04:02:09
|
> Can we come up with other types of Retriever classes beyond the
> "here's a document in memory, index it" and "here's a URL, fetch it,
> check status, index and spider" approaches?

Not sure. Anyone? Could some ability to fetch documents over Samba connections be useful?

I've got working BasicDocument & TextCollecter classes I'll post soon.

> What do you present in the search results? How does a user select a
> particular document--is it a link to fetch the document based on the
> DocID? This may help for the people who've asked if htdig could not
> only fetch the document but leave a local copy, a la the Google
> "Cached Results" feature.

The search results will be fetched via another set of classes. I'm adapting the current htsearch query & display classes to have a per-document API. As each result is fetched, the 'URL' is in effect a pointer to an XML document, which is parsed and displayed with PHP & XSLT. The 'URL' as it stands is not usable as a separate entity, at least for this application.

One idea worth considering along the lines of the "Google cached" document feature would be to offload all spidering duties to code like 'httrack', then index the files on the database. With a log file produced during httrack spidering, a second CACHED_URL could be filled with the location of the local copy, while the source URL is preserved.

httrack is built to spider and save web pages to have local-relative access to what is needed & linked to in the page. It's pretty well maintained and well thought of; maybe you'd rather leave the maintenance of spidering code to that project instead.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-07 03:39:22
|
I'm going to take your points in a slightly different order, so I apologize to those following the thread.

At 6:54 PM -0700 1/29/02, Neal Richter wrote:
> One could make an argument that mifluz could be used directly for
>this. Very true, but mifluz is a bucket of nice parts. Htdig is a
>working tool with the wrappers that make mifluz useful quickly.

I'm not sure Loic ever saw it that way, but that's another story. He certainly was looking for very similar things in terms of using ht://Dig in other contexts. But we also haven't heard how his designs have changed over the years.

> At some point the Retriever-as-swiss-army-knife approach can be
>overly complex. A more basic class for optional use can be good for a
>narrow set of uses.

Fair enough, but again, this sounds more like a refactoring of htdig. Michael Haggerty worked on some things that may or may not be of interest (I haven't seen everything he did to htdig/ myself). If you have the CVS repository around, you should check the mrh-refactor-htdig branch. If not, let me know and I'll pull together a .tar.gz of that.

Can we come up with other types of Retriever classes beyond the "here's a document in memory, index it" and "here's a URL, fetch it, check status, index and spider" approaches?

> For this project, all I really store as a 'URL' is part of the
>path to an XML file.. so by itself the URL is useless to any transport
>object. For that matter you could use URL simply as a document-id in
>another separate system.
...
> Similarly the query process is integrated inside another
>UI. A Query is received via user input, passed to htdig search APIs and
>the results are repackaged with in the existing UI.

I'm curious about the URL from the search results aspect of it. Indeed the 3.2 code relies very little on the URL for indexing, as everything is keyed by DocID. This is in contrast to 3.1 and prior, where the URL was the key to the document database.

What do you present in the search results? How does a user select a particular document--is it a link to fetch the document based on the DocID? This may help for the people who've asked if htdig could not only fetch the document but leave a local copy, a la the Google "Cached Results" feature.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-07 03:39:19
|
Some of you are probably wondering why the mifluz merge doesn't seem to be progressing. There are a few reasons for this. First off, after some experimentation, it seems like the current mifluz CVS code doesn't build... There's also the small problem that the latest released version isn't particularly cross-platform.

Here's how you can help the effort.

Download mifluz from <ftp://ftp.gnu.org/gnu/mifluz/mifluz-0.23.0.tar.gz>
Try to ./configure; make
Let me know if that's successful.

In particular, I'm looking to find out what platforms have problems because ./configure dies with complaints about iconv.h. (These are platforms where we'll have to build a replacement.)

Failures:
* Mac OS X 10.1.2

Success:
* RedHat 7.2

-Geoff
|
|
From: Neal R. <ne...@ri...> - 2002-02-07 03:23:07
|
Hey,

What are your thoughts on htfuzzy efficiency?

Let's say I've got an htdig database that is getting incrementally added to, say 1-2 megabytes per day in the worst case. I want to add new documents, which works fine, then update the stemming databases. Right now it seems to be recalculating all of the stems for the entire search database each time it is run. Reprocessing all the data like this would get more expensive each day...

Can htmerge merge two stemming databases (looks like no?) Am I wrong here?

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
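For readers following the question, this is roughly what an incremental update would look like in the abstract: only words not already present get stemmed, so the cost tracks the size of the new data rather than the whole database. It is purely an illustration of the idea and says nothing about what htfuzzy or htmerge can actually do; `StemIndex`, `toy_root`, and `add_new_words` are made-up names, and the "stemmer" is a toy that strips one trailing 's'.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a word->root / root->words database pair.
struct StemIndex {
    std::map<std::string, std::string> word2root;
    std::map<std::string, std::set<std::string> > root2word;
};

// Toy stemmer for illustration only: strips a single trailing 's'.
std::string toy_root(const std::string& word)
{
    if (word.size() > 1 && word[word.size() - 1] == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

// Add only the words that are not already indexed.
// Returns how many words were newly stemmed and stored.
std::size_t add_new_words(StemIndex& index, const std::vector<std::string>& words)
{
    std::size_t added = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::string& w = words[i];
        if (index.word2root.count(w))
            continue;  // already processed in an earlier run
        const std::string root = toy_root(w);
        index.word2root[w] = root;
        index.root2word[root].insert(w);
        ++added;
    }
    return added;
}
```

The design point is the early `continue`: a second run over mostly familiar words does almost no stemming work.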
|
From: Neal R. <ne...@ri...> - 2002-02-07 00:59:38
|
Hello,

Here's a humble patch..

Instead of calling system(mv file1 file2) in htfuzzy/EndingsDB.cc & htfuzzy/Synonym.cc, use rename(file1, file2);

See
man 2 rename
man 2 unlink [unlink() = rm]

No need for any system-specific stuff here, rename() is in libc.

Thanks

htfuzzy/EndingsDB.cc
82d81
< /*
90,93d88
< */
<
< rename(root2word.get(), config["endings_root2word_db"].get());
< rename(word2root.get(), config["endings_word2root_db"].get());

htfuzzy/Synonym.cc
121d120
< /*
128,130d126
< */
<
< rename(dbFile.get(), config["synonym_db"].get());

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|
|
From: Gilles D. <gr...@sc...> - 2002-02-05 23:53:44
|
I've just uploaded source and binary rpms for the ht://Dig 3.1.6 web site search engine to the htdig.org site, in http://www.htdig.org/files/binaries/. They can also be downloaded from the SCRC web site, at http://www.scrc.umanitoba.ca/htdig/rpms/.

This is the latest stable release and is recommended for all production servers. This version in particular fixes a nasty security hole in htsearch that is present in all previous versions, including 3.1.5 and 3.2.0b3. Because of this, it is *strongly* recommended that all users update to this version.

The following RPMs were built on Red Hat Linux 4.2, 6.2 and 7.2:

htdig-3.1.6-0rh42.i386.rpm (for old libc5-based Red Hat 4.2)
htdig-3.1.6-0rh62.i386.rpm (for glibc-2.1-based Red Hat 6.2)
htdig-3.1.6-0.src.rpm (built on 4.2, but OK for 4.x, 5.x & 6.x)
htdig-3.1.6-0.rh72.i386.rpm * (for glibc-2.2-based Red Hat 7.x, see note)
htdig-web-3.1.6-0.rh72.i386.rpm * (ditto, see note below)
htdig-3.1.6-0.rh72.src.rpm * (ditto)

Verify /etc/htdig/htdig.conf, then run /usr/sbin/rundig after installing, to (re)build all your databases.

----
* Note to Red Hat 7.1 & 7.2 users:

The KDE 2.1 package shipped with Red Hat 7.1 uses htdig and htsearch to index and search its documentation. For some reason, the version Red Hat shipped is the buggy old 3.2.0b3 beta release, which was later upgraded to a late-October 2001 snapshot of 3.2.0b4 in the errata update release packages htdig-3.2.0-1.b4.0.71 and htdig-web-3.2.0-1.b4.0.71. While less buggy than 3.2.0b3, this is still not exactly stable code.

The "rh72" packages above are meant to be drop-in replacements for the 3.2.0 betas, but because 3.1.6 is a smaller version number (even though it is a more recent release), you have to use the --oldpackage option on the rpm command to update htdig to this release. You should also find and remove databases made by the 3.2.0 version and rebuild your indexes, as 3.1.6 uses a different database version and format than the 3.2.0 betas.

The binary packages are split in two because you only need the htdig-web package for allowing searches from your web site, while the htdig package is sufficient for KDE's khelpcenter search tool. All this may be academic because htsearch support was dropped from khelpcenter in KDE 2.2, which shipped with Red Hat 7.2.

----
Name : htdig
Distribution: (none)
Version : 3.1.6
Vendor: (none)
Release : 0
Build Date: Fri Feb 01 10:09:57 2002
Install date: Fri Feb 01 10:12:02 2002
Build Host: cliff.scrc.umanitoba.ca
Group : Networking/Utilities
Source RPM: htdig-3.1.6-0.src.rpm
Size : 3809910
Packager : Gilles Detillieux <gr...@sc...>
URL : http://www.htdig.org/
Summary : A web indexing and searching system for a small domain or intranet
Description :
The ht://Dig system is a complete world wide web indexing and searching system for a small domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Webcrawler and AltaVista. Instead it is meant to cover the search needs for a single company, campus, or even a particular sub section of a web site.

As opposed to some WAIS-based or web-server based search engines, ht://Dig can span several web servers at a site. The type of these different web servers doesn't matter as long as they understand the HTTP 1.0 protocol.

----
Release notes for htdig-3.1.6 1 Feb 2002

As with previous releases, this version cleans up some remaining bugs and adds a few heavily-requested features. As the latest stable release, it is recommended for all production servers.

* Fixed another nasty security hole in htsearch, which would allow a denial of service attack or forcing htsearch to read in config files outside of the configuration directory.
* Fixed some problems with htmerge, including problems with words beginning with special characters and merging multiple databases.
* Fixed a bug in handling hopcounts.
* Fixed problems in handling non-standard relative HTTP redirects.
* Fixed bugs in external parsers support including being confused by charset information in the Content-Type header and handling binary output from external converters.
* Fixed bugs in the default English endings database. (Under ispell, it wasn't quite intended for the accuracy needed for our usage.)
* Fixed additional bugs in the endings fuzzy algorithm.
* Fixed bugs with compiling with gcc-3.0 and later.
* Fixed bugs compiling and running on Mac OS X.
* Fixed problems with servers not returning a Last-Modified date--now assumes indexing time as modification time.
* Fixed a variety of bugs in the HTML parser to more flexibly handle non-standard HTML.
* Fixed problems in the TCP connection code and will more reliably timeout when a connection hangs and will retry bad connections several times before giving up.
* Added the -m "minimal" flag to htdig for only indexing a set list of URLs and made the -l (log) flag the default behavior so that htdig will stop and restart automatically.
* Added htdump and htload programs for dumping ASCII representations of the databases and reloading the same.
* Added support for htnotify to collect multiple URLs and allow easy customization of notification messages, including the new attributes htnotify_replyto, htnotify_webmaster, htnotify_prefix_file, and htnotify_suffix_file.
* Added a new "accents" fuzzy algorithm to morph accents, including the new accents_db attribute.
* Added a 'list all' feature to htsearch with a query of '*' or the current prefix_match_character.
* Added date restricted searching to htsearch including relative dates.
* Added documentation on running ht://Dig and the rundig script.
* Added METADESCRIPTION and NSTARS variables to the htsearch templates as well as support for $=(var) template variable references.
* Added new config attributes to htsearch for restrict and exclude which work like the normal htsearch form variables if the form variables are not set.
* Added many new attributes, including ignore_dead_servers, description_meta_tag_names, max_keywords, translate_latin1, url_rewrite_rules, search_rewrite_rules, anchor_target, ignore_alt_text, search_results_contenttype, boolean_keywords, boolean_syntax_errors, multimatch_method, maximum_page_buttons, max_excerpts, plural_suffix, any_keywords and use_doc_date.
* Extended the build_select_lists attribute to support select multiple, radio boxes and checkboxes.
* Revised the documentation to make it clearer in parts, including the url_part_aliases attribute.
* Updated various contributed utilities including doc2html, xmlsearch, rundig.sh, htparsedoc, acroconv.pl, multidig, etc.
* A variety of other bug fixes, and many documentation updates. See the ChangeLog for details.
* Once again, thanks to everyone who reported bugs and bug fixes.

The full ChangeLog for this release is available from: http://www.htdig.org/ChangeLog

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-02-05 15:58:36
|
According to Alex Rousskov:
> Attached is error log, config.cache, and a patch in case you
> care to fix warnings that GCC is producing while compiling htdig in my
> environment. The patch does not fix warnings in conf_lexer.cxx because
> I am not sure what the dependency is there.

Thanks for the suggestions. Your patches look good to me. We'll have to see how many of them are still appropriate/applicable after the mifluz code merge, but we'll keep them in mind. The conf_lexer.cxx code is a bit of a problem because it's automatically generated by "flex", so we don't have complete control over it.

> Environment:
> gcc version 2.7.2.3
> FreeBSD 3.5-STABLE
> htdig-3.2.0b4-012702
>
> Please let me know if you need more information.
>
> Thank you for htdig.
>
> Alex.
>
> P.S. Apologies for not using the bug reporting form on SF. It
> was too cumbersome to upload files compared to e-mail.

Yes, you're not the only one to find the SF bug reporting too cumbersome!

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-02-04 22:31:00
|
I haven't tried this new xpdf release out with htdig and doc2html yet, but
I thought that some htdig users might be interested in this announcement,
so I'm passing it on...
--- begin forwarded message ---
From: de...@fo...
Date: 2 Feb 2002 00:51:15 -0000
Subject: ANNOUNCE: xpdf 1.00 - a PDF viewer for X
I've just released a new version of Xpdf, my Portable Document Format
(PDF) viewer for X.
Xpdf runs under the X Window System on Unix, VMS, and OS/2. The non-X
components of the package (pdftops, pdftotext, etc.) also run on Win32
systems.
WARNING: Xpdf 1.x uses a completely different config file setup than
Xpdf 0.9x. Please see the "Upgrading from Xpdf 0.9x" section in the
README file.
Noticeable changes:
* Completely rewrote the code that handles font encodings:
- everything is Unicode-based
- 16-bit fonts are handled much more cleanly
- text output encoding can be set more flexibly
* New .xpdfrc config files.
* Implemented the sh (shaded fill) operator for the axial shading
type.
* Added a duplex option to PSOutputDev and a -duplex switch to
pdftops.
* Added key bindings for forward ('v') and backward ('b').
* Added the pdffonts program which lists the fonts used in a PDF
file.
* Fixed several problems in the TrueType font embedding code (for
PostScript output).
* Accept named destination on command line.
* Added several new items to pdfinfo: file size, PDF version, tagged
(yes or no), XML metadata (with the -meta option).
See the `CHANGES' file for a complete list.
Source (C++ and C) is available, and it should be fairly easy to
compile for UNIX, VMS, OS/2, and Win32.
More information, source code, and precompiled binaries are on the
xpdf web page and ftp site:
http://www.foolabs.com/xpdf/
ftp://ftp.foolabs.com/pub/xpdf/
Source and Linux binaries are on sunsite.unc.edu, currently in
the incoming directory, but they will be moved to:
ftp://ftp.ibiblio.org/pub/Linux/apps/graphics/viewers/X
--- end forwarded message ---
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Gilles D. <gr...@sc...> - 2002-02-01 22:30:11
|
According to Geoff Hutchison:
> On Fri, 1 Feb 2002, Gilles Detillieux wrote:
> > how to deal with the patch to configure, as the patched code is probably
> > something internal to autoconf. Maybe an autoconf expert can assist
> > with that, or maybe a newer autoconf version addresses this issue.
>
> I'll take a look at the configure patch, but we probably will need to do
> some revising to update the configure stuff for autoconf 2.50 and newer,
> which do add some nice additions and clean up some bugs.
>
> I'm not sure when the htsearch parser rewrites will happen and certainly
> some of the 3.1.6 changes don't impact either the mifluz import or the
> htsearch code. But then, you might have other work to do too. :-)

I do have other work to do. (Of course, that hasn't often stopped me before. :-P ) At the very least, I'll hold off on htsearch changes until the breakdown and reorganization of Display.cc is done. Many of the 3.1.6 enhancements have been in there. I think I'll also leave the htsearch "*" handling up to you, because I think you have a clearer idea than I do as to how best to implement this in 3.2, and how it'll affect the parser code as it stands or as it will be.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-01 21:50:36
|
On Fri, 1 Feb 2002, Gilles Detillieux wrote:
> how to deal with the patch to configure, as the patched code is probably
> something internal to autoconf. Maybe an autoconf expert can assist
> with that, or maybe a newer autoconf version addresses this issue.

I'll take a look at the configure patch, but we probably will need to do some revising to update the configure stuff for autoconf 2.50 and newer, which do add some nice additions and clean up some bugs.

I'm not sure when the htsearch parser rewrites will happen and certainly some of the 3.1.6 changes don't impact either the mifluz import or the htsearch code. But then, you might have other work to do too. :-)

-Geoff
|
|
From: Gilles D. <gr...@sc...> - 2002-02-01 21:43:04
|
According to Geoff Hutchison:
> As has been discussed a few times, I'm going to kill the current
> (broken) mainline and import the htdig-3-2-x branch in its place. This is
> a first step towards merging the new mifluz code and getting 3.2
> development moving again.
>
> I'll make another announcement when this is done, as people may wish to
> switch their CVS trees.
Great. I saved a copy of php-wrapper from the mainline, which is probably
the only thing worth salvaging, so I say clobber away.
I'll probably wait until the mifluz merge is done, and the htsearch
reorganization and parser rewrite are done, before I take time to port
3.1.6 enhancements and doc fixes over to 3.2. (That'll give me time
for some other pressing matters here.)
Just to add some more stuff to the 3.2 to-do list, I thought I'd pass on
some patches that I found in the htdig-3.2.0-2.011302.src.rpm from Red Hat
rawhide distribution. I left out the patches to rundig and htdig.conf,
which I think are rather asinine, but the other patches are noteworthy.
The %changelog in the .spec file says this:
* Wed Jan 30 2002 Phil Knirsch <pkn...@re...>
- Fixed newer autconf and gcc problems.
- Fixed a few compilations bugs.
- Disabled optimization as the new gcc still seems to have problems with it.
... so I thought it might be relevant to our efforts at gcc 3 compliance.
Of course, if we add includes for values.h and such, we should probably
test for them in ./configure. I think we should also be consistent with
the whole string.h vs. strings.h issue and test for both. I'm not sure
how to deal with the patch to configure, as the patched code is probably
something internal to autoconf. Maybe an autoconf expert can assist
with that, or maybe a newer autoconf version addresses this issue.
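To illustrate the "test for them in ./configure" point, here is a sketch of how a source file might guard these includes. It assumes configure generates a config.h defining the conventional autoconf macros HAVE_VALUES_H, HAVE_STRING_H and HAVE_STRINGS_H (the kind AC_CHECK_HEADERS produces); those guards and the `largest_score` function are illustrative and are not part of the Red Hat patches that follow.

```cpp
// Sketch only: HAVE_CONFIG_H, HAVE_VALUES_H, HAVE_STRING_H and HAVE_STRINGS_H
// are ASSUMED to come from a configure-generated config.h.
#ifdef HAVE_CONFIG_H
# include "config.h"
#endif

#include <float.h>
#ifdef HAVE_VALUES_H
# include <values.h>    // legacy header that supplies MAXFLOAT on some systems
#endif

#ifdef HAVE_STRING_H
# include <string.h>    // ISO C string functions
#endif
#ifdef HAVE_STRINGS_H
# include <strings.h>   // BSD names such as strcasecmp()
#endif

// Same fallback Display.cc applies when <float.h> does not define DBL_MAX.
#if !defined(DBL_MAX) && defined(MAXFLOAT)
# define DBL_MAX MAXFLOAT
#endif

// Hypothetical function, just to show DBL_MAX is usable after the guards.
double largest_score()
{
    return DBL_MAX;
}
```

Guarding each header separately means a platform that has only one of string.h or strings.h still compiles, which is the consistency the paragraph above asks for.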
::::::::::::::
htdig-3.2.0b3-glibc222.patch
::::::::::::::
--- htdig-3.2.0b3/htsearch/Display.cc.glibc222 Mon Mar 5 11:38:31 2001
+++ htdig-3.2.0b3/htsearch/Display.cc Mon Mar 5 11:38:57 2001
@@ -36,6 +36,7 @@
#include <locale.h>
#include <math.h>
#include <float.h>
+#include <values.h>
#if !defined(DBL_MAX) && defined(MAXFLOAT)
# define DBL_MAX MAXFLOAT
::::::::::::::
htdig-3.2.0b4-011302-configure.patch
::::::::::::::
--- htdig-3.2.0b4-011302/configure.configure Sun Jan 13 09:13:22 2002
+++ htdig-3.2.0b4-011302/configure Wed Jan 30 20:34:58 2002
@@ -3202,7 +3202,7 @@
#include "confdefs.h"
#include <zlib.h>
EOF
-ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
+ac_try="$ac_cpp -w conftest.$ac_ext >/dev/null 2>conftest.out"
{ (eval echo configure:3207: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
if test -z "$ac_err"; then
@@ -3505,7 +3505,7 @@
#include "confdefs.h"
#include <$ac_hdr>
EOF
-ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
+ac_try="$ac_cpp -w conftest.$ac_ext >/dev/null 2>conftest.out"
{ (eval echo configure:3510: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
if test -z "$ac_err"; then
@@ -3583,7 +3583,7 @@
#include "confdefs.h"
#include <fstream.h>
EOF
-ac_try="$ac_cpp conftest.$ac_ext >/dev/null 2>conftest.out"
+ac_try="$ac_cpp -w conftest.$ac_ext >/dev/null 2>conftest.out"
{ (eval echo configure:3588: \"$ac_try\") 1>&5; (eval $ac_try) 2>&5; }
ac_err=`grep -v '^ *+' conftest.out | grep -v "^conftest.${ac_ext}\$"`
if test -z "$ac_err"; then
::::::::::::::
htdig-3.2.0b4-011302-md5.patch
::::::::::::::
--- htdig-3.2.0b4-011302/htlib/md5.cc.md5 Wed Jan 30 19:26:23 2002
+++ htdig-3.2.0b4-011302/htlib/md5.cc Wed Jan 30 19:26:08 2002
@@ -1,4 +1,6 @@
#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
#include <time.h>
extern "C" {
#include "mhash_md5.h"
::::::::::::::
htdig-3.2.0b4-h_hash.patch
::::::::::::::
--- htdig-3.2.0b4-011302/db/hash.c.h_hash Tue Oct 10 05:15:27 2000
+++ htdig-3.2.0b4-011302/db/hash.c Fri Jan 25 15:56:09 2002
@@ -245,6 +245,11 @@
need_sync = 1;
}
+ /* Make sure we always have a valid hashp->h_hash function. */
+ if (hashp->h_hash == NULL)
+ hashp->h_hash = hcp->hdr->dbmeta.version < 5
+ ? CDB___ham_func4 : CDB___ham_func5;
+
err2: /* Release the meta data page */
if ((t_ret = CDB___ham_release_meta(dbc)) != 0 && ret == 0)
ret = t_ret;
::::::::::::::
htdig-3.2.0b4-xopen.patch
::::::::::::::
--- htdig-3.2.0b4-011302/db/os_rw.c.xopen Tue Oct 10 05:15:29 2000
+++ htdig-3.2.0b4-011302/db/os_rw.c Fri Jan 25 14:30:45 2002
@@ -5,6 +5,14 @@
* Sleepycat Software. All rights reserved.
*/
+
+#define _XOPEN_SOURCE 500
+#include <sys/types.h>
+#include <unistd.h>
+#ifndef u_long
+typedef __u_long u_long;
+#endif
+
#include "db_config.h"
#ifndef lint
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Joe R. J. <jj...@cl...> - 2002-02-01 19:46:33
|
On Fri, 1 Feb 2002, Geoff Hutchison wrote:
> Date: Fri, 01 Feb 2002 10:49:28 -0500 (EST)
> From: Geoff Hutchison <ghu...@ws...>
> To: htd...@li..., htd...@li...,
htd...@li...
> Subject: [htdig-dev] [Announce] Release of ht://Dig 3.1.6
>
>
> At long last, I am quite pleased to announce the release of ht://Dig
> version 3.1.6. Thanks to the many people who contributed to this release
> in the form of code, feedback and bug reports!
How sweet it is;) Thank you Geoff, thank you Gilles, thank you all.
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-01 17:25:21
|
As has been discussed a few times, I'm going to kill the current (broken) mainline and import the htdig-3-2-x branch in its place. This is a first step towards merging the new mifluz code and getting 3.2 development moving again. I'll make another announcement when this is done, as people may wish to switch their CVS trees. -Geoff |
|
From: Geoff H. <ghu...@ws...> - 2002-02-01 15:59:38
|
At long last, I am quite pleased to announce the release of ht://Dig version 3.1.6. Thanks to the many people who contributed to this release in the form of code, feedback and bug reports!

This version is the latest production version and fixes a large number of bugs, including all known security problems in previous versions. It is *highly* recommended that all users update to this version. In addition, version 3.1.6 offers additional features and improved documentation.

To download 3.1.6 or patches to previous versions, see <http://www.htdig.org/files/where.html>
For the Release notes, see <http://www.htdig.org/RELEASE.html>
For the ChangeLog, see <http://www.htdig.org/ChangeLog>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

Release notes for htdig-3.1.6 1 Feb 2002

As with previous releases, this version cleans up some remaining bugs and adds a few heavily-requested features. As the latest stable release, it is recommended for all production servers.

* Fixed another nasty security hole in htsearch, which would allow a denial of service attack or forcing htsearch to read in config files outside of the configuration directory.
* Fixed some problems with htmerge, including problems with words beginning with special characters and merging multiple databases.
* Fixed a bug in handling hopcounts.
* Fixed problems in handling non-standard relative HTTP redirects.
* Fixed bugs in external parsers support including being confused by charset information in the Content-Type header and handling binary output from external converters.
* Fixed bugs in the default English endings database. (Under ispell, it wasn't quite intended for the accuracy needed for our usage.)
* Fixed additional bugs in the endings fuzzy algorithm.
* Fixed bugs with compiling with gcc-3.0 and later.
* Fixed bugs compiling and running on Mac OS X.
* Fixed problems with servers not returning a Last-Modified date--now assumes indexing time as modification time.
* Fixed a variety of bugs in the HTML parser to more flexibly handle non-standard HTML.
* Fixed problems in the TCP connection code and will more reliably timeout when a connection hangs and will retry bad connections several times before giving up.
* Added the -m "minimal" flag to htdig for only indexing a set list of URLs and made the -l (log) flag the default behavior so that htdig will stop and restart automatically.
* Added htdump and htload programs for dumping ASCII representations of the databases and reloading the same.
* Added support for htnotify to collect multiple URLs and allow easy customization of notification messages, including the new attributes htnotify_replyto, htnotify_webmaster, htnotify_prefix_file, and htnotify_suffix_file.
* Added a new "accents" fuzzy algorithm to morph accents, including the new accents_db attribute.
* Added a 'list all' feature to htsearch with a query of '*' or the current prefix_match_character.
* Added date restricted searching to htsearch including relative dates.
* Added documentation on running ht://Dig and the rundig script.
* Added METADESCRIPTION and NSTARS variables to the htsearch templates as well as support for $=(var) template variable references.
* Added new config attributes to htsearch for restrict and exclude which work like the normal htsearch form variables if the form variables are not set.
* Added many new attributes, including ignore_dead_servers, description_meta_tag_names, max_keywords, translate_latin1, url_rewrite_rules, search_rewrite_rules, anchor_target, ignore_alt_text, search_results_contenttype, boolean_keywords, boolean_syntax_errors, multimatch_method, maximum_page_buttons, max_excerpts, plural_suffix, any_keywords and use_doc_date.
* Extended the build_select_lists attribute to support select multiple, radio boxes and checkboxes.
* Revised the documentation to make it clearer in parts, including the url_part_aliases attribute.
* Updated various contributed utilities including doc2html, xmlsearch, rundig.sh, htparsedoc, acroconv.pl, multidig, etc.
* A variety of other bug fixes, and many documentation updates. See the ChangeLog for details.
* Once again, thanks to everyone who reported bugs and bug fixes.
|
|
From: Gilles D. <gr...@sc...> - 2002-02-01 03:55:54
|
According to Geoff Hutchison:
> So if you have any last-second gotchas, please speak now.

All systems go on my end! Thanks, Geoff.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Geoff H. <ghu...@ws...> - 2002-02-01 01:08:56
|
OK, this is it. I have a tar.gz, a diff and all the updates to the maindocs ready to go. I'll upload the .tar.gz and the diff after dinner and check to make sure it downloads, the permissions are right, it compiles, etc.

Technically, the updates will happen late tonight (~11:30 Chicago U.S. time) for the mirrors, but I won't send out the release notices until tomorrow morning. This also means that the mainpage of the news.txt file that's included by SHTML in the main webpage won't mention 3.1.6 until tomorrow morning either.

So if you have any last-second gotchas, please speak now.

-Geoff
|
|
From: CVR <cee...@ya...> - 2002-01-31 21:53:47
|
Hi Group,
I have installed htdig on Mandrake Linux 8.0 with the Apache web server. The search works fine, but I get an error:
"DB2 problem...: /u01/htdig/db/db.docdb: No such file or directory
htdig: Unable to open/create document database '/u01/htdig/db/db.docdb'".
I did follow the thread (sometime in 2000) on this, but it tapered off without a resolution.
Any help would be greatly appreciated.
Tx
=====
-------------------------------------
God Saves...but ctrl+S is faster ;-)
-------------------------------------
|
|
From: Geoff H. <ghu...@ws...> - 2002-01-31 14:18:33
|
At 11:23 AM +0100 1/31/02, J. op den Brouw wrote:
>The Mirrors page holds links to 'latest production release' (3.1.5),
Actually this is the where.html page and it is updated on the 3-1-x branch. Obviously much of the website will be updated once the release happens and it may take some time for the mirrors to update.
-Geoff
|
|
From: J. op d. B. <MSQ...@st...> - 2002-01-31 10:24:02
|
On Wed, 30 Jan 2002, Geoff Hutchison wrote:
> On Wed, 30 Jan 2002, Gilles Detillieux wrote:
> The FAQ needs to mention 3.1.6 as the latest version--I think there are
> one or two places where that happens. (Q. 2.1 in particular)
The Mirrors page holds links to 'latest production release' (3.1.5), so these links have to be updated. Also, all mirrors must update when 3.1.6 is released.
--jesse
--------------------------------------------------------------------
J. op den Brouw Johanna Westerdijkplein 75
Haagse Hogeschool 2521 EN DEN HAAG
Faculty of Engineering Netherlands
Electrical Engineering +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
|
|
From: Joe R. J. <jj...@cl...> - 2002-01-30 20:03:55
|
On Wed, 30 Jan 2002, Gilles Detillieux wrote:
> Date: Wed, 30 Jan 2002 13:18:51 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: "ht://Dig developers list" <htd...@li...>
> Subject: [htdig-dev] latest to-do list for 3.1.6
>
> Here's my latest to-do list for 3.1.6. Am I missing anything?
...
> 5. better disclaimers about parse_doc.pl's obsolete status, in parse_doc.pl
> and in FAQ (any other FAQ updates needed for new version?)
I think FAQ #5.14 needs to be changed (configure --with-rx).
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...
|
|
From: Geoff H. <ghu...@ws...> - 2002-01-30 19:25:48
|
On Wed, 30 Jan 2002, Gilles Detillieux wrote:
> 6. merge ChangeLog updates into htdoc
7. Prepare maindocs updates and release notice
I have the Connection.cc code finished and will commit to CVS momentarily. I'd do these pretty much in that order. I'll take care of 6 and 7 on Thursday evening and spin the tar file and diffs. As usual, these will be uploaded well before the release notice and website changes hit.
The FAQ needs to mention 3.1.6 as the latest version--I think there are one or two places where that happens. (Q. 2.1 in particular)
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: Gilles D. <gr...@sc...> - 2002-01-30 19:18:57
|
Here's my latest to-do list for 3.1.6. Am I missing anything?
1. fix Connection.cc error handling
2. fix handling of install-strip in Makefile.in to work with relative path
in INSTALL_PROGRAM (as per Jesse's e-mail)
3. update english.0 with Alexander's submissions
4. update synonyms files with David's submission
5. better disclaimers about parse_doc.pl's obsolete status, in parse_doc.pl
and in FAQ (any other FAQ updates needed for new version?)
6. merge ChangeLog updates into htdoc
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
|
|
From: Alex R. <rou...@me...> - 2002-01-30 17:39:19
|
Hi there,
Attached is error log, config.cache, and a patch in case you
care to fix warnings that GCC is producing while compiling htdig in my
environment. The patch does not fix warnings in conf_lexer.cxx because
I am not sure what the dependency is there.
Environment:
gcc version 2.7.2.3
FreeBSD 3.5-STABLE
htdig-3.2.0b4-012702
Please let me know if you need more information.
Thank you for htdig.
Alex.
P.S. Apologies for not using the bug reporting form on SF. It
was too cumbersome to upload files compared to e-mail.
|
|
From: Geoff H. <ghu...@ws...> - 2002-01-30 17:07:34
|
Hi Jesse, Do you think it would help to have a separate mailing list for ht://Dig mirrors? It would obviously be low-volume, but could discuss issues like ftp.htdig.org shutting down, changes in protocol, etc. Obviously the mirrors would also get some advance notice of releases since they're likely to see a load spike. -Geoff |
|
From: Neal R. <ne...@ri...> - 2002-01-30 01:58:05
|
On Tue, 29 Jan 2002, Geoff Hutchison wrote:
> The Retriever class isn't really built around much of anything IMHO. It
> requires that documents have a URL and that the URLs can be grouped into
> Server objects.
True, but the main Start function assumes a spidering approach. What if you just want to index a list of documents already in memory (fetched from another source)? The Start function is cumbersome and there is no clear function that seems to say "here is some data, please index it". At least in my current reading it looks like the core fetching + parsing + indexing + write-to-db process is shared between Start & Retrieved Document. Correct?
Also, there are a few features of Retriever that are not useful in other contexts... max_hop_count for instance. Definitely a usable class, but it's overkill for a very basic document whose source is outside the Transport context.
> you're talking more about Transport-type concepts.
Yes, I was speaking in generalities. I am basically thinking of how HtDig can be used as a general-purpose Information Retrieval tool. I probably switched topics a bit there.
It's the difference between telling htdig "go over here, fetch the data and index it all by yourself" vs "please index this data as I provide it to you". Using htdig as an 'application' vs 'a text indexing & query component of another system'.
> Again, I think it's the URL that's the critical point. Otherwise how are
> the search results useful? How do you "jump to" a particular result from
> the output?
For this project, all I really store as a 'URL' is part of the path to an XML file, so by itself the URL is useless to any transport object. For that matter you could use the URL simply as a document-id in another separate system.
Integrating the necessary external code to find/fetch, transcode (character set switch), parse via XSLT, etc. would require as much coding of a new Transport class (and integration of many external libraries) as it would to:
* Define a BasicDocument class with no bells and whistles other than a Parser binding.
* Define a TextCollecter (cousin of Retriever) whose sole job is to facilitate parsing of documents and update the index. No need to make network connections, look at server codes, examine the document for links to other documents, etc.
No 'document fetch' loop anywhere. The 'index_doc' routine is called as needed, once per document, by an outside piece of code.
The file is viewed via another piece of code that loads an XML file, reads the fields it contains via a specialized parser and does a kind of Rendering/Formatting to present the information in a specific UI complete with other bells and whistles.
Similarly the query process is integrated inside another UI. A Query is received via user input, passed to htdig search APIs and the results are repackaged within the existing UI.
> Your work is appreciated. I'm just trying to point out a few things as
> someone who's been around for a while.
Great, it's good input.. definitely helpful in understanding htdig and the project team's conceptualization of it.
> 2) It's better not to reinvent the wheel. The less code that needs to be
> maintained, generally the better. Do we really need new Retriever classes,
> or do we need to refactor what we have?
It's very powerful now, and very useful in a network-centric document environment. At some point the Retriever-as-swiss-army-knife approach can be overly complex. A more basic class for optional use can be good for a narrow set of uses.
What it comes down to is that I'm suggesting libhtdig.so could use two additional classes that are very basic. These classes aren't really useful to anyone not using htdig as a separate Information Retrieval component of another app.
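A minimal sketch of the push-style design described above, for illustration only. The class names BasicDocument and TextCollecter and the index_doc routine come from this message; everything else (the fields, the tokenize helper, the lookup method, the in-memory map standing in for the word database) is a hypothetical assumption and not the ht://Dig or libhtdig API.

```cpp
// Hypothetical sketch only: not the real ht://Dig / libhtdig classes.
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// A document that is already in memory: an opaque id plus its text.
// The id plays the role of the 'URL' used purely as a document-id.
struct BasicDocument {
    std::string id;        // e.g. part of a path to an XML file
    std::string contents;  // text to be indexed
};

// Splits text into lowercase alphanumeric words (stand-in for a real Parser).
inline std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Push-style indexer: no fetch loop, no network access, no link following.
class TextCollecter {
public:
    // Called by outside code, once per document.
    void index_doc(const BasicDocument& doc) {
        assert(!doc.id.empty());  // every indexed document needs an id
        for (const std::string& word : tokenize(doc.contents)) {
            index_[word].insert(doc.id);
        }
    }

    // Returns the ids of documents containing the given lowercase word.
    std::set<std::string> lookup(const std::string& word) const {
        auto it = index_.find(word);
        if (it == index_.end()) return {};
        return it->second;
    }

private:
    std::map<std::string, std::set<std::string>> index_;  // word -> doc ids
};
```

The calling application owns the loop: it builds a BasicDocument for each item it already holds, calls index_doc on it, and later maps the ids returned by lookup back to its own XML files for display.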
One could make an argument that mifluz could be used directly for this. Very true, but mifluz is a bucket of nice parts. Htdig is a working tool with the wrappers that make mifluz useful quickly.
Thanks.
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
|