From: Geoff H. <ghu...@us...> - 2001-11-25 08:13:20
STATUS of ht://Dig branch 3-2-x

RELEASES:
   3.2.0b4: In progress
   3.2.0b3: Released: 22 Feb 2001.
   3.2.0b2: Released: 11 Apr 2000.
   3.2.0b1: Released: 4 Feb 2000.

SHOWSTOPPERS:

KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
  wordlist_compress set but work fine without wordlist_compress.
  (The date is definitely stored correctly, even with compression on,
  so this must be some sort of weird htsearch bug.)
* Not all htsearch input parameters are handled properly: PR#648. Use a
  consistent mapping of input -> config -> template for all inputs where
  it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
  correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
  we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
  not FLAG_DESCRIPTION. (PR#859)

PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* MySQL patches to 3.1.x to be forward-ported and cleaned up.
  (Should really only attempt to use SQL for doc_db and related, not word_db.)

NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
* Handle local_urls through file:// handler, for mime.types support.
* Handle directory redirects in RetrieveLocal.
* Merge with mifluz.

TESTING:
* httools programs:
  (htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
  argument handling for parser/converter, allowing binary output from an
  external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.

DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
  (including '|' and regex).
* Document all of htsearch's mappings of input parameters to config
  attributes to template variables. (Relates to PR#648.) Also make sure
  these config attributes are all documented in defaults.cc, even if
  they're only set by input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
  requirements of 3.2.x (e.g. phrase searching, regex matching,
  external parsers and transport methods, database compression).
* TODO.html has not been updated for current TODO list and completions.

OTHER ISSUES:
* Can htsearch actually search while an index is being created?
  (Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch.
* URL.cc tries to parse malformed URLs (which causes further problems).
  (It should probably just set everything to empty.) This relates to PR#348.
From: Joe R. J. <jj...@cl...> - 2001-11-25 20:54:28
Hi Geoff,

According to the ChangeLog file this snapshot was last changed on November
3, but Gilles indicated last week that he had committed several fixes and
features to the CVS tree. Any ideas?

Regards,

Joe
From: Gilles D. <gr...@sc...> - 2001-11-27 23:07:50
According to Joe R. Jah:
> According to the ChangeLog file this snapshot was last changed on November
> 3, but Gilles indicated last week that he had committed several fixes and
> features to the CVS tree. Any ideas?

Arrggh! Something has gone wrong with the snapshot script, obviously.
I suspected something was up last week when we got a few complaints
about the 3.1.6 snapshot needing autoconf to build, so I knew there was
a problem with some file times. It seems now that it's not getting
its CVS updates correctly. The patch below will get you up to date.
(Use patch -p1 for this one.)

diff -rup htdig-3.1.6-112501/ChangeLog htdig-3.1.6/ChangeLog
--- htdig-3.1.6-112501/ChangeLog Sun Nov 11 03:17:22 2001
+++ htdig-3.1.6/ChangeLog Wed Nov 21 12:55:12 2001
@@ -1,3 +1,27 @@
+Wed Nov 21 12:54:42 2001 Gilles Detillieux <gr...@sc...>
+
+        * htdoc/rundig.html: Added note about effect of changing database_base.
+
+        * htmerge/docs.cc (convertDocs): Changed confusing message about
+        total doc db size in stats.
+
+Wed Nov 21 11:37:52 2001 Gilles Detillieux <gr...@sc...>
+
+        * htsearch/TemplateList.cc (createFromString), htdoc/attrs.html:
+        Treat template_map as a _quoted_ string list. Change <i> tags to
+        the HTML-4.0 compliant <em> tags in builtin-long template.
+
+Tue Nov 20 17:13:27 2001 Gilles Detillieux <gr...@sc...>
+
+        * htlib/String.cc (String, append, sub): Added checks for negative
+        lengths or start position to make code more fault-tolerant.
+
+Tue Nov 20 16:37:26 2001 Gilles Detillieux <gr...@sc...>
+
+        * htfuzzy/Synonym.cc (createDB): Check for lines with less than
+        2 words, to avoid segfault caused by calling Database::Put() with
+        negative length for data field.
+
 Sat Nov 3 23:55:00 2001 Geoff Hutchison <ghu...@ws...>
 
         * htlib/htString.h: Add #include for ostream.h to solve compile
diff -rup htdig-3.1.6-112501/htdoc/attrs.html htdig-3.1.6/htdoc/attrs.html
--- htdig-3.1.6-112501/htdoc/attrs.html Sun Nov 4 03:17:19 2001
+++ htdig-3.1.6/htdoc/attrs.html Wed Nov 21 11:33:21 2001
@@ -7624,7 +7624,7 @@
           <em>type:</em>
         </dt>
         <dd>
-          string list
+          quoted string list
         </dd>
         <dt>
           <em>used by:</em>
@@ -8800,7 +8800,7 @@
 </dl>
 <hr size="4" noshade>
 <!-- hhmts start -->
-Last modified: $Date: 2001/11/02 18:29:55 $
+Last modified: $Date: 2001/11/21 17:33:20 $
 <!-- hhmts end -->
 </body>
 </html>
diff -rup htdig-3.1.6-112501/htdoc/rundig.html htdig-3.1.6/htdoc/rundig.html
--- htdig-3.1.6-112501/htdoc/rundig.html Tue Sep 18 10:53:21 2001
+++ htdig-3.1.6/htdoc/rundig.html Wed Nov 21 12:55:25 2001
@@ -155,7 +155,10 @@
           <a href="attrs.html#database_dir">database_dir</a> or
           <a href="attrs.html#common_dir">common_dir</a>
           attributes (you'll need to make the corresponding changes to the DBDIR
-          and COMMONDIR variables in the script), if you decide to
+          and COMMONDIR variables in the script), if you change the
+          <a href="attrs.html#database_base">database_base</a>
+          attribute (there's and embedded "db." filename in the
+          script), if you decide to
           use other fuzzy algorithms that need their own databases
           rebuilt, or if you change the names of the endings or
           synonyms databases or source files. Before customizing the
@@ -181,7 +184,7 @@
     </dl>
     <hr size="4" noshade>
-    Last modified: $Date: 2001/09/18 15:53:21 $
+    Last modified: $Date: 2001/11/21 18:55:25 $
     <br>
     <a href="http://sourceforge.net/">
     <img src="http://sourceforge.net/sflogo.php?group_id=4593&type=1" width="88" height="31" border="0" alt="SourceForge Logo"></a>
diff -rup htdig-3.1.6-112501/htfuzzy/Synonym.cc htdig-3.1.6/htfuzzy/Synonym.cc
--- htdig-3.1.6-112501/htfuzzy/Synonym.cc Wed Mar 31 15:25:12 1999
+++ htdig-3.1.6/htfuzzy/Synonym.cc Tue Nov 20 16:42:24 2001
@@ -5,7 +5,7 @@
 //
 //
 #if RELEASE
-static char RCSid[] = "$Id: Synonym.cc,v 1.3.2.2 1999/03/31 21:25:12 grdetil Exp $";
+static char RCSid[] = "$Id: Synonym.cc,v 1.3.2.3 2001/11/20 22:42:25 grdetil Exp $";
 #endif
 
 #include "Synonym.h"
@@ -74,6 +74,16 @@ Synonym::createDB(Configuration &config)
     while (fgets(input, sizeof(input), fl))
     {
         StringList sl(input, " \t\r\n");
+        if (sl.Count() < 2)
+        {
+            if (debug)
+            {
+                cout << "htfuzzy/synonyms: Rejected line with less than 2 words: "
+                     << input << endl;
+                cout.flush();
+            }
+            continue;
+        }
         for (int i = 0; i < sl.Count(); i++)
         {
             data = 0;
diff -rup htdig-3.1.6-112501/htlib/String.cc htdig-3.1.6/htlib/String.cc
--- htdig-3.1.6-112501/htlib/String.cc Thu Jul 5 11:26:35 2001
+++ htdig-3.1.6/htlib/String.cc Tue Nov 20 17:15:31 2001
@@ -1,7 +1,7 @@
 //
 // Implementation of String class
 //
-// $Id: String.cc,v 1.16.2.5 2001/07/05 16:26:35 ghutchis Exp $
+// $Id: String.cc,v 1.16.2.6 2001/11/20 23:15:32 grdetil Exp $
 //
 // Part of the ht://Dig package <http://www.htdig.org/>
 // Copyright (c) 1995-2001 The ht://Dig Group
@@ -10,7 +10,7 @@
 // <http://www.gnu.org/copyleft/gpl.html>
 //
 #if RELEASE
-static char RCSid[] = "$Id: String.cc,v 1.16.2.5 2001/07/05 16:26:35 ghutchis Exp $";
+static char RCSid[] = "$Id: String.cc,v 1.16.2.6 2001/11/20 23:15:32 grdetil Exp $";
 #endif
 
 
@@ -61,7 +61,7 @@ String::String(char *s, int len)
 {
     Allocated = 0;
     Length = 0;
-    if (s && len != 0)
+    if (s && len > 0)
         copy(s, len, len);
 }
 
@@ -143,7 +143,7 @@ void String::append(char *s)
 
 void String::append(char *s, int slen)
 {
-    if (!s || !slen)
+    if (!s || slen <= 0)
         return;
 
 // if ( slen == 1 )
@@ -258,7 +258,7 @@ int String::as_integer(int def)
 
 String String::sub(int start, int len) const
 {
-    if (start > Length)
+    if (start > Length || start < 0 || len < 0)
         return 0;
 
     if (len > Length - start)
diff -rup htdig-3.1.6-112501/htmerge/docs.cc htdig-3.1.6/htmerge/docs.cc
--- htdig-3.1.6-112501/htmerge/docs.cc Mon Mar 22 17:39:30 1999
+++ htdig-3.1.6/htmerge/docs.cc Wed Nov 21 12:50:50 2001
@@ -3,7 +3,7 @@
 //
 // Indexing the "doc_db" database by id-number in "doc_index".
 //
-// $Id: docs.cc,v 1.14.2.2 1999/03/22 23:39:30 grdetil Exp $
+// $Id: docs.cc,v 1.14.2.3 2001/11/21 18:50:50 grdetil Exp $
 //
 //
 
@@ -106,7 +106,7 @@ convertDocs(char *doc_db, char *doc_inde
     if (stats)
     {
         cout << "htmerge: Total documents: " << document_count << endl;
-        cout << "htmerge: Total doc db size (in K): ";
+        cout << "htmerge: Total size of documents (in K): ";
         cout << docdb_size / 1024 << endl;
     }
 
diff -rup htdig-3.1.6-112501/htsearch/TemplateList.cc htdig-3.1.6/htsearch/TemplateList.cc
--- htdig-3.1.6-112501/htsearch/TemplateList.cc Thu Feb 17 14:46:13 2000
+++ htdig-3.1.6/htsearch/TemplateList.cc Wed Nov 21 11:40:44 2001
@@ -1,50 +1,23 @@
 //
 // TemplateList.cc
 //
-// Implementation of TemplateList
-//
-// $Log: TemplateList.cc,v $
-// Revision 1.4.2.3 2000/02/17 20:46:13 grdetil
-// * installdir/htdig.conf: quote all HTML tag parameters.
-// * htsearch/TemplateList.cc (createFromString), installdir/long.html,
-// installdir/short.html: Use $&(URL) in templates.
-//
-// Revision 1.4.2.2 2000/02/17 16:49:48 grdetil
-// silly little typo.
-//
-// Revision 1.4.2.1 2000/02/17 16:46:26 grdetil
-// [ Improve htsearch's HTML 4.0 compliance ]
-// * htsearch/TemplateList.cc (createFromString): Use file name rather
-// than internal name to select builtin-* templates, use $&(TITLE) in
-// templates and quote HTML tag parameters.
-// * installdir/long.html, installdir/short.html: Use $&(TITLE) in
-// templates and quote HTML tag parameters.
-// * htsearch/Display.cc (setVariables): quote all HTML tag parameters
-// in generated select lists.
-// * installdir/footer.html, installdir/header.html,
-// installdir/nomatch.html, installdir/search.html,
-// installdir/syntax.html, installdir/wrapper.html:
-// Use $&(var) where appropriate, and quote HTML tag parameters.
-//
-// Revision 1.4 1999/01/17 20:29:37 ghutchis
-// Ensure template_map config has three members for each template we add,
-// contributed by <tl...@mb...>.
-//
-// Revision 1.3 1998/09/10 04:16:26 ghutchis
-//
-// More bug fixes.
-//
-// Revision 1.1 1997/02/03 17:11:05 turtle
-// Initial revision
-//
+// TemplateList: As it sounds--a list of search result templates. Reads the
+// configuration and any template files from disk, then retrieves
+// the relevant template for display.
+//
+// Part of the ht://Dig package <http://www.htdig.org/>
+// Copyright (c) 1995-2001 The ht://Dig Group
+// For copyright details, see the file COPYING in your distribution
+// or the GNU Public License version 2 or later
+// <http://www.gnu.org/copyleft/gpl.html>
 //
 #if RELEASE
-static char RCSid[] = "$Id: TemplateList.cc,v 1.4.2.3 2000/02/17 20:46:13 grdetil Exp $";
+static char RCSid[] = "$Id: TemplateList.cc,v 1.4.2.5 2001/11/21 17:40:45 grdetil Exp $";
 #endif
 
 #include "TemplateList.h"
-#include <URL.h>
-#include <StringList.h>
+#include "URL.h"
+#include "QuotedStringList.h"
 
 //*****************************************************************************
 TemplateList::TemplateList()
@@ -86,7 +59,7 @@ TemplateList::get(char *internalName)
 int
 TemplateList::createFromString(char *str)
 {
-    StringList sl(str, "\t \r\n");
+    QuotedStringList sl(str, "\t \r\n");
     String display, internal, file;
     Template *t;
 
@@ -109,7 +82,7 @@ TemplateList::createFromString(char *str
     s << "<dl><dt><strong><a href=\"$&(URL)\">$&(TITLE)</a></strong>";
     s << "$(STARSLEFT)\n";
     s << "</dt><dd>$(EXCERPT)<br>\n";
-    s << "<i><a href=\"$&(URL)\">$&(URL)</a></i>\n";
+    s << "<em><a href=\"$&(URL)\">$&(URL)</a></em>\n";
     s << " <font size=\"-1\">$(MODIFIED), $(SIZE) bytes</font>\n";
     s << "</dd></dl>\n";
     t->setMatchTemplate(s);

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax:    (204)789-3930
From: Joe R. J. <jj...@cl...> - 2001-11-28 06:59:48
On Tue, 27 Nov 2001, Gilles Detillieux wrote:

> Date: Tue, 27 Nov 2001 17:07:38 -0600 (CST)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: Geoff Hutchison <ghu...@us...>, htd...@li...
> Subject: Re: [htdig-dev] Current Status as of snapshot 3.1.6-112501
>
> Arrggh! Something has gone wrong with the snapshot script, obviously.
> I suspected something was up last week when we got a few complaints
> about the 3.1.6 snapshot needing autoconf to build, so I knew there was
> a problem with some file times. It seems now that it's not getting
> its CVS updates correctly. The patch below will get you up to date.
> (Use patch -p1 for this one.)

Thank you. It is in the patch archives for redundancy :)

ftp://ftp.ccsf.org/htdig-patches/3.1.6/111101-112501

Regards,

Joe