From: Geoff H. <ghu...@us...> - 2002-10-20 07:14:13
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
(Please note that everything added here should have a tracker PR# so
we can be sure they're fixed. Geoff is currently trying to add PR#s for
what's currently here.)
SHOWSTOPPERS:
* Mifluz database errors are a severe problem (PR#428295)
* The new mifluz merged code is slow.
(no PR, Geoff hasn't added mifluz-merge to CVS yet.)
KNOWN BUGS:
* Odd behavior: $(MODIFIED) and scores do not work with
wordlist_compress set, but work fine without wordlist_compress.
(the date is definitely stored correctly, even with compression on
so this must be some sort of weird htsearch bug) PR#618737.
* Not all htsearch input parameters are handled properly: PR#405278. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#405294. (The documentation for 3.2.0b1 was updated, but can
we fix this?)
(More importantly, do we ever want exact to /not/ be specified?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#618738)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
* Mifluz merge.
NEEDED FEATURES:
* Field-restricted searching. (e.g. PR#460833)
* Return all URLs. (PR#618743)
* Handle noindex_start & noindex_end as string lists.
* Quim's new htsearch/qtest query parser framework.
* File/Database locking. PR#405764.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient. (PR#405279)
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#405278.)
Should we make sure these config attributes are all documented in
defaults.cc, even if they're only set by input parameters and never
in the config file?
* Split attrs.html into categories for faster loading.
* Turn defaults.cc into an XML file for generating documentation and
defaults.cc.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
PR#405280, PR#405281.
* TODO.html has not been updated for current TODO list and
completions.
* Htfuzzy could use more documentation on what each fuzzy algorithm
does. PR#405714.
* Document the list of all installed files and default
locations. PR#405715.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch. PR#405765.
* URL.cc tries to parse malformed URLs (which can cause further problems)
(It should probably just set everything to empty).
|
|
From: Brian W. <bw...@st...> - 2002-10-22 07:38:54
Attachments:
DefaultsXML.tar.gz
Ok. I have the first cut of the defaults.xml patch.
It is all in the attached file - it is patched against
htdig-3.2.0b4-20021013.
The remainder of this email is the README file from the
tar file.
Regs
Brian White
===============================================================
Documentation attached to initial version of defaults.xml patch
===============================================================
1. Overview of what it does
====================================
* Adds defaults.xml and defaults.dtd
plus manage_attributes.pl for managing access
to them
* Addition of make_defaults_cc.pl for creating
defaults.cc
* Complete rewrite of the cf_generate.pl that
creates attrs.html, cf_byprog.html and
cf_byname.html
* Reducing the size of the ConfigDefaults
structure to just have "name" and "value"
The version of defaults.xml as it exists in
this patch is valid against defaults.dtd.
The patch is done against htdig-3.2.0b4-20021013
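As a rough illustration of the make_defaults_cc.pl step, here is a minimal
sketch in Python (not the Perl script from the patch). It assumes the
attributes are already available as (name, value) pairs and that the
generated table is an array called "defaults" ending in a { 0, 0 } entry;
the array name, the terminator and the sample values are assumptions made
for the example.

```python
def c_string(text):
    """Escape text so it can sit inside a C string literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'


def make_defaults_cc(attributes):
    """Render (name, value) pairs as a C++ table of ConfigDefaults entries.

    The array name and the { 0, 0 } terminator are assumptions for
    illustration, not taken from the patch.
    """
    lines = ["ConfigDefaults defaults[] = {"]
    for name, value in sorted(attributes):
        lines.append("    { %s, %s }," % (c_string(name), c_string(value)))
    lines.append("    { 0, 0 }")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only.
    sample = [("wordlist_compress", "true"), ("search_algorithm", "exact:1")]
    print(make_defaults_cc(sample))
```

Keeping only "name" and "value" in the generated table means the
descriptions and examples live in defaults.xml alone.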
2. Affected Files
====================================
New Files:
* htcommon/defaults.dtd
* htcommon/manage_attributes.pl
* htcommon/make_defaults_cc.pl
Replaced Files:
* htcommon/defaults.xml
* htdoc/cf_generate.pl
Patched Files:
(Note that most of the patches are only 1 or 2 lines
- the biggest is probably about 10.)
* htcommon/Makefile.am.patch
* htdoc/Makefile.am.patch
* htdoc/attrs_head.html.patch
* htdoc/attrs_tail.html.patch
* htdoc/cf_byname_head.html.patch
* htdoc/cf_byprog_head.html.patch
* htlib/Configuration.h.patch
* htdb/htdb_dump.cc.patch
* htdb/htdb_load.cc.patch
* htdb/htdb_stat.cc.patch
Files to be removed from CVS:
* defaults.cc
3. Creating Descriptions
====================================
The description is essentially an HTML
snippet, with the following differences:
* It is limited to the p, br, ol, ul, table, em,
strong, code and a elements, with
two additions:
1) <!ELEMENT codeblock (%paratext;)* >
This is used to provide block code or html
snippets. An example of this would be
<codeblock>
<SELECT NAME="search_algorithm">
<OPTION VALUE="exact:1 prefix:0.6 synonyms:0.5 endings:0.1"
SELECTED>fuzzy
<OPTION VALUE="exact:1">exact
</SELECT>
</codeblock>
2) <!ELEMENT ref (#PCDATA) >
<!ATTLIST ref type (program|attr|faq) #REQUIRED >
This is used to link to programs, FAQ entries and other attributes.
Some examples are:
<ref type="attr">build_select_lists</ref>
<ref type="program">htdig</ref>
<ref type="faq">4.1</ref>
The purpose of doing this is to allow the info
to be reused, and remove the dependency
on html files in a particular place.
* The only allowed attributes in the description
are:
table : border, width
td,th : align, valign, rowspan, colspan
dl : compact="true"
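To show how a documentation generator might turn the ref elements above
into ordinary links, here is a minimal sketch in Python. The link targets
(attrs.html#name, name.html, FAQ.html#qN) are hypothetical and chosen only
for the example; the real cf_generate.pl may map them differently.

```python
import re

# Hypothetical link targets; the real layout of the generated docs may differ.
LINK_TEMPLATES = {
    "attr": "attrs.html#%s",
    "program": "%s.html",
    "faq": "FAQ.html#q%s",
}

# Matches <ref type="...">target</ref> for the three types the DTD allows.
REF_PATTERN = re.compile(r'<ref\s+type="(program|attr|faq)"\s*>([^<]*)</ref>')


def expand_refs(description, templates=LINK_TEMPLATES):
    """Replace each <ref> element in a description with an HTML <a> element."""
    def to_link(match):
        ref_type = match.group(1)
        target = match.group(2).strip()
        return '<a href="%s">%s</a>' % (templates[ref_type] % target, target)
    return REF_PATTERN.sub(to_link, description)


if __name__ == "__main__":
    print(expand_refs('See <ref type="attr">build_select_lists</ref> '
                      'and <ref type="program">htdig</ref>.'))
```

Because the link layout lives in one table, the same descriptions can be
reused for output that keeps its files somewhere else.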
4. A Discussion of XML Validation
====================================
Ideally the code should validate the XML
against the DTD, and should check for
well-formedness. Unfortunately that requires
an XML parser, and I did not want to add
an extra dependency at this stage!
What I did as a compromise was to create
an API that is used to load and then
query the XML data - this API is
documented and implemented in
htcommon/manage_attributes.pl.
At the moment the internal data structures
are populated using standard perl pattern
matching - it *assumes* that defaults.xml
is valid against defaults.dtd, but it does
not test it.
The aim is that when an XML parser is
readily available, that can be used
to populate the internal data structures -
and everything else should just work!
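The idea of hiding the loading step behind a query API can be sketched as
follows, in Python rather than the Perl of manage_attributes.pl. The element
names (HtdigAttributes, attribute, default), the class and its methods are
all hypothetical, since the real markup and API are defined in the patch
itself. Note also that the parser-backed loader shown here only checks
well-formedness; it does not validate against the DTD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: the real element names in defaults.xml may differ.
SAMPLE = """<HtdigAttributes>
  <attribute name="wordlist_compress"><default>true</default></attribute>
  <attribute name="search_algorithm"><default>exact:1</default></attribute>
</HtdigAttributes>"""

ATTR_PATTERN = re.compile(
    r'<attribute\s+name="([^"]+)"\s*>.*?<default>(.*?)</default>.*?</attribute>',
    re.DOTALL,
)


def load_with_patterns(text):
    """Pattern-matching loader: assumes the input is valid, never checks it."""
    return {name: value for name, value in ATTR_PATTERN.findall(text)}


def load_with_parser(text):
    """Parser-backed loader: raises ET.ParseError on malformed XML."""
    root = ET.fromstring(text)
    return {
        node.get("name"): (node.findtext("default") or "")
        for node in root.iter("attribute")
    }


class Attributes:
    """Query API; callers never see which loader filled the data."""

    def __init__(self, text, loader=load_with_patterns):
        self._values = loader(text)

    def names(self):
        return sorted(self._values)

    def default(self, name):
        return self._values[name]


if __name__ == "__main__":
    for loader in (load_with_patterns, load_with_parser):
        attrs = Attributes(SAMPLE, loader=loader)
        print(loader.__name__, attrs.names(), attrs.default("search_algorithm"))
```

Swapping the loader changes nothing for the code that calls names() or
default(), which is the property the compromise above relies on.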
5. The Current state of defaults.xml
====================================
The version of defaults.xml that is
presented in this patch is valid
against defaults.dtd, but is
desperately in need of a cleanup.
However:
a) I don't have time to clean it up at the moment
b) It is currently completely generated from the old
defaults.cc, which makes it easier to adjust
until its form becomes stable
What I would like to do is get the patch
in place and, once it has stabilized,
embark on a bit of a cleanup.
6. Possible Issues
====================================
* Examples are just entered as values
- the "name : " prefix before each one is now
automatically generated. This may seem
limiting, but it is exactly what is in
the docs at the moment.
* There are a few remaining links to
other parts of the documentation that
I have left as <a> elements, simply
because I couldn't see an obvious way
to include them.