htsearch

ht://Dig Copyright © 1995-2004 The ht://Dig Group
Please see the file COPYING for license information.


HTML Form

The primary interface to htsearch is through an HTML form. When the form is submitted, the htsearch program takes the values from the form and performs the actual search. The search can be modified in many ways with either hidden input fields or other HTML form tags. Study the examples to get a feel for what is possible.
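Because htsearch reads ordinary form values, a search can also be written as a URL whose query string carries the same fields. The Python sketch below builds such a URL with the standard library's urllib.parse.urlencode. The server address is a hypothetical placeholder, and it assumes the form is submitted with GET rather than POST. A list of (name, value) pairs is used instead of a dict so that a field such as exclude can appear more than once.

```python
from urllib.parse import urlencode

# Hypothetical location of the htsearch CGI program; adjust for your server.
HTSEARCH_URL = "http://www.example.com/cgi-bin/htsearch"

def build_query(fields):
    """Encode (name, value) pairs the way a GET form submission would.

    A list of pairs (not a dict) lets a field such as "exclude" repeat.
    """
    return HTSEARCH_URL + "?" + urlencode(fields)

url = build_query([
    ("config", "htdig"),
    ("words", "elephant grey"),
    ("method", "and"),
    ("exclude", "/private/"),
    ("exclude", "/tmp/"),
])
print(url)
```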

The HTML form is expected to contain at least an input text field named words. This is where the user will enter the search words. Other values are also recognized but have appropriate defaults in case they are not used:

config
Specifies the name of the configuration file. The value is the file name without the path and without the trailing .conf. This file is assumed to be located in the CONFIG_DIR directory. Periods are not allowed in this field for security reasons (to prevent HTML authors from pointing htsearch at arbitrary files on your system).
The default is htdig.
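The mapping from the config field to a file can be sketched as follows. This is an illustration, not htsearch's own code: CONFIG_DIR is given a hypothetical value here (the real one is fixed when ht://Dig is built), and rejecting "/" in addition to periods is an extra precaution of this sketch rather than documented behavior.

```python
import os

# Hypothetical value: the real CONFIG_DIR is fixed when ht://Dig is built.
CONFIG_DIR = "/etc/htdig"

def config_file_for(name, config_dir=CONFIG_DIR, default="htdig"):
    """Map the form's config value to a configuration file path.

    An empty value falls back to the default. A value containing a period
    is rejected as the documentation describes; rejecting "/" as well is
    an extra precaution of this sketch, not documented htsearch behavior.
    """
    if not name:
        name = default
    if "." in name or "/" in name:
        raise ValueError("invalid config name: " + repr(name))
    return os.path.join(config_dir, name + ".conf")

print(config_file_for(""))          # falls back to htdig.conf
print(config_file_for("intranet"))  # intranet.conf in CONFIG_DIR
```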
exclude
This value is a pattern that specifies which URLs are to be excluded from the search results. If a URL matches one of these patterns, it is discarded. Multiple patterns can be given, separated by a bar ("|"), or multiple definitions of the exclude input parameter can be given. A pattern may include regular expressions when enclosed within [ and ] characters.
The default is specified by the exclude attribute in the configuration file.
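A sketch of how such a pattern list might be collected and tested against a URL. Two assumptions are made here and are not confirmed by this page: a plain pattern matches when it occurs anywhere in the URL, and the text inside [ and ] is treated as a Python regular expression. The simple split on "|" also does not handle a bar inside a bracketed expression.

```python
import re

def split_patterns(values):
    """Collect patterns from one or more definitions of a form field.

    Each value may hold several patterns separated by "|". This naive
    split would also break a "|" inside a [ ] regular expression.
    """
    patterns = []
    for value in values:
        patterns.extend(p for p in value.split("|") if p)
    return patterns

def url_matches(url, patterns):
    """Return True if the URL matches any pattern.

    Assumption: plain patterns are substring matches; a pattern wrapped
    in [ and ] is searched for as a regular expression.
    """
    for p in patterns:
        if len(p) >= 2 and p.startswith("[") and p.endswith("]"):
            if re.search(p[1:-1], url):
                return True
        elif p in url:
            return True
    return False

# Two definitions of exclude, the first holding two bar-separated patterns.
pats = split_patterns(["/private/|/tmp/", r"[\.cgi$]"])
print(url_matches("http://www.example.com/tmp/a.html", pats))
```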
format
This specifies the name of the template used to display the search results. There are two built-in templates, builtin-long and builtin-short, but any number of custom templates can also be defined. Find out more about the templates in the Output Templates section.
The format value can be specified either as a hidden input field or as a drop-down menu.
The default is specified by the template_name attribute in the configuration file, and the template variable is SELECTED_FORMAT.
keywords
Used to specify a list of required words that have to be in the documents. This list of words is added to the normal words value using logical "and"s, or logical "or"s if the any_keywords attribute is set to true in the configuration file.
An example use for this value is to make it a drop-down menu with a limited set of predetermined categories or keywords to restrict the search. This can be very useful for highly structured pages.
Note that the words may appear anywhere in the document. The scope of these required words is not limited to words in META tags with the "keywords" or "htdig-keywords" property, despite what the parameter name may suggest.
The default is specified by the keywords attribute in the configuration file.
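One way to picture the combination is as a query string: the keywords are joined to each other with "and" (or with "or" when any_keywords is true), and that group is then required alongside the user's own words. The function below is a hypothetical illustration of that reading; htsearch builds its query internally, not as text.

```python
def combine_words(words, keywords, any_keywords=False):
    """Illustrative merge of the keywords value into the user's query.

    Keywords are joined with "and" (or "or" if any_keywords is true),
    and the group is required in addition to the normal words.
    """
    kw = keywords.split()
    if not kw:
        return words
    joiner = " or " if any_keywords else " and "
    required = "(" + joiner.join(kw) + ")"
    if not words.strip():
        return required
    return "(" + words.strip() + ") and " + required

print(combine_words("elephant", "zoo africa"))
print(combine_words("elephant", "zoo africa", any_keywords=True))
```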
matchesperpage
Specifies how many matches will be displayed on each page of results.
The default is specified by the matches_per_page attribute in the configuration file, and the template variable is MATCHES_PER_PAGE. Since this value has to be a number, it should be set either with a hidden input field or with a drop-down menu.
method
This can be one of "and", "or", or "boolean". It determines what type of search will be performed.
The default is specified by the match_method attribute in the configuration file, and the template variable is SELECTED_METHOD. It is quite useful to make this item a drop-down menu so the user can select the type of search at search time.
page
This should not normally be set by hand; it is generated by the paged results display.
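The interaction between matchesperpage and page comes down to simple arithmetic: each page shows a contiguous slice of the ranked matches. The sketch below assumes pages are numbered from 1; clamping an out-of-range page number to the nearest valid page is this sketch's own choice, not documented htsearch behavior.

```python
import math

def page_bounds(total_matches, matches_per_page, page):
    """Return (first, last, page_count) for a page number counted from 1.

    first and last are 1-based match positions shown on that page; with
    no matches, last < first signals an empty page. Out-of-range page
    numbers are clamped (an assumption of this sketch).
    """
    if matches_per_page < 1:
        raise ValueError("matchesperpage must be a positive number")
    page_count = max(1, math.ceil(total_matches / matches_per_page))
    page = min(max(page, 1), page_count)
    first = (page - 1) * matches_per_page + 1
    last = min(page * matches_per_page, total_matches)
    return first, last, page_count

print(page_bounds(45, 10, 5))  # the last, partly filled page
```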
restrict
This value is a pattern that all URLs of the search results will have to match. This can be used to restrict the search to a particular subtree or subsection of a bigger database. Multiple patterns can be given, separated by a bar ("|"), or multiple definitions of the restrict input parameter can be given. Any URL in the search results will have to match at least one of these patterns. The pattern may include regular expressions when the expression is enclosed by [ and ] characters.
Note that the restrict list does not take precedence over the exclude list: if a URL matches patterns in both lists, it is still excluded from the search results.
The default is specified by the restrict attribute in the configuration file.
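The precedence rule can be expressed as a small filter. This is a sketch that assumes patterns are plain substrings (regular expressions in [ ] are left out for brevity) and that an empty restrict list allows every URL; the function name is hypothetical.

```python
def keep_url(url, restrict, exclude):
    """Decide whether a result URL survives restrict and exclude.

    Assumption: patterns are plain substrings. An empty restrict list
    allows every URL. The exclude check always runs, so exclude wins
    when a URL matches both lists.
    """
    if restrict and not any(p in url for p in restrict):
        return False
    if any(p in url for p in exclude):
        return False
    return True

# Matches restrict ("/docs/") but also exclude ("/old/"): dropped.
print(keep_url("http://x/docs/old/a.html", ["/docs/"], ["/old/"]))
```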
sort
This can be one of score, time, date, title, revscore, revtime, revdate, or revtitle. It determines what type of sort will be performed on the search results. The types time and date are synonymous, as are revtime and revdate, as all four sort on the time that the documents were last modified, if this information is given by the server. The sort methods that begin with rev simply reverse the order of the sort.
The default is specified by the sort attribute in the configuration file, and the template variable is SELECTED_SORT. It is quite useful to make this item a drop-down menu so the user can select the type of sort at search time.
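The sort options reduce to three keys plus an optional reversal. In the sketch below, results are hypothetical dicts with score, time and title fields. The base directions (best score first, newest first, titles A to Z) are assumptions, since this page does not state them; time and date share one key, and a "rev" prefix flips whichever order applies.

```python
# Base order for each sort type: (key function, descending?).
# Assumed directions: best score first, newest first, titles A to Z.
SORT_KEYS = {
    "score": (lambda r: r["score"], True),
    "time": (lambda r: r["time"], True),
    "date": (lambda r: r["time"], True),  # synonym of time
    "title": (lambda r: r["title"], False),
}

def sort_results(results, sort="score"):
    """Sort result dicts; a "rev" prefix reverses the base order."""
    reverse = sort.startswith("rev")
    base = sort[3:] if reverse else sort
    if base not in SORT_KEYS:
        raise ValueError("unknown sort type: " + sort)
    key, descending = SORT_KEYS[base]
    return sorted(results, key=key, reverse=descending != reverse)

results = [
    {"score": 1, "time": 200, "title": "b"},
    {"score": 3, "time": 100, "title": "a"},
]
print([r["title"] for r in sort_results(results, "revscore")])
```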
startyear, startmonth, startday, endyear, endmonth, endday
These values specify the range of document modification dates allowed in the search results. They can be used to restrict the search to particular "ages" of documents, new or old.
If the year is specified by two digits (e.g. 02), then it is assumed to be in the 1900s if it is in the range 70-99, and in the 2000s if it is in the range 00-69. If the year is not specified, the search does not treat the month and day as a range within every year, so documents outside that range are not excluded on that basis. Thus it is impossible, for example, to restrict a search to documents dated "December" of any year.
Incompletely specified end dates are interpreted as follows:
Date       Becomes
04-31      04-31-end of time
05-1999    05-31-1999
1999       12-31-1999

The default is the full range of documents in the database. These values can also be specified by configuration attributes of the same names in the configuration file. If a negative number is given for any of these, it is taken as relative to the current year, month or day.
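The table of incomplete end dates can be sketched as a function that fills in the missing parts: a missing year means "end of time" (represented here as None), a missing month becomes 12, and a missing day becomes the last day of the month. The helper name is hypothetical, and using a leap year (2000) when the year is unknown, so that February allows the 29th, is an assumption of this sketch.

```python
import calendar

def complete_end_date(year=None, month=None, day=None):
    """Fill in a partial end date; returns (year, month, day).

    year None stands for "end of time". A missing month becomes 12 and a
    missing day becomes the last day of the month.
    """
    if month is None:
        month = 12
    if day is None:
        # 2000 is a leap year, so an unknown year allows February 29.
        day = calendar.monthrange(year if year is not None else 2000, month)[1]
    return year, month, day

print(complete_end_date(year=1999, month=5))  # (1999, 5, 31)
```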
words
This is a space-separated list of words to search for. If the method is "and" or "or", then htsearch will search for documents which contain all of the words, or any of the words, respectively. As of version 3.2, strings of words enclosed in double quotes constitute a phrase, which must occur consecutively in the document.
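The "and" and "or" methods, including quoted phrases, can be illustrated on a single document's text. This sketch is a simplification: it splits documents on whitespace only (no punctuation handling, stemming, or fuzzy matching) and the function names are hypothetical.

```python
import re

# A double-quoted phrase, or a run of non-space characters.
TERM_RE = re.compile(r'"([^"]*)"|(\S+)')

def parse_words(words):
    """Split the words value into terms; each term is a list of words."""
    terms = []
    for phrase, word in TERM_RE.findall(words):
        tokens = (phrase or word).lower().split()
        if tokens:
            terms.append(tokens)
    return terms

def contains_phrase(doc_words, phrase):
    """True if the words of phrase occur consecutively in doc_words."""
    n = len(phrase)
    return any(doc_words[i:i + n] == phrase
               for i in range(len(doc_words) - n + 1))

def matches(doc_text, words, method="and"):
    doc_words = doc_text.lower().split()
    hits = [contains_phrase(doc_words, t) for t in parse_words(words)]
    if method == "and":
        return all(hits)
    if method == "or":
        return any(hits)
    raise ValueError("this sketch handles only the and/or methods")

print(matches("The grey whale met an elephant", 'elephant "grey whale"'))
```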
If the method is "boolean", then words specifies a logical expression consisting of words separated by "and", "or" or "not". (These separators can be changed by specifying boolean_keywords.) The separators "+" and "-" are synonyms for "and" and "not". The separator "not" means "but not", so the query elephant not grey will return documents containing the word "elephant" but not the word "grey". Sub-expressions may be enclosed in parentheses, as in elephant not (grey or white).
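A boolean query of this kind can be evaluated with a small recursive-descent parser. The sketch below checks whether one document's words satisfy the expression. Several details are assumptions rather than documented htsearch behavior: "and" and "not" bind more tightly than "or" and are applied left to right, "+" and "-" must stand alone as separate tokens, the separators are the default English keywords (boolean_keywords is ignored), and phrases are not handled.

```python
import re

TOKEN_RE = re.compile(r"[()]|[^\s()]+")
SYNONYMS = {"+": "and", "-": "not"}

def tokenize(query):
    return [SYNONYMS.get(t, t.lower()) for t in TOKEN_RE.findall(query)]

class Parser:
    def __init__(self, tokens, doc_words):
        self.tokens, self.pos, self.doc = tokens, 0, doc_words

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_or(self):
        # expr := term ("or" term)*
        value = self.parse_and()
        while self.peek() == "or":
            self.take()
            rhs = self.parse_and()
            value = value or rhs
        return value

    def parse_and(self):
        # term := factor (("and" | "not") factor)*, left to right
        value = self.parse_factor()
        while self.peek() in ("and", "not"):
            op = self.take()
            rhs = self.parse_factor()
            value = (value and rhs) if op == "and" else (value and not rhs)
        return value

    def parse_factor(self):
        # factor := "(" expr ")" | word
        tok = self.take()
        if tok == "(":
            value = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or tok in ("and", "or", "not", ")"):
            raise ValueError("expected a word, got %r" % tok)
        return tok in self.doc

def evaluate(query, doc_text):
    """True if the document's words satisfy the boolean query."""
    parser = Parser(tokenize(query), set(doc_text.lower().split()))
    result = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError("unexpected token %r" % parser.peek())
    return result

print(evaluate("elephant not (grey or white)", "a white elephant"))  # False
```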

Last modified: $Date: 2004/05/28 13:15:18 $