From: Gilles D. <gr...@sc...> - 2002-10-04 20:40:13
According to Lachlan Andrew:
> I've almost finished some patches trying to address the
> "htsearch input parameters issue" (below).
>
> I've updated defaults.cc to list all variables used by
> any of the programs (according to "grep config"), and
> described them as best I can. Where they are different,
> the input parameters to htsearch are also listed, and
> cross-referenced in both directions. Since no information
> is better than mis-information, there are some '??'s and
> 'TO BE COMPLETED's. Some entries which were out of
> alphabetical order have also been relocated. This patch is
> at
> http://www.ee.mu.oz.au/staff/lha/pub/patch.defaults

I agree with most of the changes in this patch. Good job! To answer
some of your questions, here are a few points to clarify things.

The distinction between "number" and "integer" attribute types is
supposed to be that an attribute labeled "number" can be floating point.
However, I think in practice a lot of these are actually supposed to be
integer-only. I think we'd need to check over how all attributes are
used and label them consistently.

The Block (Global, Server, URL) field indicates whether an attribute can
be set globally only, or if it can be overridden with a different value
in server blocks or URL blocks. See
http://www.htdig.org/dev/htdig-3.2/cf_blocks.html

The code support for author_factor, caps_factor, and url_text_factor is
not complete, so I assume this is why the attributes weren't in
defaults.cc. They're implemented in htsearch, but nothing in htdig tags
words with their corresponding flag values yet.

The remove_default_doc attribute should apply to https:// URLs as well
as http:// ones. If it doesn't right now, I'd consider that a bug.

The keywords and endday, startday et al. are config attributes that can
be overridden by CGI input parameters, so they're not really CGI input
only.
All except keywords are documented for 3.1.6, in
http://www.htdig.org/attrs.html, if you want more complete descriptions.
(Support for negative numbers hasn't been added to 3.2 yet, but it will
be before 3.2.0b4 goes out.)

The format, matchesperpage, method and page CGI input parameters have
been around from the beginning, I think, but they are CGI input only,
not config attributes.

The config CGI input parameter is most definitely CGI only. It wouldn't
make sense to specify the config file name in a config file, would it?

All this raises the question of whether we should be listing CGI input
parameters in attrs.html (which is generated from defaults.cc). To me,
that would tend to blur the distinction between the two. I know that in
many cases, a CGI input parameter and a config attribute of the same
name exist (and those config attributes should be documented), but I
think it would confuse the issue if we listed CGI-only parameter names
here. CGI input parameters are listed in
http://www.htdig.org/hts_form.html

What do other developers think about this? I'll hold off on committing
this patch until this question is resolved. (Otherwise, you just know
that some Linux distribution will snatch up that snapshot and we'd be
hounded for a year with questions about why such and such an attribute
doesn't work in the config file. :-P )

> A second patch at
> http://www.ee.mu.oz.au/staff/lha/pub/patch.inputs
> makes htsearch scan the existing config parameters, and
> overwrites them if they are given on the command line. It
> also has the #ifdef option of checking that there are no
> invalid (hence ignored) command line arguments. (This
> needs cgi.h to include "Dictionary.h", and I don't know
> how the make procedure handles dependencies, so it is
> disabled by default.)
>
> Before I start testing, could you please confirm that these
> are on the right track?

This second patch is a pretty dangerous one!
The whole reason for the allow_in_form attribute is to let you define,
in a controlled manner, which attributes can be overridden by CGI input
parameters (beyond those which htsearch already handles by default). If
I read your patch correctly, it will allow ANY config attribute to be
overridden by a CGI input parameter. E.g.:

http://my.victim.com/cgi-bin/htsearch?nothing_found_file=/etc/passwd

> Finally, the "current status" emails refer to problem
> numbers which don't match the SourceForge problem numbers.
> Where can I find the original numbering?

These PR# style bug numbers are from our old bug tracking database,
prior to our move to SourceForge, and I don't think that database is
accessible anywhere anymore. At the time of the move, I think Geoff
created new bug tracking entries for old bug reports that were still
open, so the STATUS file should be updated to reflect the new numbers.

> > * Not all htsearch input parameters are handled properly: PR#648.
> >   Use a consistent mapping of input -> config -> template for all
> >   inputs where it makes sense to do so (everything but "config" and
> >   "words"?).
> >
> > * Document all of htsearch's mappings of input parameters to config
> >   attributes to template variables. (Relates to PR#648.) Also make
> >   sure these config attributes are all documented in defaults.cc,
> >   even if they're only set by input parameters and never in the
> >   config file.

The original PR#648 referred to the keywords input parameter, which
couldn't be set to a default value by a config attribute prior to 3.1.4.
So, the original bug has been fixed (and likely closed), but in the bug
database comments I had suggested systematically going through all CGI
input parameters and making sure htsearch handles them all consistently
wherever appropriate. I.e.
unless there's a reason not to, a pre-defined CGI input parameter should
have a corresponding attribute that it overrides, and this attribute
value should make its way into a template variable. Also, any
pre-defined CGI input parameter should be processed by
Display::createURL(). Likely the only ones that shouldn't be done this
way are page and config. I think we're mostly there now, but there may
be a few stragglers left, both in the code and the documentation
(definitely in the latter).

Note that I stress "pre-defined" CGI input parameters. You can't allow
a user to use any old attribute name as an input parameter, and have
that take precedence! Even allow_in_form must be used very carefully to
avoid opening up big security holes (see the my.victim.com URL above).
It shouldn't be used for any attribute that defines part or all of a
file name. The config input parameter is checked for pathname
components, but none of the other input parameters are.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
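
As an illustration of the allow_in_form point made in the message, here
is a minimal sketch of allowlist-based overriding. It is not ht://Dig
code: the function apply_form_overrides, the PREDEFINED set, and the
placeholder path are all hypothetical, and the real htsearch handles a
larger set of inputs. The idea is only that a form input replaces a
config value when its name is pre-defined or explicitly allowed, and is
ignored otherwise.

```python
# Hypothetical sketch, not ht://Dig code. All names are illustrative.

# Illustrative subset of inputs that are always accepted from the form.
PREDEFINED = {"keywords", "startday", "endday"}


def apply_form_overrides(config, form, allow_in_form=()):
    """Return (merged_config, ignored_names).

    A form input overrides a config attribute only if its name is
    pre-defined or listed in allow_in_form; anything else is ignored.
    """
    allowed = PREDEFINED | set(allow_in_form)
    merged = dict(config)  # copy, so the caller's config is untouched
    ignored = []
    for name, value in form.items():
        if name in allowed:
            merged[name] = value
        else:
            ignored.append(name)
    return merged, sorted(ignored)


# Placeholder config values, assumed for the example.
config = {"nothing_found_file": "/path/to/nomatch.html", "keywords": ""}

# Inputs as they might arrive from the query string in the message.
form = {"keywords": "spinal cord", "nothing_found_file": "/etc/passwd"}

merged, ignored = apply_form_overrides(config, form)
print(merged["nothing_found_file"])  # still the configured file
print(ignored)                       # the rejected input names
```

Passing allow_in_form=["nothing_found_file"] to the same function would
let the /etc/passwd value through, which is the reason the message
advises against allowing any attribute that names a file.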