From: Gilles D. <gr...@sc...> - 2001-10-25 19:02:27
|
According to Florian Hars: > On Wed, Oct 24, 2001 at 05:25:23PM -0500, Gilles Detillieux wrote: > > According to Florian Hars: > > > Things like STARSLEFT are totally different, they do not use client > > > supplied information and so are not vulnerable to cross site scripting > > > attacs. WORDS is. > > > > This is the main point I was trying to get across. > > Well, actually, no. Otherwise you wouldn't suggest to treat client-supplied > and server-supplied information in the same way: Well, I certainly don't recall ever suggesting such a thing. The fact that htsearch 3.1.4 and older did treat them the same way was indeed a problem. I was among those who first recognised it as a problem and I am the one extended htsearch's template handling so that you could treat them in different ways. I also fixed all the default templates to use the new syntax where appropriate. I would add further, though, that while client-supplied information may be tainted, it's also possible for server-supplied information to contain characters that need to be SGML-encoded. Examples of this are the URL and TITLE variables. Some server-supplied information might even be deliberately tainted, like the new METADESCRIPTION variable. (This is only an issue if you index untrusted sites, but some do.) However, the fact still remains that some internally-generated template variables cannot be SGML-encoded. Different template variables should be handled differently. That's been my point throughout this thread, whether I expressed it clearly enough before or not. We're not in disagreement on this point. The point on which we differ is how we should respond to the problem of users who still use old templates that don't handle variables differently or appropriately, or perhaps worse yet, base new templates on old, insecure templates without bothering to inform themselves about the risks. The two approaches suggested so far are: 1) (yours) Force the issue by making subsequent releases of htsearch always SGML-encode any template variable when the $(var) syntax is used. Presumably this would also involve the addition of a new syntax element for getting an unencoded variable out. The problem with this approach, as I pointed out, is... > > If we changed the behaviour of $(var) to SGML encode everything, > > it MIGHT make every exisiting template out there more secure, but it > > would almost CERTAINLY make them all unusable. I have a pretty good feel from experience about the volume of mail on the list this would generate, to say nothing of all the mail/pleas for help/flames Geoff and I would receive privately, if we adopted this approach. 2) (mine) Maintain the status quo. htsearch is now secure as distributed, so the problem only affects users who don't update their template files when updating htsearch. This would leave a lot of insecure existing htsearch implementations out there, but then there are still a lot of pre-3.1.5 htsearch implementations out there, after over a year and a half, which are far, far more insecure. In order to encourage users to update their templates, we can follow your very good suggestion... > The easiest fix would probably to document the current behaviour > appropiately, i.e. put a warning into the description of every template > variable that might contain tainted client-supplied information and should > never be used unencoded. This would certainly help the minority who actually reads the documentation (please pardon my cynicism) by stating the risks more clearly than they are now. > This will mostly be WORDS (LOGICAL_WORDS and KEYWORDS might already be > sanitized, I haven't looked at the source to verify this), and depending > on whether you can trust the sites you are indexing the variables that > display part of the indexed pages (but these look like they are already > transformed to pure text, and so not vulnerable). I think LOGICAL_WORDS is somewhat sanitized, but still it's not hurt by SGML-encoding. Some data from indexed pages should also be encoded. My "hit list" of template variables which should be SGML-encoded is: CONFIG, EXCLUDE, RESTRICT, WORDS, LOGICAL_WORDS, KEYWORDS, TITLE, URL, ANCHOR, METADESCRIPTION, DESCRIPTIONS, DESCRIPTION, SELECTED_FORMAT, SELECTED_METHOD, SELECTED_SORT. After giving it more thought, it occurred to me that as long as we're putting together a hit list like this for the documentation, we could do one better and put the hit list right in htsearch. This would lead to a third approach... 3) (the compromise kludge) Make htsearch check template variable names against the hit list above (using StringMatch) to force it to SGML-encode these variables even if the $(var) syntax is used. We'd probably still want to introduce a new syntax element to force unencoded output. (Suggestions?) I know this hit list approach is kludgy, but it does fairly neatly solve the security problems in old templates without breaking them altogether. I realize that any template variable generated by allow_in_form would potentially also be tainted, so it would also have to go on the hit list. The alternative is to make a "safe list" of variables that can't be SGML-encoded by default, but then we'd have to add any template variable generated by build_select_lists to that list. So, I put this before anyone on the developer's list who's still paying attention. What do you think? Should we take this third approach? Can you think of anything it might break? Is the hit list complete enough or are there some I missed? What should the syntax be for forcing an unencoded variable output? -- Gilles R. Detillieux E-mail: <gr...@sc...> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 |