From: Florian H. <fl...@ha...> - 2001-10-15 16:46:04
I just sent this to bugtraq:

On Fri, Oct 12, 2001 at 12:59:13PM -0600, Dave Ahmad wrote:
> On Thu, 11 Oct 2001, bugtraq wrote:
> > http://www.perl.com/search/index.ncsp?sp-q=%3C%69%6D%67%20%73%72%63%3D%68%74%74%70%3A%2F%2F%31%39%39%2E%31%32%35%2E%38%35%2E%34%36%2F%74%69%6D%65%2E%6A%70%67%3E
>
> Does anyone know which search engine software this is?

I don't know which engine perl.com uses, but if you have the template
parameter WORDS in your templates, htdig 3.1.5 puts the unquoted img-tag
into the result page.

Funnily enough, the htdig 3.1.5 on htdig.org encodes the offending string
in
<input type="text" size="30" name="words" value="&lt;img src=http://199.125.85.46/time.jpg&gt;">

while the distributed htdig 3.1.5 (here the Debian version 3.1.5-2) doesn't:

<input type="text" size="30" name="words" value="<img src=http://199.125.85.46/time.jpg>">

(And there is neither a security section on htdig.org nor an email address
for bug reports... so I am crossposting this to htdig-general.)

Yours, Florian Hars.
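The sp-q parameter in the bugtraq URL is nothing more than percent-encoded HTML. The short Python sketch below (Python is used here purely for illustration) decodes it and shows what SGML/HTML entity encoding does to the decoded string:

```python
import html
from urllib.parse import unquote

# The sp-q query parameter from the bugtraq URL, exactly as posted.
payload = ("%3C%69%6D%67%20%73%72%63%3D%68%74%74%70%3A%2F%2F%31%39%39%2E"
           "%31%32%35%2E%38%35%2E%34%36%2F%74%69%6D%65%2E%6A%70%67%3E")

# Percent-decoding reveals the injected markup.
decoded = unquote(payload)
print(decoded)   # <img src=http://199.125.85.46/time.jpg>

# Entity encoding turns the angle brackets into harmless text.
encoded = html.escape(decoded, quote=True)
print(encoded)   # &lt;img src=http://199.125.85.46/time.jpg&gt;
```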
From: Gilles D. <gr...@sc...> - 2001-10-18 18:13:46
According to Florian Hars:
> I just sent this to bugtraq:
>
> On Fri, Oct 12, 2001 at 12:59:13PM -0600, Dave Ahmad wrote:
> > On Thu, 11 Oct 2001, bugtraq wrote:
> > > http://www.perl.com/search/index.ncsp?sp-q=%3C%69%6D%67%20%73%72%63%3D%68%74%74%70%3A%2F%2F%31%39%39%2E%31%32%35%2E%38%35%2E%34%36%2F%74%69%6D%65%2E%6A%70%67%3E
> >
> > Does anyone know which search engine software this is?

Doesn't LOOK like ht://Dig, but it can be hard to tell with the wrappers some people use. In any case, it would seem they resolved the problem on their site.

> I don't know which engine perl.com uses, but if you have the template
> parameter WORDS in your templates, htdig 3.1.5 puts the unquoted img-tag
> into the result page.
>
> Funnily enough, the htdig 3.1.5 on htdig.org encodes the offending string
> in
> <input type="text" size="30" name="words" value="&lt;img src=http://199.125.85.46/time.jpg&gt;">
>
> while the distributed htdig 3.1.5 (here the Debian version 3.1.5-2) doesn't:
>
> <input type="text" size="30" name="words" value="<img src=http://199.125.85.46/time.jpg>">

It all depends on whether the "words" input field in your follow-up search forms (template files header.html, nomatch.html, ...) uses:

<input type="text" size="30" name="words" value="$&(WORDS)">

or the older (pre-3.1.5) syntax:

<input type="text" size="30" name="words" value="$(WORDS)">

The added "&" after the "$" in 3.1.5 template files causes the template variable to be SGML-encoded. I suspect that the Debian release of htdig didn't bother updating the template files it installs, but instead installs something they customized from an earlier version of htdig. That's out of our hands, so you should report this to the Debian folks.

> (And there is neither a security section on htdig.org nor an email address
> for bug reports... so I am crossposting this to htdig-general.)

Yes, we had talked about adding a security section, but no one stepped forward to help write it.
E-mailing bug reports to htdig-general is just fine by me, because most of the "bugs" reported on ht://Dig's SourceForge bug tracking system end up being configuration problems or things that were fixed long ago. Both of these are easier to discuss on the mailing list.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
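A minimal sketch of the two substitution forms described in the message above, assuming a simple regex-based expander. This is an illustrative re-implementation in Python, not htsearch's own C++ code, and it assumes an undefined variable expands to an empty string:

```python
import html
import re

# Matches "$(NAME)" and "$&(NAME)"; group 1 is "&" or "".
# Illustrative sketch only, not htsearch's actual parser.
VAR_PATTERN = re.compile(r"\$(&?)\((\w+)\)")

def expand(template, variables):
    """Substitute template variables, SGML-encoding those written as $&(NAME)."""
    def substitute(match):
        encode, name = match.group(1), match.group(2)
        # Assumption: an undefined variable expands to an empty string.
        value = variables.get(name, "")
        # html.escape turns &, <, > and quotes into entities.
        return html.escape(value, quote=True) if encode else value
    return VAR_PATTERN.sub(substitute, template)

variables = {"WORDS": "<img src=http://199.125.85.46/time.jpg>"}
old_form = '<input type="text" size="30" name="words" value="$(WORDS)">'
new_form = '<input type="text" size="30" name="words" value="$&(WORDS)">'

print(expand(old_form, variables))  # raw markup lands in the page
print(expand(new_form, variables))  # markup arrives as &lt;img ...&gt;
```

The only difference between the two template lines is the "&", yet it decides whether client input is emitted as markup or as text.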
From: Florian H. <fl...@ha...> - 2001-10-19 07:16:12
On Thu, Oct 18, 2001 at 01:13:38PM -0500, Gilles Detillieux wrote:
> The added "&" after the "$" in 3.1.5 template files causes the template
> variable to be SGML-encoded. I suspect that the Debian release of
> htdig didn't bother updating the template files it installs, but instead
> installs something they customized from an earlier version of htdig.

No, the files in Debian contain properly escaped substitutions; it was just my templates that had the problem. Still, I think it is a design error to make the default syntax for variable substitution (which is the same one every other program uses) insecure. You should have to take additional steps if you want insecure behaviour, not if you want secure behaviour.

Yours, Florian Hars.
From: Gilles D. <gr...@sc...> - 2001-10-19 14:22:08
According to Florian Hars:
> On Thu, Oct 18, 2001 at 01:13:38PM -0500, Gilles Detillieux wrote:
> > The added "&" after the "$" in 3.1.5 template files causes the template
> > variable to be SGML-encoded. I suspect that the Debian release of
> > htdig didn't bother updating the template files it installs, but instead
> > installs something they customized from an earlier version of htdig.
>
> No, the files in Debian contain properly escaped substitutions; it was just
> my templates that had the problem. Still, I think it is a design error to
> make the default syntax for variable substitution (which is the same one every
> other program uses) insecure.
> You should have to take additional steps if you want insecure behaviour,
> not if you want secure behaviour.

I have to disagree with you on this point. Whether the default syntax is insecure or not depends totally on the context in which the template variable is used, and how that template variable is generated. Changing the default syntax so it SGML-encodes the variable would seriously break just about every existing template file, because a great number of template variables can't be encoded this way: they're supposed to contain HTML tags that go straight through to the results page. Just a few of these variables are METHOD, FORMAT, SORT, PAGEHEADER, PREVPAGE, PAGELIST, NEXTPAGE, EXCERPT, STARSLEFT and STARSRIGHT.
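To see why encoding every variable by default would break templates, consider a PAGELIST-style value. The link markup below is a made-up example of the kind of HTML such a variable carries, not htsearch's exact output:

```python
import html

# Hypothetical PAGELIST value: page links are generated as real HTML,
# so this markup has to reach the browser untouched.
pagelist = '<a href="search?page=2">2</a>'

emitted_raw = pagelist                      # what "$(PAGELIST)" produces today
emitted_encoded = html.escape(pagelist)     # what a blanket-encoding "$(var)" would produce

print(emitted_raw)
print(emitted_encoded)
```

With blanket encoding, the browser would display the link markup as literal text instead of rendering a clickable page link.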
From: Florian H. <ha...@bi...> - 2001-10-24 12:52:18
On Fri, Oct 19, 2001 at 09:22:00AM -0500, Gilles Detillieux wrote:
> I have to disagree with you on this point. Whether the default syntax
> is insecure or not depends totally on the context in which the template
> variable is used, and how that template variable is generated.

No, it doesn't depend on the context. The default syntax passes client-supplied data unchanged and untested to the result page. This is something that should never happen, under any circumstances. Things like STARSLEFT are totally different: they do not use client-supplied information and so are not vulnerable to cross-site scripting attacks. WORDS is.

Yours, Florian.
From: Gilles D. <gr...@sc...> - 2001-10-24 22:25:35
According to Florian Hars:
> On Fri, Oct 19, 2001 at 09:22:00AM -0500, Gilles Detillieux wrote:
> > I have to disagree with you on this point. Whether the default syntax
> > is insecure or not depends totally on the context in which the template
> > variable is used, and how that template variable is generated.
>
> No, it doesn't depend on the context. The default syntax passes client-supplied
> data unchanged and untested to the result page. This is something
> that should never happen, under any circumstances.

OK, so assuming you use what you call the default syntax (i.e. $(var)) on a template variable that contains untested client data, does that mean the client data could compromise security in any context? Perhaps, although the exploit might have to be adapted to the particular context. So maybe context is irrelevant as far as security is concerned. However, it's not irrelevant as far as the choice of encoding is concerned. E.g. you don't use hex encoding in the middle of HTML text, but you can use it in a URL inside an <a ...> tag.

> Things like STARSLEFT are totally different: they do not use client-supplied
> information and so are not vulnerable to cross-site scripting
> attacks. WORDS is.

This is the main point I was trying to get across. Different variables have to be used in different ways! If we changed the behaviour of $(var) to SGML-encode everything, it MIGHT make every existing template out there more secure, but it would almost CERTAINLY make them all unusable. I don't see this as the lesser of two evils. If we were to make a design decision to deliberately break any old template files out there to force users to adopt a more secure configuration, why not be honest about it and adopt an entirely new syntax for all template variable substitutions? As someone who ends up answering over 50% of the support requests on this list (many from people who never so much as glance at the FAQ), I'm not about to add to my workload by taking such a radical step.
The default template files all use the correct syntax for the variables they use and the context in which they're used. So, for a fresh 3.1.5 or later installation, without old template files, you should be safe from cross-site scripting attacks. As distributed, htsearch is secure by default, regardless of what we choose to call the "default syntax". Sites that don't want to update their templates are making their own choice. Given the number of users I've seen on this list who are still using pre-3.1.5 versions, which are far more blatantly insecure than 3.1.5 with old templates, it's pretty clear that a lot of users don't see security issues as a big problem for their sites. I'm not about to make that my problem.

If you feel so strongly about this issue that you think we should under no circumstances release htsearch as it is, I suggest you volunteer your time as a developer, put the issue to a vote among the developers, implement it as you see fit if the vote passes, and stick around to deal with the fallout. As I often say, this isn't a one-man show.
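The point about context choosing the encoding can be illustrated with Python's standard library: the same client-supplied string needs entity encoding in HTML text but URL ("hex") encoding inside a link's query string. The /cgi-bin/htsearch path and the single words parameter are assumptions made for this example:

```python
import html
from urllib.parse import quote_plus

words = 'perl & "cgi" <script>'

# Context 1: HTML text or a quoted attribute value -> SGML/HTML entities.
as_text = html.escape(words, quote=True)

# Context 2: a query-string parameter inside <a href="..."> -> URL encoding.
# The search path below is hypothetical.
url = "/cgi-bin/htsearch?words=" + quote_plus(words)

# The finished URL still sits inside an HTML attribute, so entity-encode it too.
link = f'<a href="{html.escape(url, quote=True)}">search again</a>'

print(as_text)
print(link)
```

The outer html.escape on the URL is a no-op here, but it becomes necessary once a second parameter is joined with "&", which should appear as "&amp;" inside an attribute.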
From: Florian H. <fl...@ha...> - 2001-10-25 07:21:19
On Wed, Oct 24, 2001 at 05:25:23PM -0500, Gilles Detillieux wrote:
> According to Florian Hars:
> > Things like STARSLEFT are totally different: they do not use client-supplied
> > information and so are not vulnerable to cross-site scripting
> > attacks. WORDS is.
>
> This is the main point I was trying to get across.

Well, actually, no. Otherwise you wouldn't suggest treating client-supplied and server-supplied information in the same way:

> If we changed the behaviour of $(var) to SGML-encode everything,
> it MIGHT make every existing template out there more secure, but it
> would almost CERTAINLY make them all unusable.

The easiest fix would probably be to document the current behaviour appropriately, i.e. put a warning into the description of every template variable that might contain tainted client-supplied information and should never be used unencoded. This will mostly be WORDS (LOGICAL_WORDS and KEYWORDS might already be sanitized; I haven't looked at the source to verify this) and, depending on whether you can trust the sites you are indexing, the variables that display part of the indexed pages (but these look like they are already transformed to pure text, and so are not vulnerable).

Yours, Florian.
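The documentation warning suggested above could be complemented by a simple check that site maintainers run over their own template files. The sketch below is hypothetical tooling, not part of ht://Dig: it flags tainted variables written in the unencoded $(NAME) form, and the TAINTED set is an assumption based on the variables named in the message:

```python
import re

# Variables that may carry client-supplied text (assumption: WORDS, plus
# LOGICAL_WORDS and KEYWORDS to be safe). Not an official ht://Dig list.
TAINTED = {"WORDS", "LOGICAL_WORDS", "KEYWORDS"}

# Matches only the unencoded "$(NAME)" form; "$&(NAME)" is not matched.
RAW_VAR = re.compile(r"\$\((\w+)\)")

def find_raw_tainted(template, tainted=TAINTED):
    """Return (line number, variable) pairs for unencoded tainted variables."""
    hits = []
    for lineno, line in enumerate(template.splitlines(), start=1):
        for match in RAW_VAR.finditer(line):
            if match.group(1) in tainted:
                hits.append((lineno, match.group(1)))
    return hits

sample = '''<h2>Search results for $&(LOGICAL_WORDS)</h2>
<input type="text" name="words" value="$(WORDS)">
$(PAGELIST)'''

print(find_raw_tainted(sample))  # [(2, 'WORDS')]
```

PAGELIST is matched by the pattern but ignored, since it is not in the tainted set and is expected to carry HTML.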
From: Gilles D. <gr...@sc...> - 2001-10-25 19:02:27
According to Florian Hars:
> On Wed, Oct 24, 2001 at 05:25:23PM -0500, Gilles Detillieux wrote:
> > According to Florian Hars:
> > > Things like STARSLEFT are totally different: they do not use client-supplied
> > > information and so are not vulnerable to cross-site scripting
> > > attacks. WORDS is.
> >
> > This is the main point I was trying to get across.
>
> Well, actually, no. Otherwise you wouldn't suggest treating client-supplied
> and server-supplied information in the same way:

Well, I certainly don't recall ever suggesting such a thing. The fact that htsearch 3.1.4 and older did treat them the same way was indeed a problem. I was among those who first recognised it as a problem, and I am the one who extended htsearch's template handling so that you could treat them in different ways. I also fixed all the default templates to use the new syntax where appropriate.

I would add, though, that while client-supplied information may be tainted, it's also possible for server-supplied information to contain characters that need to be SGML-encoded. Examples of this are the URL and TITLE variables. Some server-supplied information might even be deliberately tainted, like the new METADESCRIPTION variable. (This is only an issue if you index untrusted sites, but some do.) However, the fact still remains that some internally generated template variables cannot be SGML-encoded. Different template variables should be handled differently. That's been my point throughout this thread, whether I expressed it clearly enough before or not.

We're not in disagreement on this point. The point on which we differ is how we should respond to the problem of users who still use old templates that don't handle variables differently or appropriately, or, perhaps worse, base new templates on old, insecure templates without bothering to inform themselves about the risks.
The two approaches suggested so far are:

1) (yours) Force the issue by making subsequent releases of htsearch always SGML-encode any template variable when the $(var) syntax is used. Presumably this would also involve adding a new syntax element for getting an unencoded variable out. The problem with this approach, as I pointed out, is...

> > If we changed the behaviour of $(var) to SGML-encode everything,
> > it MIGHT make every existing template out there more secure, but it
> > would almost CERTAINLY make them all unusable.

From experience, I have a pretty good feel for the volume of mail on the list this would generate, to say nothing of all the mail, pleas for help and flames Geoff and I would receive privately, if we adopted this approach.

2) (mine) Maintain the status quo. htsearch is now secure as distributed, so the problem only affects users who don't update their template files when updating htsearch. This would leave a lot of insecure existing htsearch implementations out there, but then there are still a lot of pre-3.1.5 htsearch implementations out there, after over a year and a half, which are far, far more insecure. To encourage users to update their templates, we can follow your very good suggestion...

> The easiest fix would probably be to document the current behaviour
> appropriately, i.e. put a warning into the description of every template
> variable that might contain tainted client-supplied information and should
> never be used unencoded.

This would certainly help the minority who actually read the documentation (please pardon my cynicism) by stating the risks more clearly than they are now.
> This will mostly be WORDS (LOGICAL_WORDS and KEYWORDS might already be
> sanitized; I haven't looked at the source to verify this) and, depending
> on whether you can trust the sites you are indexing, the variables that
> display part of the indexed pages (but these look like they are already
> transformed to pure text, and so are not vulnerable).

I think LOGICAL_WORDS is somewhat sanitized, but it still isn't hurt by SGML-encoding. Some data from indexed pages should also be encoded. My "hit list" of template variables which should be SGML-encoded is: CONFIG, EXCLUDE, RESTRICT, WORDS, LOGICAL_WORDS, KEYWORDS, TITLE, URL, ANCHOR, METADESCRIPTION, DESCRIPTIONS, DESCRIPTION, SELECTED_FORMAT, SELECTED_METHOD, SELECTED_SORT.

After giving it more thought, it occurred to me that as long as we're putting together a hit list like this for the documentation, we could do one better and put the hit list right in htsearch. This would lead to a third approach...

3) (the compromise kludge) Make htsearch check template variable names against the hit list above (using StringMatch) to force it to SGML-encode these variables even if the $(var) syntax is used. We'd probably still want to introduce a new syntax element to force unencoded output. (Suggestions?)

I know this hit list approach is kludgy, but it does fairly neatly solve the security problems in old templates without breaking them altogether. I realize that any template variable generated by allow_in_form would potentially also be tainted, so it would also have to go on the hit list. The alternative is to make a "safe list" of variables that can't be SGML-encoded by default, but then we'd have to add any template variable generated by build_select_lists to that list.

So, I put this before anyone on the developers' list who's still paying attention. What do you think? Should we take this third approach? Can you think of anything it might break? Is the hit list complete enough, or are there some I missed?
What should the syntax be for forcing an unencoded variable output?
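A rough sketch of the third (hit list) approach, under stated assumptions: a Python set stands in for htsearch's StringMatch, the hit list is copied from the message above, and $!(NAME) is a placeholder of our own for the "force unencoded" syntax that the thread leaves open. The extra_tainted parameter is likewise hypothetical, standing in for variables created through allow_in_form:

```python
import html
import re

# Hit list from the message; a set stands in for htsearch's StringMatch.
HIT_LIST = {
    "CONFIG", "EXCLUDE", "RESTRICT", "WORDS", "LOGICAL_WORDS", "KEYWORDS",
    "TITLE", "URL", "ANCHOR", "METADESCRIPTION", "DESCRIPTIONS",
    "DESCRIPTION", "SELECTED_FORMAT", "SELECTED_METHOD", "SELECTED_SORT",
}

# $(V)  -> encoded if V is on the hit list, raw otherwise
# $&(V) -> always SGML-encoded
# $!(V) -> hypothetical "force unencoded" form (syntax still undecided)
VAR_PATTERN = re.compile(r"\$([&!]?)\((\w+)\)")

def expand_with_hit_list(template, variables, extra_tainted=()):
    # extra_tainted: hypothetical hook for allow_in_form-derived variables.
    tainted = HIT_LIST | set(extra_tainted)

    def substitute(match):
        flag, name = match.group(1), match.group(2)
        value = variables.get(name, "")  # assumption: undefined -> ""
        if flag == "!":
            return value
        if flag == "&" or name in tainted:
            return html.escape(value, quote=True)
        return value

    return VAR_PATTERN.sub(substitute, template)

variables = {"WORDS": "<img src=x>", "PAGELIST": '<a href="?page=2">2</a>'}
print(expand_with_hit_list("$(WORDS) $(PAGELIST)", variables))
# &lt;img src=x&gt; <a href="?page=2">2</a>
```

Because encoding happens at most once per substitution, an old template that already writes $&(WORDS) is not double-encoded, while variables off the list, such as PAGELIST, still pass through unchanged.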