Re: [Myghty-users] "safe" method contents

On 2/5/06, David Geller <dg...@sp...> wrote:
> What is necessary to prevent users from inserting potential malware in
> these text blobs?
>
> So far I have:
>
> 1. Filter/escape all lines starting with a %
> 2. Filter/escape all <% %> constructs
> 3. Filter/escape all <& &> constructs
>
> Am I missing something, or will this render the text "safe"?

First, making text safe for direct inclusion into a myghty "source"
file is going to be quite challenging to get right.  You may actually
be better off "compiling" those text blobs into module components
(python code using m.write()) so that Myghty doesn't even try to parse
it.  (Then you only need to check for, say, """).
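
A minimal sketch of that "compile it" idea, under some assumptions: it
generates Python source for a function that takes the request object m
and calls m.write() on the blob.  The names blob_to_source and
render_blob are hypothetical, and how the generated function gets
registered as a Myghty module component is not shown here.  It also
uses repr() to build the string literal rather than a triple-quoted
string, so there is no need to scan for """ at all.

```python
def blob_to_source(text, func_name="render_blob"):
    """Return Python source for a function that writes `text` verbatim.

    `func_name` is a hypothetical name; `m` is assumed to be an object
    with a write() method, as the Myghty request object has.
    """
    # repr() yields a valid, fully escaped string literal, so quotes,
    # backslashes and newlines in the blob cannot break out of it.
    return "def %s(m):\n    m.write(%s)\n" % (func_name, repr(text))
```

Because the blob ends up as data inside a Python string literal, the
Myghty lexer never sees it, and nothing in it is parsed as template
markup.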

Anyway, if you're curious about the exact regular expressions used,
look at the source file lib/myghty/lexer.py

The general recognized Myghty syntax falls into the following patterns
(the regexes are a little more restrictive than this actually):

  # .... \n      -- must start at beginning of line
  % ... \n     -- must start at beginning of line
  <%...%>
  <%...>
  </%...>
  <&...&>
  </&>

Also a "\" at the end of a line (or file) is recognized as special
markup too, but it is probably fairly safe to let that one through.
If you prevent/filter those patterns then Myghty itself should not
interpret anything directly (it will treat it all as output data).
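
As a rough illustration, a check for those patterns might look like
the sketch below.  The regex is only an approximation of the list
above, not a copy of what lib/myghty/lexer.py uses, and the name
contains_myghty_markup is made up for this example.  It is
deliberately conservative: it flags any opening delimiter, even if
the construct is never closed.

```python
import re

# Approximation of the patterns listed above (NOT the real lexer regexes):
#   ^[#%]     '#' or '%' at the very start of a line
#   </?[%&]   the opening of <%...%>, <%...>, </%...>, <&...&> and </&>
MYGHTY_MARKUP = re.compile(r"^[#%]|</?[%&]", re.MULTILINE)

def contains_myghty_markup(text):
    """Return True if `text` contains anything resembling Myghty markup."""
    return MYGHTY_MARKUP.search(text) is not None
```

Rejecting a blob outright when this returns True sidesteps the
problems that come with trying to remove the markup.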

You DO need to be cautious about how you filter or remove anything
though.  For example consider the following user-created text blob:

  <<% removeme %>% open('/etc/passwd','r').read() %<% removeme %>>

If you do naive filtering, you may in fact change that into

  <% open('/etc/passwd','r').read() %>

which certainly is not safe for inclusion into Myghty templates.
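
Here is a small sketch of that pitfall.  The TAG regex covers only
the <% ... %> form and is an assumption for illustration, as are the
function names.  A single removal pass reassembles a live tag from
the leftovers; repeating the pass until nothing changes avoids that
for this one pattern.

```python
import re

# Illustrative pattern for <% ... %> only; not the real Myghty lexer regex.
TAG = re.compile(r"<%.*?%>", re.DOTALL)

def strip_once(text):
    """Naive filter: remove each <% ... %> construct in a single pass."""
    return TAG.sub("", text)

def strip_until_stable(text):
    """Repeat the removal until another pass changes nothing."""
    while True:
        cleaned = TAG.sub("", text)
        if cleaned == text:
            return cleaned
        text = cleaned

blob = "<<% removeme %>% open('/etc/passwd','r').read() %<% removeme %>>"
print(strip_once(blob))          # <% open('/etc/passwd','r').read() %>
print(repr(strip_until_stable(blob)))  # ''
```

Even the repeated version only handles the one pattern it knows
about, so rejecting or escaping suspicious input is usually easier to
get right than deleting pieces of it.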

But even with perfect filtering, that still doesn't necessarily mean
it is safe.  It really depends on the total environment and what you mean
by safe.  For example, the "text" could contain any arbitrary HTML
markup (assuming you're serving HTML).  The markup could also include
"code" which is executable by the browser (Javascript, etc).  Or even
dangerous URLs (like the data: scheme).
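
If the blobs are meant to be shown as plain text inside an HTML page,
one common mitigation is to escape them on output.  The sketch below
uses html.escape from the modern Python 3 standard library, which is
an assumption about your environment; escape_blob is a hypothetical
helper name.  Note that escaping does nothing about dangerous URLs if
you later turn the text into links yourself.

```python
import html

def escape_blob(text):
    """Escape &, <, > and quotes so a browser shows the text literally."""
    return html.escape(text, quote=True)

print(escape_blob("<script>alert(1)</script>"))
# &lt;script&gt;alert(1)&lt;/script&gt;
```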

Surprisingly, even serving up text/plain documents may not always be
safe, thanks to Microsoft's bug/feature of second-guessing the
server's declared MIME type.  It is relatively easy to get IE to
interpret plain text as HTML.
--
Deron Meranda