Ok, so I had a thought this evening. Perhaps we're looking at things in a
way that's not helping us figure out what works best from an applied
standpoint.
My thought is that perhaps we should think a bit more about how we might
filter at _every_ input/output in a system. Seems to me like I have been
trying to solve the problem of accounting (or filtering) for every possible
abuse scenario as soon as I get the data, and the more I think about it the
more I realize that this eventually works against us. It forces me (and end
developers) to think about every use of their data further down the road
than is necessary, breaking the "don't make me think" principle.
So perhaps we can do better by only addressing the immediate threats. Let
me expand on that before you flame me into oblivion. Consider the following
scenarios:
1.) taking form input
2.) inserting a string into my database
3.) pulling a piece of data out of a database
4.) displaying data to a client
Were I to consider these data flows through the system as a whole (as they
could easily be the same piece of data), then I've got some thorny
questions, and it's my instinct to handle them all up front (which is what
I've been tempted to design for thus far). But what happens when I get data
from another input (say a file) and I don't KNOW where it's going to end up
(as I often may not at various stages in a project)? Well, I have to filter
for everything and hope I don't lose what I need. Strongly sub-optimal. I
(at least) have been considering how I'm going to protect the entire system
from a single point between items 1 and 2, kinda like:
(1) -> filter(for everything) -> (2) -> (3) -> (4)
at which point I'm just left praying that I've thought of everything at 1.
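To make the "filter for everything and hope I don't lose what I need" point
concrete, here's a toy sketch (Python purely for illustration since we
haven't settled on a language; the filter is made up, not proposed API):

    import html, re

    def filter_for_everything(value):
        # try to pre-empt every downstream threat at intake
        value = re.sub(r"['\";]", "", value)  # strip quotes "for the DB", someday
        value = html.escape(value)            # escape "for the browser", someday
        return value

    name = filter_for_everything("O'Brien & Sons")
    # -> "OBrien &amp; Sons" -- the data is already mangled before it ever
    # reaches (2), and I still can't be sure I've caught everything.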
I also realized this evening that if we design a system that is _capable_
of this (easily), that's exactly how it's going to be used.
What might work better from a strictly security-related perspective would
be to consider boundaries as the important parts of the system, and not
necessarily think beyond them. So, for instance, consider getting input
from a form. My input is from the server, my output is to the runtime. If
I concern myself only with the threats exposed at this boundary, my job
gets a lot more manageable. I only have to filter at this boundary for
things that adversely affect me in the runtime.
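Here's roughly what I mean at this first boundary (same caveat: Python as
pseudocode, and the names are placeholders, not a proposed API):

    def form_boundary(raw, max_len=256):
        # (1): server -> runtime. The only threats I care about *here* are
        # the ones that hurt the runtime itself: broken encodings, absurd
        # sizes, control characters. SQL and HTML are someone else's boundary.
        text = raw.decode("utf-8", errors="strict")   # reject bad encodings
        if len(text) > max_len:
            raise ValueError("field too long")
        if any(ord(c) < 0x20 and c not in "\t\r\n" for c in text):
            raise ValueError("control characters not allowed")
        return text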
Now consider (2). If I do boundary-centric filtering, I concern myself with
the input from the runtime environment (which I can call a "dirty" input
from the perspective of the DB) and only worry about the class of things
that can go wrong with the boundary I'm crossing: SQL injection, SQL type
mismatches, etc. I've strictly defined what the types are on either side
(because one side enforces them) and so I don't have to worry about
cross-language problems: I'm talking SQL, it has requirements that my
filter at this boundary MUST be conscious of. We filter the malicious
inputs to my subsystem here and move on.
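As a sketch of that boundary (sqlite3 only because it ships with Python;
the boundary is the point here, not the driver):

    import sqlite3

    def db_boundary_insert(conn, name, age):
        # (2): runtime -> DB. Everything arriving here is "dirty" from the
        # DB's perspective, so enforce the types SQL expects and let the
        # driver do the quoting -- the class of problems this boundary owns.
        if not isinstance(name, str) or not isinstance(age, int):
            raise TypeError("type mismatch at the SQL boundary")
        conn.execute("INSERT INTO people (name, age) VALUES (?, ?)",
                     (name, age))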
But you say "what about XSS?!? You're ignoring it!!" Yes, I am. But not
for long. Consider the SELECT from the DB into a variable in my runtime: if
I consider the input from the SQL "dirty", I can strongly filter all my
inputs to the environment and not have to worry about "where" the data came
from (which is really "what boundary did it pass from to get into the
DB?"). All I have to worry about are my current boundaries.
Likewise when I output from the runtime to an external listener (whatever
it may be). I have NO CLUE how my data is going to really be used between
(1) and (2), but at the outbound boundary from the environment to the end
client, I know a LOT more about who I have to protect and how.
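Sketching that last boundary (html.escape is just standing in for whatever
our real output filter ends up being):

    import html

    def client_boundary(value):
        # (4): runtime -> end client. This is where XSS is actually a
        # threat, so this is where it gets handled, no matter which
        # boundary the data originally crossed on its way in.
        return html.escape(str(value))

    row_name = "<script>alert(1)</script>"   # pretend this came back from (3)
    page_fragment = "<td>" + client_boundary(row_name) + "</td>"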
So here's what I'm proposing: we define "boundary" filters, not
necessarily filters for one type of attack or another. We could implement
these using a common signature backend or whatever (although I think that's
way too much overhead, personally). By implementing the toolkit this way,
we will "force" developers to think of their data as "tainted" as it
crosses each boundary (which is a sane assumption, actually), while at the
same time breaking up the load of filtering and the mental load of tracing
data through the system explicitly.
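If we did go the common-signature route, I'm picturing something with
roughly this shape (reusing the sketches above; none of these names are
real API, and like I said, the shared backend may well be more overhead
than it's worth):

    class Boundary:
        # one possible common signature: dirty data in, clean(er) data out,
        # or refuse to let it cross at all
        def cross(self, dirty):
            raise NotImplementedError

    class FormToRuntime(Boundary):
        def cross(self, dirty):
            return form_boundary(dirty)       # sketch from (1) above

    class RuntimeToClient(Boundary):
        def cross(self, dirty):
            return client_boundary(dirty)     # sketch from (4) above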
As an added benefit, developers on large projects (the kind built by many
hands) often NEVER know where data REALLY comes from or goes to, or for
that matter what touches it in between. By breaking up our task into
boundary-based segments, we can enable large projects like this to become
hardened in a systemic way that doesn't require the kinds of complex
context that can easily become mishmashed in the minds of developers.
Possible downsides: filtering at the boundary is drop dead simple in terms
of what an implementer has to know how to use at each step, but it does
mean that security has to be "added" at each step. This creates more raw
locations in code where the API has to be applied, and that might be a
significant detractor. A "scanner" that looks for I/O calls might be one
way to address this issue. Grouping of boundary checkers (providing
combinations of the most common boundaries?) might also be a way to
mitigate the problem, but overall I feel that once a developer is "used
to" applying filters everywhere, this might become less of a concern.
The more I consider the sheer number of permutations that we would have to
provide for when doing "monolithic" filtering, the less approachable the
problem seems. By breaking it up into situations where we filter at every
boundary crossing, we can provide much simpler syntax that directly
addresses the problems presented at each "place" in the system.
Thoughts?
--
Alex Russell
al...@Se...
al...@ne...