From: Alex R. <al...@se...> - 2002-06-19 01:11:55
Ok, so I had a thought this evening. Perhaps we're looking at things in a way that's not helping us figure out what works best from an applied standpoint. My thought is that perhaps we should think a bit more about how we might filter at _every_ input/output in a system. Seems to me like I have been trying to solve the problem of accounting (or filtering) for every possible abuse scenario as soon as I get the data, and the more I think about it the more I realize that this eventually works against us. It forces me (and end developers) to think about every use of their data further down the road than is necessary, breaking the "don't make me think" principle. So perhaps we can do better by only addressing the immediate threats. Let me expand on that before you flame me into oblivion.

Consider the following scenarios:

1.) taking form input
2.) inserting a string into my database
3.) pulling a piece of data out of a database
4.) displaying data to a client

Were I to consider these data flows through the system as a whole (as they could easily be the same piece of data), then I've got some thorny questions, and it's my instinct to handle them all up front (which is what I've been tempted to design for thus far). But what happens when I get data from another input (say a file) and I don't KNOW where it's going to end up (as I often may not at various stages in a project)? Well, I have to filter for everything and hope I don't lose what I need. Strongly sub-optimal.

I (at least) have been considering how I'm going to protect the entire system from between items 1 and 2, kinda like:

(1) -> filter(for everything) -> (2) -> (3) -> (4)

at which point I'm just left praying that I've thought of everything at 1. I also realized this evening that if we design a system that is _capable_ of this (easily), that's exactly how it's going to be used.

What might work better from a strictly security-related perspective would be to consider boundaries as the important parts of the system, and not necessarily think beyond them. So, for instance, consider getting input from a form. My input is from the server, my output is to the runtime. If I concern myself only with the threats exposed at this boundary, my job gets a lot more manageable. I only have to filter at this boundary for things that adversely affect me in the runtime.

Now consider (2). If I do boundary-centric filtering, I concern myself with the input from the runtime environment (which I can call a "dirty" input from the perspective of the DB) and only worry about the class of things that can go wrong at the boundary I'm crossing: SQL injection, SQL type mismatches, etc. I've strictly defined what the types are on either side (because one side enforces them), and so I don't have to worry about cross-language problems: I'm talking SQL, and it has requirements that my filter at this boundary MUST be conscious of. We filter the malicious inputs to my subsystem here and move on.

But you say "what about XSS?!?, you're ignoring it!!". Yes, I am. But not for long. Consider the SELECT from the DB into a variable in my runtime: if I consider the input from the SQL "dirty", I can strongly filter all my inputs to the environment and not have to worry about "where" the data came from (which is really "what boundary did it pass through to get into the DB?"). All I have to worry about are my current boundaries. Likewise when I output from the runtime to an external listener (whatever it may be).
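To make that concrete, here's a rough sketch of boundary-centric filters for scenarios 1-4. Python and SQLite are purely for illustration, and every function name here is hypothetical -- this is not a proposal for the actual API, just the shape of the idea:

    # Each filter only knows about the single boundary it sits on.
    import html
    import sqlite3

    def filter_form_to_runtime(raw):
        # Boundary (1): form/server -> runtime. Only worry about what can
        # hurt the runtime here (types, lengths), not SQL or HTML concerns.
        value = raw.strip()
        if len(value) > 256:
            raise ValueError("input too long for this field")
        return value

    def filter_runtime_to_db(conn, username, body):
        # Boundary (2): runtime -> DB. Runtime data is "dirty" from the DB's
        # perspective; the only concern at this crossing is SQL injection and
        # type mismatches, so bind parameters instead of splicing strings.
        conn.execute(
            "INSERT INTO comments (username, body) VALUES (?, ?)",
            (username, body),
        )

    def filter_db_to_runtime(row):
        # Boundary (3): DB -> runtime. Treat what comes out of SQL as dirty
        # too; coerce it into the types the runtime expects.
        username, body = row
        return str(username), str(body)

    def filter_runtime_to_client(text):
        # Boundary (4): runtime -> external listener (an HTML client here).
        # XSS is handled at this crossing, not back at the form input.
        return html.escape(text)

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE comments (username TEXT, body TEXT)")

        name = filter_form_to_runtime("  mallory ")                       # (1)
        body = filter_form_to_runtime("<script>alert('xss')</script>")    # (1)
        filter_runtime_to_db(conn, name, body)                            # (2)
        row = conn.execute("SELECT username, body FROM comments").fetchone()
        name, body = filter_db_to_runtime(row)                            # (3)
        print(filter_runtime_to_client(body))                             # (4)

Note that the script payload from the form survives untouched into the DB and back out again, and only gets neutralized at the runtime -> client crossing. That's the whole point: no crossing has to know where the data has been or where it's headed next.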
I have NO CLUE how my data is going to really be used between (1) and (2), but at the outbound from the environment to the end client, I know a LOT more about who I have to protect and how.

So here's what I'm proposing: we define "boundary" filters, not necessarily filters for one type of attack or another. We could implement these using a common signature backend or whatever (although I think that's way too much overhead, personally); there's a rough sketch of that shape at the bottom of this mail. By implementing the toolkit this way, we will "force" developers to think of their data as "tainted" as it crosses each boundary (which is a sane assumption, actually), while at the same time breaking up the load of filtering and the mental load of tracing data through the system explicitly.

As an added benefit, developers on large projects built by many hands often NEVER know where data REALLY comes from or goes to, or for that matter what touches it in between. By breaking our task up into boundary-based segments, we can enable large projects like this to become hardened in a systemic way that doesn't require the kinds of complex context that can easily become mishmashed in the minds of developers.

Possible downsides: filtering at the boundary is drop-dead simple in terms of what an implementer has to know how to use at each step, but it does mean that security has to be "added" at each step. This creates more raw locations in code where the API has to be applied, and that might be a significant detractor. A "scanner" that looks for I/O calls might be one way to address this issue. Grouping of boundary checkers (providing combinations of the most common boundaries?) might also be a way to decrease this problem, but overall I feel that once a developer is "used to" applying filters everywhere, this might become less of a concern.

The more I consider the sheer number of permutations we would have to provide for when doing "monolithic" filtering, the less approachable the problem seems. By breaking it up into situations where we filter at every boundary crossing, we can provide much simpler syntax that directly addresses the problems presented at each "place" in the system.

Thoughts?

-- 
Alex Russell
al...@Se...
al...@ne...
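Here's the sketch of the "common signature backend" idea mentioned above (again Python, again purely illustrative -- BoundaryRegistry, the boundary names, and cross() are all made up, not a proposed API). Every filter shares one signature, value in / value out, and grouping common boundaries is just a list of crossings applied in order:

    from typing import Callable, Dict, List

    Filter = Callable[[str], str]

    class BoundaryRegistry:
        def __init__(self) -> None:
            self._filters: Dict[str, List[Filter]] = {}

        def register(self, boundary: str, f: Filter) -> None:
            # Attach a filter to a named boundary crossing.
            self._filters.setdefault(boundary, []).append(f)

        def cross(self, boundary: str, value: str) -> str:
            # Run a value through every filter registered for a boundary;
            # data is treated as tainted until it has crossed.
            for f in self._filters.get(boundary, []):
                value = f(value)
            return value

    registry = BoundaryRegistry()
    registry.register("form->runtime", str.strip)
    # Naive angle-bracket escaping, just to stand in for a real HTML filter.
    registry.register("runtime->client",
                      lambda s: s.replace("<", "&lt;").replace(">", "&gt;"))

    def cross_many(boundaries: List[str], value: str) -> str:
        # The "combinations of the most common boundaries" idea: a grouped
        # checker is nothing more than several crossings in sequence.
        for b in boundaries:
            value = registry.cross(b, value)
        return value

    if __name__ == "__main__":
        tainted = "<b>hi</b>  "
        print(cross_many(["form->runtime", "runtime->client"], tainted))

Because every crossing has the same shape, the per-boundary syntax stays simple even when projects stack lots of them together.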