From: Mark C. <ma...@cu...> - 2002-06-21 22:42:40
Interesting debate. My 2 cents, for what they're worth. You know I am far from an expert on this in any shape or form, but if I were doing this (and this may already be assumed in another mail thread), before solving that problem I would first define a system that lets the developer specify a small set of valid input chars and reject everything else at the boundary. It's pretty damn hard to run SQL injection, XSS, etc. anywhere if you only allow A-Z ASCII as input. This would cover the huge majority of simple apps that let users enter text, and it would be simple to implement and effective.

Most canonicalization depends on the web server anyway, and you may not have that much control over the input. Canonicalize a URL-encoded or Unicode version of anything and it won't be A-Z anymore, so it would be dropped. Of course, the data needs to be put into a safe data type, operated on, etc. before it becomes a variable in play, so to speak.

Then come the more difficult problems you discuss. I like the idea of defining an inbound and outbound gateway (boundary) and operating within that. I would agree that all data should be checked, but as you point out that's a huge task and maybe beyond this project. It may also mean that implementing what you design would require a significant amount of rework for existing applications. The AV paradigm would seem to work well: you check file reads and writes, not every thread call. Maybe in Java it becomes a callable service that your markup and input code can call when receiving and writing data. I think that approach scopes the project much better and makes deployment and post-production customization easier.
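A minimal sketch of the allowlist idea above, in Java since that is the platform the reply mentions. The class name, the exact A-Za-z character set, and the choice of an unchecked exception are illustrative assumptions, not anything specified in the thread:

import java.util.regex.Pattern;

// Sketch only: reject everything outside a small allowlist at the input
// boundary. Names and the character set are assumptions for illustration.
public final class AllowlistValidator {

    private static final Pattern ALPHA_ONLY = Pattern.compile("[A-Za-z]+");

    private AllowlistValidator() {}

    // Returns the input unchanged if it consists solely of ASCII letters;
    // rejects everything else, including URL-encoded or Unicode variants,
    // which no longer match A-Z once canonicalized.
    public static String requireAlpha(String input) {
        if (input == null || !ALPHA_ONLY.matcher(input).matches()) {
            throw new IllegalArgumentException("rejected at input boundary");
        }
        return input;
    }
}

A caller would wrap each raw read, e.g. String name = AllowlistValidator.requireAlpha(request.getParameter("name")); so nothing unvalidated ever becomes "a variable in play".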
> -------- Original Message --------
> Subject: [Owasp-input-api-developers] turning the problem inside out
> Date: Tue, 18 Jun 2002 20:12:13 -0500
> From: Alex Russell <al...@se...>
> Organization: SecurePipe
> To: owa...@li...
>
> Ok, so I had a thought this evening. Perhaps we're looking at things
> in a way that's not helping us figure out what works best from an
> applied standpoint.
>
> My thought is that perhaps we should think a bit more about how we
> might filter at _every_ input/output in a system. It seems to me that
> I have been trying to solve the problem of accounting (or filtering)
> for every possible abuse scenario as soon as I get the data, and the
> more I think about it the more I realize that this eventually works
> against us. It forces me (and end developers) to think about every
> use of their data further down the road than is necessary, breaking
> the "don't make me think" principle.
>
> So perhaps we can do better by only addressing the immediate threats.
> Let me expand on that before you flame me into oblivion. Consider the
> following scenarios:
>
> 1.) taking form input
> 2.) inserting a string into my database
> 3.) pulling a piece of data out of a database
> 4.) displaying data to a client
>
> Were I to consider these data flows through the system as a whole (as
> they could easily be the same piece of data), then I've got some
> thorny questions, and it's my instinct to handle them all up front
> (which is what I've been tempted to design for thus far). But what
> happens when I get data from another input (say a file) and I don't
> KNOW where it's going to end up (as I often may not at various stages
> in a project)? Well, I have to filter for everything and hope I don't
> lose what I need. Strongly sub-optimal.
>
> I (at least) have been considering how I'm going to protect the
> entire system between items 1 and 2, kinda like:
>
> (1) -> filter(for everything) -> (2) -> (3) -> (4)
>
> at which point I'm just left praying that I've thought of everything
> at 1. I also realized this evening that if we design a system that is
> _capable_ of this (easily), that's exactly how it's going to be used.
>
> What might work better from a strictly security-related perspective
> would be to consider boundaries as the important parts of the system,
> and not necessarily think beyond them. So, for instance, consider
> getting input from a form. My input is from the server, my output is
> to the runtime. If I concern myself only with the threats exposed at
> this boundary, my job gets a lot more manageable. I only have to
> filter at this boundary for things that adversely affect me in the
> runtime.
>
> Now consider (2). If I do boundary-centric filtering, I concern
> myself with the input from the runtime environment (which I can call
> a "dirty" input from the perspective of the DB) and only worry about
> the class of things that can go wrong with the boundary I'm crossing:
> SQL injection, SQL type mismatches, etc. I've strictly defined what
> the types are on either side (because one side enforces them) and so
> I don't have to worry about cross-language problems: I'm talking SQL,
> and it has requirements that my filter at this boundary MUST be
> conscious of. We filter the malicious inputs to my subsystem here and
> move on.
>
> But you say "what about XSS?!? You're ignoring it!!" Yes, I am. But
> not for long. Consider the SELECT from the DB into a variable in my
> runtime: if I consider the input from the SQL "dirty", I can strongly
> filter all my inputs to the environment and not have to worry about
> "where" the data came from (which is really "what boundary did it
> pass from to get into the DB?"). All I have to worry about are my
> current boundaries.
>
> Likewise when I output from the runtime to an external listener
> (whatever it may be). I have NO CLUE how my data is really going to
> be used between (1) and (2), but at the outbound from the environment
> to the end client, I know a LOT more about who I have to protect and
> how.
>
> So here's what I'm proposing: we define "boundary" filters, not
> necessarily filters for one type of attack or another. We could
> implement these using a common signature backend or whatever
> (although I think that's way too much overhead, personally). By
> implementing the toolkit this way, we will "force" developers to
> think of their data as "tainted" as it crosses each boundary (which
> is a sane assumption, actually), while at the same time breaking up
> the load of filtering and the mental load of tracing data through the
> system explicitly.
>
> As an added benefit, large projects that may be developed by many
> developers often NEVER know where data REALLY comes from or goes to,
> or for that matter what touches it in between. By breaking up our
> task into boundary-based segments, we can enable large projects like
> this to become hardened in a systemic way that doesn't require the
> kinds of complex context that can easily become mishmashed in the
> minds of developers.
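As a concrete illustration of the boundary (2) crossing described above, here is a hedged sketch using a parameterized JDBC statement: the SQL side enforces its own types and the "dirty" runtime data never becomes part of the query text. The class, table, and column names are invented for the example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of a boundary (2) filter: the runtime-to-database crossing.
// Class, table, and column names are invented for illustration.
public class CommentStore {

    private final Connection conn;

    public CommentStore(Connection conn) {
        this.conn = conn;
    }

    // Runtime input is bound as a typed parameter rather than being
    // concatenated into the SQL text, addressing the SQL injection and
    // type-mismatch concerns raised at this boundary.
    public void insertComment(String author, String body) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO comments (author, body) VALUES (?, ?)")) {
            ps.setString(1, author);
            ps.setString(2, body);
            ps.executeUpdate();
        }
    }
}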
> Possible downsides: filtering at the boundary is drop-dead simple in
> terms of what an implementer has to know how to use at each step, but
> it does mean that security has to be "added" at each step. This
> creates more raw locations in code where the API has to be applied,
> and that might be a significant detractor. A "scanner" that looks for
> I/O calls might be one way to address this issue. Grouping boundary
> checkers (providing combinations of the most common boundaries?)
> might also be a way to decrease this problem, but overall I feel that
> once a developer is "used to" applying filters everywhere, this might
> become less of a concern.
>
> The more I consider the sheer number of permutations that we would
> have to provide for when doing "monolithic" filtering, the less
> approachable the problem seems. By breaking it up into situations
> where we filter at every boundary crossing, we can provide much
> simpler syntax that directly addresses the problems presented at each
> "place" in the system.
>
> Thoughts?
>
> --
> Alex Russell
> al...@Se...
> al...@ne...
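To round out the picture, a sketch of the outbound boundary (4) from the quoted message: encode at the runtime-to-client crossing, so it stops mattering which boundary the data originally crossed to reach the DB. The class and method names are invented; the escaped set is the usual minimal one for HTML element and attribute content:

// Sketch of a boundary (4) filter: runtime-to-client output encoding.
// Class and method names are invented for illustration.
public final class HtmlBoundary {

    private HtmlBoundary() {}

    // Escapes the characters HTML treats as markup, so data pulled from
    // the DB can be treated as "dirty" and still be displayed safely.
    public static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}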