From: Alex R. <al...@se...> - 2002-06-19 01:11:55
Ok, so I had a thought this evening. Perhaps we're looking at things in a way that's not helping us figure out what works best from an applied standpoint. My thought is that perhaps we should think a bit more about how we might filter at _every_ input/output in a system. Seems to me like I have been trying to solve the problem of accounting (or filtering) for every possible abuse scenario as soon as I get the data, and the more I think about it the more I realize that this eventually works against us. It forces me (and end developers) to think about every use of their data further down the road than is necessary, breaking the "don't make me think" principle. So perhaps we can do better by only addressing the immediate threats. Let me expand on that before you flame me into oblivion.

Consider the following scenarios:

1.) taking form input
2.) inserting a string into my database
3.) pulling a piece of data out of a database
4.) displaying data to a client

Were I to consider these data flows through the system as a whole (as they could easily be the same piece of data), then I've got some thorny questions, and it's my instinct to handle them all up front (which is what I've been tempted to design for thus far). But what happens when I get data from another input (say a file) and I don't KNOW where it's going to end up (as I often may not at various stages in a project)? Well, I have to filter for everything and hope I don't lose what I need. Strongly sub-optimal.

I (at least) have been considering how I'm going to protect the entire system from between items 1 and 2, kinda like:

(1) -> filter(for everything) -> (2) -> (3) -> (4)

at which point I'm just left praying that I've thought of everything at 1. I also realized this evening that if we design a system that is _capable_ of this (easily), that's exactly how it's going to be used.

What might work better from a strictly security-related perspective would be to consider boundaries as the important parts of the system, and not necessarily think beyond them. So, for instance, consider getting input from a form. My input is from the server, my output is to the runtime. If I concern myself only with the threats exposed at this boundary, my job gets a lot more manageable. I only have to filter at this boundary for things that adversely affect me in the runtime.

Now consider (2). If I do boundary-centric filtering, I concern myself with the input from the runtime environment (which I can call a "dirty" input from the perspective of the DB) and only worry about the class of things that can go wrong at the boundary I'm crossing: SQL injection, SQL type mismatches, etc. I've strictly defined what the types are on either side (because one side enforces them), and so I don't have to worry about cross-language problems: I'm talking SQL, and it has requirements that my filter at this boundary MUST be conscious of. We filter the malicious inputs to my subsystem here and move on.

But you say "what about XSS?!?, you're ignoring it!!". Yes, I am. But not for long. Consider the SELECT from the DB into a variable in my runtime: if I consider the input from the SQL "dirty", I can strongly filter all my inputs to the environment and not have to worry about "where" the data came from (which is really "what boundary did it pass through to get into the DB?"). All I have to worry about are my current boundaries. Likewise when I output from the runtime to an external listener (whatever it may be).
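To make that concrete, here's a rough sketch of boundary-centric filters for scenarios 1-4. Python and SQLite are purely for illustration, and every function name here is hypothetical -- this is not a proposal for the actual API, just the shape of the idea:

    # Each filter only knows about the single boundary it sits on.
    import html
    import sqlite3

    def filter_form_to_runtime(raw):
        # Boundary (1): form/server -> runtime. Only worry about what can
        # hurt the runtime here (types, lengths), not SQL or HTML concerns.
        value = raw.strip()
        if len(value) > 256:
            raise ValueError("input too long for this field")
        return value

    def filter_runtime_to_db(conn, username, body):
        # Boundary (2): runtime -> DB. Runtime data is "dirty" from the DB's
        # perspective; the only concern at this crossing is SQL injection and
        # type mismatches, so bind parameters instead of splicing strings.
        conn.execute(
            "INSERT INTO comments (username, body) VALUES (?, ?)",
            (username, body),
        )

    def filter_db_to_runtime(row):
        # Boundary (3): DB -> runtime. Treat what comes out of SQL as dirty
        # too; coerce it into the types the runtime expects.
        username, body = row
        return str(username), str(body)

    def filter_runtime_to_client(text):
        # Boundary (4): runtime -> external listener (an HTML client here).
        # XSS is handled at this crossing, not back at the form input.
        return html.escape(text)

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE comments (username TEXT, body TEXT)")

        name = filter_form_to_runtime("  mallory ")                       # (1)
        body = filter_form_to_runtime("<script>alert('xss')</script>")    # (1)
        filter_runtime_to_db(conn, name, body)                            # (2)
        row = conn.execute("SELECT username, body FROM comments").fetchone()
        name, body = filter_db_to_runtime(row)                            # (3)
        print(filter_runtime_to_client(body))                             # (4)

Note that the script payload from the form survives untouched into the DB and back out again, and only gets neutralized at the runtime -> client crossing. That's the whole point: no crossing has to know where the data has been or where it's headed next.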
I have NO CLUE how my data is going to really be used between (1) and (2), but at the outbound from the environment to the end client, I know a LOT more about who I have to protect and how.

So here's what I'm proposing: we define "boundary" filters, not necessarily filters for one type of attack or another. We could implement these using a common signature backend or whatever (although I think that's way too much overhead, personally); there's a rough sketch of that shape at the bottom of this mail. By implementing the toolkit this way, we will "force" developers to think of their data as "tainted" as it crosses each boundary (which is a sane assumption, actually), while at the same time breaking up the load of filtering and the mental load of tracing data through the system explicitly.

As an added benefit, developers on large projects built by many hands often NEVER know where data REALLY comes from or goes to, or for that matter what touches it in between. By breaking our task up into boundary-based segments, we can enable large projects like this to become hardened in a systemic way that doesn't require the kinds of complex context that can easily become mishmashed in the minds of developers.

Possible downsides: filtering at the boundary is drop-dead simple in terms of what an implementer has to know how to use at each step, but it does mean that security has to be "added" at each step. This creates more raw locations in code where the API has to be applied, and that might be a significant detractor. A "scanner" that looks for I/O calls might be one way to address this issue. Grouping of boundary checkers (providing combinations of the most common boundaries?) might also be a way to decrease this problem, but overall I feel that once a developer is "used to" applying filters everywhere, this might become less of a concern.

The more I consider the sheer number of permutations we would have to provide for when doing "monolithic" filtering, the less approachable the problem seems. By breaking it up into situations where we filter at every boundary crossing, we can provide much simpler syntax that directly addresses the problems presented at each "place" in the system.

Thoughts?

-- 
Alex Russell
al...@Se...
al...@ne...
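Here's the sketch of the "common signature backend" idea mentioned above (again Python, again purely illustrative -- BoundaryRegistry, the boundary names, and cross() are all made up, not a proposed API). Every filter shares one signature, value in / value out, and grouping common boundaries is just a list of crossings applied in order:

    from typing import Callable, Dict, List

    Filter = Callable[[str], str]

    class BoundaryRegistry:
        def __init__(self) -> None:
            self._filters: Dict[str, List[Filter]] = {}

        def register(self, boundary: str, f: Filter) -> None:
            # Attach a filter to a named boundary crossing.
            self._filters.setdefault(boundary, []).append(f)

        def cross(self, boundary: str, value: str) -> str:
            # Run a value through every filter registered for a boundary;
            # data is treated as tainted until it has crossed.
            for f in self._filters.get(boundary, []):
                value = f(value)
            return value

    registry = BoundaryRegistry()
    registry.register("form->runtime", str.strip)
    # Naive angle-bracket escaping, just to stand in for a real HTML filter.
    registry.register("runtime->client",
                      lambda s: s.replace("<", "&lt;").replace(">", "&gt;"))

    def cross_many(boundaries: List[str], value: str) -> str:
        # The "combinations of the most common boundaries" idea: a grouped
        # checker is nothing more than several crossings in sequence.
        for b in boundaries:
            value = registry.cross(b, value)
        return value

    if __name__ == "__main__":
        tainted = "<b>hi</b>  "
        print(cross_many(["form->runtime", "runtime->client"], tainted))

Because every crossing has the same shape, the per-boundary syntax stays simple even when projects stack lots of them together.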