Hi...
> have a look through the archives and try to find the "vision document".
Hm... I couldn't locate it. :o(
But I saw the idmef thing together with some other code...
What's that? To me it looks like a different way to describe
vulnerabilities, so is it a second VulnXML draft?
Please tell me a few words about its status/purpose.
> canonicalization. We need to be sure that what we're filtering is in the
> charset we intend our filters to be handling, otherwise we open up
> ourselves to Unicode problems.
Whew, I think this is a really tricky / huge problem.
As you surely know, nearly all charset de-/encoding mechanisms are
non-trivial. If you really try to canonicalize everything before filtering,
I bet the only attack an attacker has to try is simple overloading.
As I already stated, on the protocol / db layer (which should be considered
the most sensitive one) you can at least assume everything to be
eight-bit, if not seven (ASCII) or even six bits (base64).
Thus, the simple canonicalization on that layer would consist of bit
masking.
The filters on other stages may well need a full-fledged charset
canonicalization, but it should happen comparatively seldom that a
user has to provide charset-dependent input (e.g. i18n names / search
patterns or the like).
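To make the bit-masking idea concrete, here is a minimal Java sketch (class and method names are mine, not from any project code) of canonicalizing a layer that is known to carry only seven-bit data by clearing the high bit of every byte:

```java
// Minimal sketch: canonicalization by bit masking, for a layer
// where the protocol guarantees seven-bit (ASCII) data.
// All names here are illustrative, not from the actual codebase.
public class BitMaskCanon {

    // Clear the high bit of every byte, so stray eight-bit bytes
    // cannot slip past filters that only reason about ASCII.
    public static byte[] maskTo7Bit(byte[] input) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (byte) (input[i] & 0x7F);
        }
        return out;
    }

    public static void main(String[] args) {
        // 0xC1 with the high bit cleared becomes 0x41 ('A').
        byte[] canon = maskTo7Bit(new byte[] { (byte) 0xC1, 'A' });
        System.out.println("" + (char) canon[0] + (char) canon[1]);
    }
}
```

Whether masking is the right policy (as opposed to rejecting eight-bit bytes outright) is of course a design decision per layer; the point is only that on such a layer no full charset machinery is needed.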
> eh. canonicalization before filtering makes this moot.
see above.
> > 6. the pattern matching should reject a byte sequence immediately if it
> > finds some byte / byte pattern that is not allowed.
> that's what the "good chars" stuff is for.
Yep, of course... I just wanted to restate that, because there are regexp
parts in the code which *could* be less efficient than the goodbytes
approach.
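For comparison, a goodbytes-style check can be a single table lookup per byte with an immediate reject on the first disallowed one. A minimal Java sketch (again, my own names, not from the tarball):

```java
// Illustrative "good bytes" allow-list filter: build a 256-entry
// lookup table once, then scan input and reject on the FIRST byte
// that is not in the allowed set (the early-reject behaviour from
// point 6 above). Names are mine, not from the project code.
public class GoodBytes {
    private final boolean[] allowed = new boolean[256];

    public GoodBytes(String goodChars) {
        for (int i = 0; i < goodChars.length(); i++) {
            allowed[goodChars.charAt(i) & 0xFF] = true;
        }
    }

    // Returns false as soon as a disallowed byte is seen; there is
    // no need to scan the rest of the sequence.
    public boolean accept(byte[] input) {
        for (byte b : input) {
            if (!allowed[b & 0xFF]) {
                return false;
            }
        }
        return true;
    }
}
```

A regexp engine can of course express the same allow-list, but a plain table scan like this is O(n) with a tiny constant and never backtracks.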
> I didn't include a "filter stack" in with the tarball, but we will have one.
Yep, just a hint to avoid duplication of work.
> because I haven't coded any java in a year = )
> All idiom problems are my own, sorry.
Never mind, that's why the remarks came in triple braces. :o)
Kind regards
Ingo