From: Julian S. <js...@ac...> - 2006-11-17 19:13:43
> I am working on a reverse engineering shared library which can be called
> from a variety of programs. The present problem I am trying to solve is
> how to detect a binary program is compressed or not.
[...]

FWIW, I have no clue about Windows so I can't help you there.

What I am struck by is that this seems a roundabout way to discover
whether or not an executable is compressed. There are some pretty
effective techniques for guessing whether or not a sequence of bytes
comes from a known source, if you have previous examples of the outputs
of candidate sources.

The crudest version of this would be to simply try to compress the
executable. Even a simple compression algorithm should be able to get
x86 code below 4 bits/byte, whereas random data -- i.e., already
compressed code -- will not compress any further.

If you want to get more sophisticated you could use a PPM-based
sequence prediction algorithm. Train it on various source models,
e.g. x86 code, amd64 code, English; then use each one in turn to
compress (parts of) the executable. Typically the model that most
closely matches the fragment you are testing will do noticeably better
than the rest. PPM (Prediction by Partial Matching) is fairly simple
to implement.

J
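
P.S. A minimal sketch of the crude "just try to compress it" test, in
Python. It assumes zlib as the "simple compression algorithm"; the
function names and the 7 bits/byte cutoff are illustrative assumptions,
not measured values, so calibrate the threshold on executables you know
to be packed or unpacked before relying on it.

```python
import os
import zlib


def bits_per_byte(data: bytes, level: int = 9) -> float:
    """Return the zlib-compressed size of `data` in bits per input byte."""
    if not data:
        raise ValueError("cannot measure an empty buffer")
    return 8.0 * len(zlib.compress(data, level)) / len(data)


def looks_compressed(data: bytes, threshold: float = 7.0) -> bool:
    """Guess that `data` is already compressed if zlib barely shrinks it.

    `threshold` is a hypothetical cutoff in bits/byte, chosen only for
    illustration.
    """
    return bits_per_byte(data) >= threshold


if __name__ == "__main__":
    # Random bytes stand in for already-compressed code; a repeated
    # string stands in for highly redundant input.
    samples = {
        "random": os.urandom(1 << 16),
        "repetitive": b"push ebp; mov ebp, esp; " * 4000,
    }
    for name, blob in samples.items():
        print(name, round(bits_per_byte(blob), 2), looks_compressed(blob))
```

To test a real file, read it in binary mode and pass the bytes (or just
the code section, if you can locate it) to looks_compressed(). Note
that encrypted data will look the same as compressed data to this test.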