From: Brian M. <bm...@di...> - 2003-09-12 01:50:29

All,

Firstly, let me apologize for airing a half-baked idea that is probably crazy. Stop reading now if you are easily offended by ideas that are not well thought out. No flames please.

A couple days ago there was an article on Slashdot discussing a tool created by Eric S Raymond called the "comparator" that generated md5sum hashes for every three lines of source code fed into it. See details here:

http://slashdot.org/article.pl?sid=03/09/09/2129207

The gist of the Slashdot posting was that you could run this tool on the SCO source tree and on the linux source tree and see how much code was in common. The Slashdot community response was mixed as you might expect. They raised two basic issues:

1. SCO would never let anyone run this tool on their source, and legitimate licensees may be on shaky legal ground if they were to do it.

2. This would only detect complete cut-and-paste code reuse. Something as simple as a search-and-replace of variable names would throw off the md5sum hash enough to prevent a match.

I was thinking about this and was curious if something like this could be used on the generated machine code as executed at runtime. Could a valgrind skin be written to md5sum the running system at the micro-code level? It seems like this might be possible with valgrind and/or user-mode linux and/or some sort of bastardized Bochs/vmware type thing.

Clearly the compiler/optimizer used would affect the generated machine code. Assuming that you could somehow compile linux using the same compiler/optimizer as the SCO release, what are some other problems that would prevent this from being technically feasible?

Thanks for indulging me,

--Brian
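
For readers who want to see issue 2 concretely, here is a minimal sketch of hashing every three-line window of a source file, as the Slashdot summary describes the approach. It is not the comparator's actual implementation: the function names (`window_hashes`, `shared_windows`), the whitespace stripping, and the tiny C sample are all assumptions made for illustration.

```python
import hashlib


def window_hashes(source, window=3):
    """Return the set of MD5 digests of every `window` consecutive lines."""
    # Assumption: strip surrounding whitespace and skip blank lines so that
    # pure indentation changes do not alter the hashes.
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    hashes = set()
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(lines[i:i + window])
        hashes.add(hashlib.md5(chunk.encode("utf-8")).hexdigest())
    return hashes


def shared_windows(a, b, window=3):
    """Count the window hashes that two source texts have in common."""
    return len(window_hashes(a, window) & window_hashes(b, window))


# Hypothetical sample code, used only to exercise the functions above.
ORIGINAL = """\
int sum(int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
"""

COPIED = ORIGINAL                            # verbatim cut-and-paste
RENAMED = ORIGINAL.replace("total", "acc")   # search-and-replace of one name

if __name__ == "__main__":
    print("verbatim copy:", shared_windows(ORIGINAL, COPIED))
    print("after rename: ", shared_windows(ORIGINAL, RENAMED))
```

In this sample the verbatim copy shares all four of its three-line windows with the original, while renaming a single variable leaves none in common, because the renamed identifier happens to appear in every window.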
From: Nicholas N. <nj...@ca...> - 2003-09-12 08:49:56

On Thu, 11 Sep 2003, Brian Mosher wrote:

> A couple days ago there was an article on Slashdot discussing a tool created
> by Eric S Raymond called the "comparator" that generated md5sum hashes for
> every three lines of source code fed into it. See details here:
>
> [snip]
>
> I was thinking about this and was curious if something like this could be
> used on the generated machine code as executed at runtime. Could a valgrind
> skin be written to md5sum the running system at the micro-code level? It
> seems like this might be possible with valgrind and/or user-mode linux
> and/or some sort of bastardized Bochs/vmware type thing.
>
> Clearly the compiler/optimizer used would affect the generated machine code.
> Assuming that you could somehow compile linux using the same
> compiler/optimizer as the SCO release, what are some other problems that
> would prevent this from being technically feasible?

I'd be inclined to do it on the static machine code, rather than the dynamic instruction stream.

One big problem: I expect that machine code is far less "stable" than source code, and tiny changes in the source code would make big differences in the machine code (e.g. register allocation). The same would happen if you used a different compiler or optimisation level. Also, I imagine machine code would be less "distinctive" than source code, for comparison purposes.

Also, Valgrind can't run the Linux kernel (although maybe something could be done with UML).

So I'm sceptical, but willing to be proven wrong :)

For a discussion of a similar-ish idea, have a look at www.cl.cam.ac.uk/~njn25/pubs/redux2003.ps.gz

N