While working on reproducible builds for openSUSE, I found that building the ooRexx package produced different output on each build:
/usr/bin/rexx.img had many diffs such as these:
-00000000 b8 71 15 00 00 00 00 00 08 cc 2f 1b 91 7f 00 00
+00000000 b8 71 15 00 00 00 00 00 08 bc 4b e7 19 7f 00 00
The differences went away when I disabled Address Space Layout Randomization (ASLR), e.g. via setarch -R.
The diffs are obviously pointer addresses spread across the image data. For example, at offset 8 above you can see 0x00007f911b2fcc08 vs. 0x00007f19e74bbc08 (the bytes are stored little-endian).
The last 12 bits do not differ, because they are the offset within a 4K page, and the top 16 bits are not used for user-space addresses on x86_64. The bits in between are randomized by ASLR.
The generated rexx.img binaries should be deterministic.
See https://reproducible-builds.org/ for why this matters.
The rexx.img file is created by the rexximage program, using CoreClasses.orx, PlatformObjects.orx and StreamClasses.orx from a directory as input.
I tried to understand how the data is generated, but the code is spread over many .cpp and .hpp files: Interpreter.cpp, RexxMemory.hpp/.cpp, interpreter/memory/Setup.cpp.
Anonymous
Some fixes for this were committed in [r12153].
Related
Commit: [r12153]
Last edit: Erich 2021-02-21
Hopefully the final fixes are in [r12155].
Related
Commit: [r12155]
Last edit: Erich 2021-02-21
To make it easier to see what changed, I used git svn to mirror the repository (and set up a cron job to update it every 15 minutes):
https://github.com/gitmirrors2/ooRexx/commit/18a0b44f0617531db0eecdab3fd17a6293e9ff79
https://github.com/gitmirrors2/ooRexx/commit/1ed6f3f37cd69d4c47671530d59ede4f60c189e1
The fixes have zeroed out almost all memory addresses, except for three Array instances whose pointer to their methodDictionary still shows. One example is at address 001C0F08. This is a 64-bit rexx.img on Windows.
There are also about 20 Set instances with data that looks similar to a memory address, but I'm not sure. See address 001677F8.
Additional fixes committed in [r12157].
The object at 001C0F08 was a LibraryPackage object, not an array. I also spotted and fixed a small potential exposure with arrays. I'm not seeing any differences between builds with VBDiff. But I didn't see any yesterday either after the other fixes, so somehow the LibraryPackage one slipped through.
Related
Commit: [r12157]
Last edit: Erich 2021-02-21
Ah, I didn't notice there's a normalization for internal class type numbers.
So the object starting at 00167770 really is a ControlledDoInstruction (not a Set instance). I assume the last 5 bytes in the line listed above
001677F0 00 00 00 00 00 00 00 00 01 00 00 E5 FF 7F 00 00
are changing because they are uninitialized padding bytes after the last three bytes of the ControlledLoop class:
uint8_t expressions[3]; // controlled loop expression order
In a DEBUG build these bytes show up as
CC CC CC CC CC
which is probably an MSVC debug feature. There may be more objects with uninitialized padding where the DEBUG build shows runs of CC bytes; for example, I'm seeing 16 CC bytes in type 66 RexxCode objects.
A little debugging tip: if you expand a variable in the Windows debugger (for example, the variable copyObject in the saveimage loop), the first field displayed is that of the class that owns the virtual object pointer of the object. That way you don't have to work out the object type from the type numbers.
OK, I've been debugging this from the debug build, which explains why I haven't been seeing any differences here. It's strange that these bytes should be anything other than zero, since the memory manager zeros the memory out when an object is allocated. Not sure what would be setting them. This could be a bit tricky to eliminate, since we need to handle both 32-bit and 64-bit versions.
OK, I'm pretty sure I know the source of the garbage/padding bytes. When the object is constructed, the controlLoop field is initialized from the ControlledLoop instance passed in by the parser. Since that object was originally allocated on the stack, the entire object is simply copied into the field, including whatever padding came from the stack. In debug mode the compiler initializes everything to CC, while in non-debug builds the padding picks up whatever garbage happens to be there. I think this can be fixed by overriding the assignment operator and doing a field-by-field copy.
Fix for the controlled loop padding issue committed in [r12158].
Related
Commit: [r12158]
Last edit: Erich 2021-02-21
With change [r12155], .nil~hashCode now returns DEADBEEF. I'm not sure whether this is an actual issue, though:
say .nil~hashCode~reverse~c2x -- 00000000DEADBEEF
Related
Commit: [r12155]
On Sun, Feb 21, 2021 at 11:32 AM Erich erich_st@users.sourceforge.net wrote:
It is just an arbitrary value picked to remain constant. It also has some
old history around it. Not an issue.
Rick
Related
Bugs:
#1712
Commit: [r12155]
What is the "old history around it" in this case?
Found "deadbeef" mentioned in:
https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_debug_values and
https://en.wikipedia.org/wiki/Hexspeak
The VM operating system used this value as an eye-catcher in memory. The value really stood out in dumps. Even in ooRexx, when garbage collection takes place, the dead objects are given a DEAD eye-catcher value.
Rick
On Tue, Feb 23, 2021 at 7:59 AM Rony G. Flatscher orexx@users.sourceforge.net wrote:
Related
Bugs:
#1712