From: Dan I. <Da...@Sq...> - 2004-04-14 05:25:12
Folks -

I've been scrutinizing our old object header format, contemplating 64-bit variants, speculating on the demise of compact classes, and thinking about object header types. I've come up with a proposal that I believe maximizes compatibility with 64-bit images, and minimizes the number of changes that need to be made in the code. You will also find a variant proposal and some discussion at the end.

Please look it over and tell me what you think. Before proposing any radical departures, please consider them from the standpoint of your having to write all the new code and get it to work flawlessly.

Thanks
- Dan

----------------------------------------
As a refresher course, here's the current object header format, as documented in class ObjectMemory:

    3 bits   reserved for gc (mark, old, dirty)
    12 bits  object hash (for HashSets)
    5 bits   compact class index
    4 bits   object format
    6 bits   object size in 32-bit words
    2 bits   header type (0: 3-word, 1: 2-word, 2: forbidden, 3: 1-word)

and here is the current encoding of the "object format" field:

    0      no fields
    1      fixed fields only (all containing pointers)
    2      indexable fields only (all containing pointers)
    3      both fixed and indexable fields (all containing pointers)
    4      both fixed and indexable weak fields (all containing pointers).
    5      unused
    6      indexable word fields only (no pointers)
    7      unused
    8-11   indexable byte fields only (no pointers)
           (low 2 bits are low 2 bits of size)
    12-15  compiled methods:
           # of literal oops specified in method header,
           followed by indexable bytes
           (same interpretation of low 2 bits as above)

Here is Proposal #1 (my preference) for Version 4...

Memory management bits (high 3 bits)

In actual fact, we currently only use the top two bits. I propose keeping these as they are, and reserving the third one for immutability.

Header type bits

With the demise of compact classes, header type zero goes away. Currently the garbage collector uses type 2.
My proposal is to leave these as is, except to relabel type 2 as gcOnly, and type 3 as forbidden. I thought of changing the order of 0 and 1 for convenient logic, but I don't think this matters and it all works right now.

Object size

This field, with low bits masked off, gives the object's size in host memory words, whether 32- or 64-bit words. The value is a don't-care for objects with large headers, but it will be given a value of zero in that case.

Object format

My proposal here is to add a bit to this field, and slightly reorganize it as follows:

    0      no fields -- actually unused
    1      fixed fields only (all containing pointers)
    2      indexable fields only (all containing pointers)
    3      both fixed and indexable fields (all containing pointers)
    4      both fixed and indexable weak fields (all containing pointers).
    5      unused
    6      unused
    7      unused
    8      unused
    9      indexable 64-bit fields only (no pointers)
    10-11  indexable 32-bit fields only (no pointers)
           (low 0-1 bits are low 0-1 bits of size)
    12-15  indexable 16-bit fields only (no pointers)
           (low 1-2 bits are low 1-2 bits of size)
    16-23  indexable 8-bit fields only (no pointers)
           (low 2-3 bits are low 2-3 bits of size)
    24-31  compiled methods: [becomes unused with NCMs]
           # of literal oops specified in method header,
           followed by indexable bytes
           (same interpretation of low 2-3 bits as above)

Here's my thinking: We need an extra bit on all the non-pointer arrays when we go to 64 bits. The current logic is fairly simple and solid, so let's just add a bit to it where needed. Once we do this, there is the possibility of cleaning up the hacks that are currently used for arrays of 16-bit objects, and doing a reasonably uniform thing for 32-bit non-pointer arrays in a 64-bit image. [note there may be no real value to cleaning up 16-bit values at this point, especially since the current hacks work with signed values -- IIABDFI]

Compact class index

...goes away.

Object hash

With the demise of compact classes, we have 5 bits to redistribute.
Since we already took one for the format, we have 4 left to add to the object hash, bringing it up to 16 bits.

64-bit images

Here the format would be almost identical, except we would stretch the hash field by 14 bits to 30 bits (my choice: highest positive smallInt value), and stretch the object size field by 18 bits, thus all but eliminating 3-word headers in the 64-bit world.

Proposal #2

This is an alternative that I have seriously considered because it puts all the size bits together. I don't like it because it uses one more bit, and because it requires a shift of the word size.

Object size

Here we would combine all the size bits in one 8-bit field. It would be encoded (as now) so that the high bits are the size in words and the low bits are the low bits of the size-1. Thus you can mask to get the size in words, or add a constant to get the Squeak byte size. Unfortunately, since the low bits are included, a right shift is required to get around the header type.

Object format

In this scheme, 4 bits is still plenty for format, since the low bits of size are elsewhere.

    0      no fields -- actually unused
    1      fixed fields only (all containing pointers)
    2      indexable fields only (all containing pointers)
    3      both fixed and indexable fields (all containing pointers)
    4      both fixed and indexable weak fields (all containing pointers).
    5      unused
    6      unused
    7      unused
    8      unused
    9      indexable 64-bit fields
    10     indexable 32-bit fields
    11     indexable 16-bit fields
    12     indexable 8-bit fields
    13     compiled methods: [becomes unused with NCMs]
           # of literal oops specified in method header,
           followed by indexable bytes
    14     unused
    15     unused

Discussion

The problem with header type bits is that they have to be both on the word pointed to by the oop, and the first word encountered in a scan of memory. Our current scheme is nice because it puts them where they can be masked out of the class word. But that means that they are in the way if we want to integrate the size bits without having to shift them.
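To make the mask-versus-shift point concrete, here is a minimal sketch in Python (chosen only for illustration; the VM itself is not written in it). It assumes a 32-bit image, the 2 header type bits in bits 0..1, and the size field directly above them. The Proposal #2 encoding shown is one reading of "high bits are the size in words, low bits are the low bits of the size-1", with the byte size taken to cover the object body only. All function names are hypothetical.

```python
TYPE_BITS = 2          # header type occupies bits 0..1 in both schemes

# Current layout (and, by assumption, Proposal #1): a 6-bit word count
# sits in bits 2..7, so the masked value is already words * 4.
SIZE_MASK_MASKED = 0xFC

def size_bytes_masked(header):
    """Word-aligned byte size via a single mask, no shift needed."""
    return header & SIZE_MASK_MASKED

# Proposal #2 (assumed reading): one 8-bit field in bits 2..9.
SIZE_MASK_COMBINED = 0xFF

def encode_size_combined(byte_size):
    """High 6 bits: size in words. Low 2 bits: low bits of (size - 1)."""
    words = (byte_size + 3) // 4
    return (words << 2) | ((byte_size - 1) & 3)

def size_field_combined(header):
    """The right shift is what gets around the header type bits."""
    return (header >> TYPE_BITS) & SIZE_MASK_COMBINED

def size_words_combined(header):
    return size_field_combined(header) >> 2

def size_bytes_combined(header):
    # "Add a constant": under this reading the constant is -3.
    return size_field_combined(header) - 3
```

Under these assumptions the first scheme reads a word-aligned size with one AND, while the second pays for an extra shift but recovers the exact byte size from a single field.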
A somewhat radical change would be to put the type bits at the top of the word, and change the field order so that the class word followed the header word or words. This would be a good thing, but it's another potentially large change. Moreover the GC usage of the forbidden type as a flag would have to be addressed as well. If this change had been done already, then Proposal #2 would become my preference.

Considering all the above, I am inclined to proceed with Proposal #1 with a willingness to adopt #2 if someone (possibly me) comes up with all the changes needed to make it simple. It would definitely be nice to move the class word after the header regardless, because then memory scans would usually find the header word they want, rather than having to look at the next word every time (this was different with compact classes).
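As a sanity check on the Proposal #1 bit budget, here is a minimal sketch in Python (for illustration only). The field widths come from the proposal text above. The field order (type lowest, then size, format, hash, and the memory management bits on top) is an assumption carried over from the current header, and `HeaderLayout` and its methods are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HeaderLayout:
    """Field widths in bits, ordered from the low end of the word upward."""
    type_bits: int
    size_bits: int
    format_bits: int
    hash_bits: int
    gc_bits: int

    def fields(self):
        return [("type", self.type_bits), ("size", self.size_bits),
                ("format", self.format_bits), ("hash", self.hash_bits),
                ("gc", self.gc_bits)]

    def word_bits(self):
        return sum(width for _, width in self.fields())

    def pack(self, **values):
        """Build a header word; fields that are not given default to zero."""
        word, shift = 0, 0
        for name, width in self.fields():
            value = values.get(name, 0)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word |= value << shift
            shift += width
        return word

    def unpack(self, word):
        """Split a header word back into a dict of field values."""
        out, shift = {}, 0
        for name, width in self.fields():
            out[name] = (word >> shift) & ((1 << width) - 1)
            shift += width
        return out

# 32-bit: hash grows from 12 to 16 bits, format from 4 to 5 bits.
PROPOSAL1_32 = HeaderLayout(type_bits=2, size_bits=6, format_bits=5,
                            hash_bits=16, gc_bits=3)
# 64-bit: hash stretched by 14 bits, size stretched by 18 bits.
PROPOSAL1_64 = HeaderLayout(type_bits=2, size_bits=24, format_bits=5,
                            hash_bits=30, gc_bits=3)
```

Both layouts fill their word exactly (32 and 64 bits), which is the arithmetic behind redistributing the 5 compact class bits as 1 for the format and 4 for the hash.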