| 
     
      
      
      From: Donal K. F. <don...@ma...> - 2010-02-03 09:29:26
      
     
   | 
On 02/02/2010 17:28, Andy Goth wrote: > From the way the data structures are defined, it appears to me that this new > approach would use less memory. Let me try to put some numbers to this, > with 32-bit/64-bit byte counts. The old and new numbers should compare > directly because I don't plan to change the preallocation characteristics. > I don't know the overhead each ckalloc() incurs. > > If you like, I could make a table comparing the expected memory utilization > for dicts of various sizes. I'm a bit suspicious of your figures, you know. The issue is alignment of structure elements. Typically, these are on machine word boundaries (we don't play any games to pack them tighter because those things are typically not portable) so there's a doubling of size from 32-bit to 64-bit (unless arrays are involved, when the default alignment rules tend to be tighter). Alignment is *very* important on modern processors so it pays to not be over-smart in code layout. Cost of malloc() on a 32-bit platform is 8 bytes. I've not looked at the cost on 64-bit systems, but I'd expect it to be at least 16. (It's got to be more than 8 in order to hold the length of the memory block, and it must also be word-aligned. Since there's usually some guard bytes too, that makes for 16 bytes.) The Tcl_Obj allocator avoids a lot of these costs. > New code: > > Each hash entry, both actively used and preallocated, requires... > 1. One Tcl_Obj * inside List.elements (4/8 bytes). > 2. One DictEntry inside Dict.entries (12/16 bytes). > 3. One int inside Dict.allocation (4/4 bytes). > 4. Total (20/28 bytes). There's no guarantee that a DictEntry will actually be 16 bytes on 64-bit. It might be 24 bytes. > Each list+dict intrep requires... > 1. One List (24/32 bytes). > 2. One Dict (36/56 bytes, not counting the static* arrays). > 3. Total (60/88 bytes). > 4. Two ckallocs() for List and List.dict. Going with 32-bit (easier to calculate) I make your dictionary's baseline cost out to be 61 words for Dict and 6 for List (at least; I wasn't quite sure what the memory management strategy for the list's array of pointers was). Or 268 bytes. Plus malloc overheads (16 bytes I estimate). The cost of those static arrays is the major part of these costs. Rules for estimating the cost of Dict: 9 words for non-array fields 4*1 words for the staticBuckets array 3*4*3=36 words for the staticEntries array 3*4*1=12 words for the staticAllocation array Total: 61 words. On 32-bit, long is the same size as int and has the same alignment rules. On 64-bit, it depends on the details of the architecture, but the most common 64-bit platform these days is LP64. Due to the complexity of alignment rules though, it's important to measure the actual size. In particular, I don't know whether adjacent 32-bit words will be packed into a single 64-bit word. (You should move the downShift field of your Dict one earlier to take advantage of this effect if it does apply...) I'm not bothering to spin up a 64-bit VM to test my theories. :-) > Each list+dict over the STATIC size wastes... > 1. An unused Dict.staticBuckets array (16/16 bytes). > 2. An unused Dict.staticEntries array (240/336 bytes). > 3. An unused Dict.staticAllocation array (48/48 bytes). > 4. Total (304/400 bytes). I don't trust that calculation. I make the 32-bit size of your DictEntry out to be 3 words and STATIC_ENTRIES as 12. That means that the staticEntries array is 36 words 144 bytes (alignment doesn't cause any issues here since the structure's naturally aligned on 32-bit). > Each list+dict over the STATIC size has an extra... > 1. Three ckalloc()s for Dict.buckets, Dict.entries, and Dict.allocation. > > Each non-dict intrep wastes... > 1. An unused List.dict pointer (4/8 bytes). > > Old code: > > Each actively used hash entry requires... > 1. One ChainEntry (28/56 bytes). > 2. One ckalloc() for the ChainEntry. > > Each dict intrep requires... > 1. One Dict (60/88 bytes, not counting staticBuckets). > 2. One ckalloc() for the Dict. Excluding the staticBuckets field makes your whole analysis a bit bogus. Instead, you've got to include such things. > Each dict intrep over the SMALL_HASH_TABLE size wastes... > 1. An unused Dict.table.staticBuckets array (16/32 bytes). I think that's the wrong way to account for that. > Each dict intrep over the SMALL_HASH_TABLE size has an extra... > 1. One ckalloc() for Dict.table.buckets. > > Some cuts are possible. A 32-bit hash could be used on a 64-bit machine, > which is the current Tcl behavior, saving four bytes per entry on a 64-bit > machine. Probably not a worthwhile saving. It will also increase the amount of time spent when working with the hash as non-full-word accesses pay a speed penalty on modern architectures. > Maybe when the Dict struct exceeds the STATIC size (12 entries), > it could be shrunk with ckrealloc() to no longer contain the static* arrays, > possibly saving 304/400 bytes for large dicts. The allocation array could > be interleaved into the entries array, since they always have the same > length (just add an allocation field to the DictEntry struct), saving 4/8 > bytes for all dicts and one ckalloc() for large dicts. The more I look at this, the more I'm concerned about the sheer complexity of what you propose. Building a new complex data structure is a non-trivial task and demonstrating that it is genuinely cheaper than existing ones (which are probably about as simple as they can be; there's very little waste) is quite hard. You've got some way to go to win me over as yet. (Call me conservative if you want.) Have you considered whether a different hash function would be better? (This is independent of the memory management details that you've been focussing on.) There's a long-standing issue with the fact that Tcl's hash functions are actually very weak; much better ones are available these days. Some performance data would be interesting. Donal.  |