|
From: Julian S. <js...@ac...> - 2008-01-15 21:28:22
Attachments:
helgrind-7350-64bit-SVal-and-writeback-cache.diff
|
> [...] change SVal to 64-bit at some point (I vote for it!),

I already did that in Nov 07, but did not commit due to increase
in run time and space usage. The resulting patch is attached.
It allows up to 24 bits for lock- and thread-set IDs. I can't
remember now why I limited it to 24 bits -- could go up to about
30 bits.

Really 64-bit for a SVal is too much but 32-bit is not enough.
Something like 48 bits would be a good compromise, but there is
no sensible way to do that in C. Oh well.

The patch also changes CacheLine to have a dirty bit in an attempt
to reduce the performance overhead from writing back cache lines.
That helps, although it also adds to the complexity and overhead
of verifying that the shadow memory system is functioning
correctly.

J
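As an illustration of the two ideas in this message, here is a minimal sketch in C. It is not taken from the attached patch: the field layout (16-bit state tag, 24-bit thread-set ID, 24-bit lock-set ID), the 8-entry line size, and all function names are assumptions made for the example. It shows how two 24-bit IDs fit in a 64-bit SVal, and how a dirty bit lets write-back skip lines that were never modified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit shadow value; the real patch may lay bits out differently. */
typedef uint64_t SVal;

#define ID_BITS 24u
#define ID_MASK ((1u << ID_BITS) - 1u)   /* 0xFFFFFF */

/* Pack a 16-bit state tag, a 24-bit thread-set ID and a 24-bit lock-set ID. */
static SVal sval_make(uint32_t state, uint32_t tset, uint32_t lset)
{
    assert(state <= 0xFFFFu && tset <= ID_MASK && lset <= ID_MASK);
    return ((SVal)state << 48) | ((SVal)tset << ID_BITS) | (SVal)lset;
}

static uint32_t sval_state(SVal sv) { return (uint32_t)(sv >> 48); }
static uint32_t sval_tset(SVal sv)  { return (uint32_t)(sv >> ID_BITS) & ID_MASK; }
static uint32_t sval_lset(SVal sv)  { return (uint32_t)sv & ID_MASK; }

#define LINE_SVALS 8   /* illustrative line size */

typedef struct {
    SVal svals[LINE_SVALS];
    int  dirty;            /* set on modification, cleared on fetch/write-back */
} CacheLine;

static SVal     backing[LINE_SVALS];  /* stand-in for the backing shadow store */
static unsigned n_wbacks;             /* counts write-backs actually performed */

/* Load the line from the backing store; a freshly fetched line is clean. */
static void line_fetch(CacheLine *cl)
{
    memcpy(cl->svals, backing, sizeof backing);
    cl->dirty = 0;
}

/* Store a shadow value; mark the line dirty only if the value changed. */
static void line_set(CacheLine *cl, unsigned i, SVal sv)
{
    assert(i < LINE_SVALS);
    if (cl->svals[i] != sv) {
        cl->svals[i] = sv;
        cl->dirty = 1;
    }
}

/* Write the line back only if it is dirty; clean lines cost nothing. */
static void line_wback(CacheLine *cl)
{
    if (!cl->dirty)
        return;
    memcpy(backing, cl->svals, sizeof backing);
    cl->dirty = 0;
    n_wbacks++;
}
```

The extra state is also where the verification cost mentioned above comes from: any sanity check of the shadow memory now has to consider whether a clean line really matches its backing copy.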
|
From: Konstantin S. <kon...@gm...> - 2008-01-16 12:14:06
|
Great!
Do you plan to check in these or similar changes in the future?
Maybe under #ifdef so that users do not suffer if they don't need too many
[TL]SETs?

While I think that something like 24 bits for [TL]SETs is a must for large
apps, it might not be enough.
We may need some garbage collection for [TL]SETs.
Once in a while we could scan all memory, collect all used [TL]SETs and
prune the unused ones.
We could also do garbage collection at thread_join and mutex_destroy (not
every time, of course, and only if the number of [TL]SETs is getting close
to the limit).
Anyway, I'll try the 64-bit patch to see if garbage collection is really
critical.

--kcc
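The garbage-collection idea proposed here can be sketched as a simple mark-and-sweep over set IDs. This is only an illustration under stated assumptions: the `SetTable` type, the 16-entry ID space, the 75% trigger threshold, and the flat `shadow` array of set IDs are all hypothetical and do not correspond to Helgrind's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 16u   /* tiny hypothetical ID space; the thread discusses 2^24 */

typedef struct {
    int      in_use[MAX_SETS];  /* 1 if this set ID is currently allocated */
    unsigned n_used;            /* number of allocated IDs */
} SetTable;

/* Trigger policy (an assumption): collect only when the table is >= 75% full. */
static int gc_needed(const SetTable *t)
{
    return t->n_used * 4 >= MAX_SETS * 3;
}

/* Mark every set ID referenced from shadow memory, then free the allocated
   IDs that were not marked. Returns the number of IDs pruned. */
static unsigned gc_sets(SetTable *t, const uint32_t *shadow, size_t n)
{
    int marked[MAX_SETS];
    unsigned pruned = 0;

    memset(marked, 0, sizeof marked);

    /* Mark phase: one pass over all shadow values. */
    for (size_t i = 0; i < n; i++) {
        assert(shadow[i] < MAX_SETS);
        marked[shadow[i]] = 1;
    }

    /* Sweep phase: release IDs that nothing refers to any more. */
    for (unsigned id = 0; id < MAX_SETS; id++) {
        if (t->in_use[id] && !marked[id]) {
            t->in_use[id] = 0;
            t->n_used--;
            pruned++;
        }
    }
    return pruned;
}
```

Calling `gc_needed` at events such as thread_join or mutex_destroy and running `gc_sets` only when it returns true matches the "not every time" policy suggested above; the full scan of shadow memory is the expensive part.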
|
From: Konstantin S. <kon...@gm...> - 2008-01-18 08:28:28
|
>> Anyway, I'll try the 64-bit patch

I've tried both helgrinds (the original one with 32-bit SVals and the one
with 64-bit SVals).
The test program runs natively in about 4 seconds.

'top' output (sampled every 5 seconds; the largest sample is shown):

32-bit SVal:
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
23903 kcc  18  0 2250m 1.2g  73m S   95 31.5 4:08.74 helgrind

64-bit SVal:
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
27158 kcc  18  0 2374m 1.4g 115m S    4 35.6 3:54.36 helgrind

native run (w/o helgrind):
  715 kcc  24  0 1044m  87m  72m S   10  2.3 0:03.38

So the memory footprint is about 15% larger with 64-bit SVals.

Run times:
32-bit: 240-260 seconds.
64-bit: 230-260 seconds.

So (modulo noise) the performance does not seem to be affected much.
But this was a 64-bit binary running on a 64-bit machine (Core 2 Duo).

Profile (for both 64-bit and 32-bit):
184044 32.0981 cacheline_wback
124552 21.7224 cacheline_fetch
105773 18.4473 shadow_mem_set64
 55917  9.7522 avl_find_node
 10972  1.9136 vgPlain_ssort

This profile is typical of most other tests I've run so far.

Thanks!

--kcc
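For reference, the "about 15%" memory figure can be checked against the 'top' samples quoted in this message. The helper below is a hypothetical name used only for this worked calculation.

```c
/* Relative increase from old_val to new_val, in percent. */
static double pct_increase(double old_val, double new_val)
{
    return (new_val - old_val) / old_val * 100.0;
}
```

Applied to the numbers above, RES (1.2g to 1.4g) grows by roughly 16.7%, %MEM (31.5 to 35.6) by roughly 13.0%, and VIRT (2250m to 2374m) by roughly 5.5%, so the 15% estimate is in line with the resident-memory columns rather than with VIRT.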