From: Konstantin S. <kon...@gm...> - 2011-01-21 16:12:57
2011/1/21 Julian Seward <js...@ac...>

>
> I am really surprised to see this.  I know that vgPlain_search_transtab
> does take some time, but it's not much more than 2 or 3 %.  Especially
> after I put in some hacks to make it cheaper, some time around 3.6.0
> (not sure when).
>
> The guest->host mapping is cached in a direct-mapped cache,
> VG_(tt_fast), and VG_(search_transtab) is only used when
> the cache misses.  But the cache typically has a 99% hit
> rate, so VG_(search_transtab) should not see much action.
>
> The only way I can see is that you are jumping between two
> pieces of code which are exactly 2^N bytes apart (for N=17,
> or something like that), in the address space.  Then the
> cache will miss on each reference because both addresses map
> to the same line and there is no associativity and no
> victim cache.
>
> Can you send the results from --stats=yes ?  From that we can
> see the miss rate on VG_(tt_fast) and perhaps some other
> significant numbers.
>

This?

--17273-- translate: fast SP updates identified: 0 ( --%)
--17273-- translate: generic_known SP updates identified: 0 ( --%)
--17273-- translate: generic_unknown SP updates identified: 0 ( --%)
--17273-- tt/tc: 14,941,381 tt lookups requiring 97,504,544 probes
--17273-- tt/tc: 14,941,381 fast-cache updates, 13 flushes
--17273-- transtab: new 292,328 (4,802,899 -> 45,933,795; ratio 95:10) [0 scs]
--17273-- transtab: dumped 0 (0 -> ??)
--17273-- transtab: discarded 151 (1,798 -> ??)
--17273-- scheduler: 736,878,221 jumps (bb entries).
--17273-- scheduler: 9,067/37,285,698 major/minor sched events.
--17273-- sanity: 9068 cheap, 115 expensive checks.
--17273-- exectx: 769 lists, 0 contexts (avg 0 per list)
--17273-- exectx: 0 searches, 0 full compares (0 per 1000)
--17273-- exectx: 0 cmp2, 0 cmp4, 0 cmpAll
--17273-- errormgr: 0 supplist searches, 0 comparisons during search
--17273-- errormgr: 0 errlist searches, 0 comparisons during search

>
> J
>
> On Thursday, January 13, 2011, Konstantin Serebryany wrote:
> > Hi,
> >
> > I am running one large test (chrome browser on a heavy JS page) under
> > Memcheck and ThreadSanitizer.
> > The profile for ThreadSanitizer process looks like this:
> >
> > 151192 56.6740 tsan-amd64-linux tsan-amd64-linux vgPlain_search_transtab
> >  10702  4.0116 tsan-amd64-linux tsan-amd64-linux vgPlain_run_innerloop__dispatch_unprofiled
> >   9741  3.6514 tsan-amd64-linux tsan-amd64-linux vgPlain_discard_translations
> >
> > Most of the time is spent in the inner loop in vgPlain_search_transtab
> >
> >     6  0.0024 : 3807fd75: cltq
> >    33  0.0133 : 3807fd77: mov 0x4241ba(%rip),%rcx # 384a3f38 <n_lookup_probes>
> >     8  0.0032 : 3807fd7e: imul $0x1030,%rax,%rax
> >    57  0.0230 : 3807fd85: lea 0xfff1(%rcx),%rbp
> >               : 3807fd8c: mov 0x384a3fa8(%rax),%rbx
> >   175  0.0706 : 3807fd93: mov %edx,%eax
> >    10  0.0040 : 3807fd95: jmp 3807fdb7 <vgPlain_search_transtab+0xa7>
> >               : 3807fd97: nopw 0x0(%rax,%rax,1)
> >  3992  1.6106 : 3807fda0: cmp %rsi,0x18(%r10)
> > 27213 10.9791 : 3807fda4: je 3807fe00 <vgPlain_search_transtab+0xf0>
> >  7304  2.9468 : 3807fda6: add $0x1,%eax
> >   641  0.2586 : 3807fda9: cmp $0xfff1,%eax
> >  1485  0.5991 : 3807fdae: cmove %r12d,%eax
> >  7334  2.9589 : 3807fdb2: cmp %rbp,%rcx
> >   420  0.1694 : 3807fdb5: je 3807fde0 <vgPlain_search_transtab+0xd0>
> >  2269  0.9154 : 3807fdb7: movslq %eax,%r10
> >   414  0.1670 : 3807fdba: add $0x1,%rcx
> >  5084  2.0511 : 3807fdbe: lea (%r10,%r10,4),%r11
> >   409  0.1650 : 3807fdc2: mov %rcx,0x42416f(%rip) # 384a3f38 <n_lookup_probes>
> >  3501  1.4125 : 3807fdc9: lea (%r10,%r11,2),%r10
> >   697  0.2812 : 3807fdcd: lea (%rbx,%r10,8),%r10
> >  5691  2.2960 : 3807fdd1: mov 0x8(%r10),%r11d
> > 65972 26.6165 : 3807fdd5: test %r11d,%r11d
> >  6220  2.5095 : 3807fdd8: je 3807fda0 <vgPlain_search_transtab+0x90>
> >  3302  1.3322 : 3807fdda: cmp $0x2,%r11d
> >  7211  2.9093 : 3807fdde: jne 3807fda6 <vgPlain_search_transtab+0x96>
> >    11  0.0044 : 3807fde0: add $0x1,%r13d
> >    40  0.0161 : 3807fde4: add $0x4,%r14
> >    11  0.0044 : 3807fde8: cmp $0x8,%r13d
> >     8  0.0032 : 3807fdec: jne 3807fd6d <vgPlain_search_transtab+0x5d>
> >
> > Memcheck profile looks a bit less scary, but still most of the time is
> > spent in transtab.
> >
> > 34472 12.4832 memcheck-amd64-linux memcheck-amd64-linux delete_translations_in_sector_eclass
> > 31870 11.5409 memcheck-amd64-linux memcheck-amd64-linux vgMemCheck_helperc_MAKE_STACK_UNINIT
> > 26495  9.5945 memcheck-amd64-linux memcheck-amd64-linux vgPlain_search_transtab
> > 26203  9.4888 memcheck-amd64-linux memcheck-amd64-linux vgPlain_discard_translations
> >
> > Is there any known performance trouble in transtab when running jitted
> > code?
> > Are there any knobs one could tweak to boost transtab?
> >
> > Thanks!
> > --kcc
> >
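
For reference, the aliasing scenario Julian describes (two hot addresses that map to the same line of a direct-mapped cache with no associativity) can be sketched in a few lines of Python. This is only a toy model, not Valgrind's code: FAST_BITS, slot() and count_misses() are hypothetical names, the cache size is made up, and indexing by the low address bits is an assumption about how such a cache might pick a line.

```python
# Toy model of a direct-mapped lookup cache (NOT Valgrind's implementation).
# FAST_BITS and the "low bits of the address" index are assumptions made
# purely for illustration.
FAST_BITS = 15
FAST_SIZE = 1 << FAST_BITS


def slot(addr):
    """Pick the cache line for an address: low FAST_BITS bits (assumed)."""
    return addr & (FAST_SIZE - 1)


def count_misses(trace):
    """Replay a sequence of jump targets and count cache misses."""
    cache = [None] * FAST_SIZE
    misses = 0
    for addr in trace:
        i = slot(addr)
        if cache[i] != addr:
            # Miss: a real system would fall back to the slow full search
            # here, then overwrite the line (no associativity, no victim).
            misses += 1
            cache[i] = addr
    return misses


a = 0x400000
b = a + FAST_SIZE              # exactly 2^FAST_BITS apart: same line as a
aliasing = [a, b] * 1000       # ping-pong between two aliasing targets
friendly = [a, a + 64] * 1000  # ping-pong between two non-aliasing targets

print(count_misses(aliasing))  # 2000: every reference misses
print(count_misses(friendly))  # 2: only the two cold misses
```

In this model the aliasing trace evicts the other entry on every jump, so each reference goes to the slow path, while the non-aliasing trace only pays for the first touch of each address.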