Re: [Valgrind-developers] vgPlain_search_transtab

2011/1/21 Julian Seward <js...@ac...>

>
> I am really surprised to see this.  I know that vgPlain_search_transtab
> does take some time, but it's not much more than 2 or 3 %.  Especially
> after I put in some hacks to make it cheaper, some time around 3.6.0
> (not sure when).
>
> The guest->host mapping is cached in a direct-mapped cache,
> VG_(tt_fast), and VG_(search_transtab) is only used when
> the cache misses.  But the cache typically has a 99% hit
> rate, so VG_(search_transtab) should not see much action.
>
> The only way I can see is that you are jumping between two
> pieces of code which are exactly 2^N bytes apart (for N=17,
> or something like that), in the address space.  Then the
> cache will miss on each reference because both addresses map
> to the same line and there is no associativity and no
> victim cache.
>
> Can you send the results from --stats=yes ?  From that we can
> see the miss rate on VG_(tt_fast) and perhaps some other
> significant numbers.
>
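
To make the conflict case described above concrete, here is a minimal sketch of a direct-mapped cache. It is not the actual VG_(tt_fast) code: the cache size, the index function (plain low bits of the address) and all names are assumptions chosen for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative size only; not taken from the Valgrind sources. */
#define CACHE_BITS 15u
#define CACHE_SIZE (1u << CACHE_BITS)

typedef struct {
    uint64_t guest;  /* guest code address (the key) */
    uint64_t host;   /* address of the translation (the value) */
    int      valid;  /* 0 until the line has been filled */
} CacheLine;

static CacheLine cache[CACHE_SIZE];

/* Direct-mapped: the low bits of the address pick exactly one line.
   (Assumed hash; the real cache may index differently.) */
static unsigned line_index(uint64_t guest)
{
    return (unsigned)(guest & (CACHE_SIZE - 1u));
}

/* Returns 1 on a hit, 0 on a miss. */
static int cache_lookup(uint64_t guest)
{
    const CacheLine *l = &cache[line_index(guest)];
    return l->valid && l->guest == guest;
}

/* Fills the line after a miss, evicting whatever was there. */
static void cache_update(uint64_t guest, uint64_t host)
{
    CacheLine *l = &cache[line_index(guest)];
    l->guest = guest;
    l->host  = host;
    l->valid = 1;
}

/* Jumps a -> b -> a -> b ... for `rounds` round trips, starting from
   an empty cache, and returns the number of misses. */
static unsigned count_misses(uint64_t a, uint64_t b, unsigned rounds)
{
    unsigned misses = 0;
    memset(cache, 0, sizeof cache);
    for (unsigned i = 0; i < rounds; i++) {
        if (!cache_lookup(a)) { misses++; cache_update(a, a); }
        if (!cache_lookup(b)) { misses++; cache_update(b, b); }
    }
    return misses;
}
```

Under these assumptions, two addresses exactly CACHE_SIZE bytes apart share a line and evict each other, so every one of the 200 lookups in 100 round trips misses. Two addresses 64 bytes apart miss only twice, on the initial cold fills.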

This?
--17273-- translate:            fast SP updates identified: 0 (   --%)
--17273-- translate:   generic_known SP updates identified: 0 (   --%)
--17273-- translate: generic_unknown SP updates identified: 0 (   --%)
--17273--     tt/tc: 14,941,381 tt lookups requiring 97,504,544 probes
--17273--     tt/tc: 14,941,381 fast-cache updates, 13 flushes
--17273--  transtab: new        292,328 (4,802,899 -> 45,933,795; ratio 95:10) [0 scs]
--17273--  transtab: dumped     0 (0 -> ??)
--17273--  transtab: discarded  151 (1,798 -> ??)
--17273-- scheduler: 736,878,221 jumps (bb entries).
--17273-- scheduler: 9,067/37,285,698 major/minor sched events.
--17273--    sanity: 9068 cheap, 115 expensive checks.
--17273--    exectx: 769 lists, 0 contexts (avg 0 per list)
--17273--    exectx: 0 searches, 0 full compares (0 per 1000)
--17273--    exectx: 0 cmp2, 0 cmp4, 0 cmpAll
--17273--  errormgr: 0 supplist searches, 0 comparisons during search
--17273--  errormgr: 0 errlist searches, 0 comparisons during search
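
For reading the tt/tc and scheduler lines above, a small sketch of the arithmetic may help. It assumes that every "tt lookup" corresponds to one fast-cache miss and that "jumps (bb entries)" counts all block entries; both are my interpretation of the output, not something the stats state explicitly. The function names are made up for this example.

```c
#include <stdio.h>

/* Average number of hash-table probes per full lookup. */
static double probes_per_lookup(unsigned long long probes,
                                unsigned long long lookups)
{
    return (double)probes / (double)lookups;
}

/* Fraction of block entries that needed a full lookup.
   Assumes one full lookup per fast-cache miss. */
static double full_lookup_rate(unsigned long long lookups,
                               unsigned long long jumps)
{
    return (double)lookups / (double)jumps;
}

/* Prints both ratios for the numbers in the stats output above. */
static void print_summary(void)
{
    const unsigned long long lookups = 14941381ULL;   /* tt lookups */
    const unsigned long long probes  = 97504544ULL;   /* probes     */
    const unsigned long long jumps   = 736878221ULL;  /* bb entries */

    printf("probes per lookup: %.2f\n",
           probes_per_lookup(probes, lookups));
    printf("full lookups per jump: %.2f%%\n",
           100.0 * full_lookup_rate(lookups, jumps));
}
```

With the numbers above this works out to about 6.5 probes per lookup, and roughly 2% of block entries falling through to a full lookup, which under the stated assumption is about twice the 1% miss rate implied by a 99% hit rate.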


>
> J
>
> On Thursday, January 13, 2011, Konstantin Serebryany wrote:
> > Hi,
> >
> > I am running one large test (chrome browser on a heavy JS page) under
> > Memcheck and ThreadSanitizer.
> > The profile for ThreadSanitizer process looks like this:
> >
> > 151192   56.6740  tsan-amd64-linux         tsan-amd64-linux         vgPlain_search_transtab
> > 10702     4.0116  tsan-amd64-linux         tsan-amd64-linux         vgPlain_run_innerloop__dispatch_unprofiled
> > 9741      3.6514  tsan-amd64-linux         tsan-amd64-linux         vgPlain_discard_translations
> >
> >
> > Most of the time is spent in the inner loop in vgPlain_search_transtab:
> >
> >      6  0.0024 :    3807fd75:   cltq
> >     33  0.0133 :    3807fd77:   mov    0x4241ba(%rip),%rcx        # 384a3f38 <n_lookup_probes>
> >      8  0.0032 :    3807fd7e:   imul   $0x1030,%rax,%rax
> >     57  0.0230 :    3807fd85:   lea    0xfff1(%rcx),%rbp
> >                :    3807fd8c:   mov    0x384a3fa8(%rax),%rbx
> >    175  0.0706 :    3807fd93:   mov    %edx,%eax
> >     10  0.0040 :    3807fd95:   jmp    3807fdb7 <vgPlain_search_transtab+0xa7>
> >                :    3807fd97:   nopw   0x0(%rax,%rax,1)
> >   3992  1.6106 :    3807fda0:   cmp    %rsi,0x18(%r10)
> >  27213 10.9791 :    3807fda4:   je     3807fe00 <vgPlain_search_transtab+0xf0>
> >   7304  2.9468 :    3807fda6:   add    $0x1,%eax
> >    641  0.2586 :    3807fda9:   cmp    $0xfff1,%eax
> >   1485  0.5991 :    3807fdae:   cmove  %r12d,%eax
> >   7334  2.9589 :    3807fdb2:   cmp    %rbp,%rcx
> >    420  0.1694 :    3807fdb5:   je     3807fde0 <vgPlain_search_transtab+0xd0>
> >   2269  0.9154 :    3807fdb7:   movslq %eax,%r10
> >    414  0.1670 :    3807fdba:   add    $0x1,%rcx
> >   5084  2.0511 :    3807fdbe:   lea    (%r10,%r10,4),%r11
> >    409  0.1650 :    3807fdc2:   mov    %rcx,0x42416f(%rip)        # 384a3f38 <n_lookup_probes>
> >   3501  1.4125 :    3807fdc9:   lea    (%r10,%r11,2),%r10
> >    697  0.2812 :    3807fdcd:   lea    (%rbx,%r10,8),%r10
> >   5691  2.2960 :    3807fdd1:   mov    0x8(%r10),%r11d
> >  65972 26.6165 :    3807fdd5:   test   %r11d,%r11d
> >   6220  2.5095 :    3807fdd8:   je     3807fda0 <vgPlain_search_transtab+0x90>
> >   3302  1.3322 :    3807fdda:   cmp    $0x2,%r11d
> >   7211  2.9093 :    3807fdde:   jne    3807fda6 <vgPlain_search_transtab+0x96>
> >     11  0.0044 :    3807fde0:   add    $0x1,%r13d
> >     40  0.0161 :    3807fde4:   add    $0x4,%r14
> >     11  0.0044 :    3807fde8:   cmp    $0x8,%r13d
> >      8  0.0032 :    3807fdec:   jne    3807fd6d <vgPlain_search_transtab+0x5d>
> >
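
The loop above reads like a linear-probing search over an open-addressed table: step to the next slot, wrap at 0xfff1, compare the address when the status field is 0, and give up on this table when the status is 2. Below is a rough C sketch of that shape. The status names and their encoding are inferred from the disassembly, while the hash function, struct layout and function names are hypothetical; this is not the Valgrind source.

```c
#include <stdint.h>

#define N_ENTRIES 65521   /* 0xfff1, the wrap-around value in the loop above */

/* Encoding inferred from the `test` against 0 and the `cmp $0x2`. */
typedef enum { IN_USE = 0, DELETED = 1, EMPTY = 2 } Status;

typedef struct {
    uint64_t guest_addr;  /* key: guest code address */
    Status   status;
} Entry;

/* Marks every slot as never used. */
static void table_clear(Entry *table)
{
    for (int i = 0; i < N_ENTRIES; i++)
        table[i].status = EMPTY;
}

/* Hypothetical hash: the real one is not visible in the profile. */
static int home_slot(uint64_t guest_addr)
{
    return (int)(guest_addr % N_ENTRIES);
}

/* Stores the key in the first slot that is not in use, starting at
   its home slot. Returns the slot index, or -1 if the table is full. */
static int table_insert(Entry *table, uint64_t guest_addr)
{
    int k = home_slot(guest_addr);
    for (int i = 0; i < N_ENTRIES; i++) {
        if (table[k].status != IN_USE) {
            table[k].guest_addr = guest_addr;
            table[k].status = IN_USE;
            return k;
        }
        if (++k == N_ENTRIES) k = 0;   /* wrap around */
    }
    return -1;
}

/* Linear probing: returns the slot holding guest_addr, or -1.
   *probes is incremented once per slot examined. */
static int table_lookup(const Entry *table, uint64_t guest_addr,
                        unsigned long long *probes)
{
    int k = home_slot(guest_addr);
    for (int i = 0; i < N_ENTRIES; i++) {
        (*probes)++;
        if (table[k].status == IN_USE && table[k].guest_addr == guest_addr)
            return k;                  /* found */
        if (table[k].status == EMPTY)
            return -1;                 /* an empty slot ends the chain */
        if (++k == N_ENTRIES) k = 0;   /* deleted or other key: keep going */
    }
    return -1;
}
```

One property the sketch makes visible is that a deleted slot does not end a probe sequence; only an empty slot does. In this sketch, a table with many deleted slots therefore costs more probes per lookup, for hits and misses alike.
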
> > Memcheck profile looks a bit less scary, but still most of the time is
> > spent in transtab.
> >
> > 34472    12.4832  memcheck-amd64-linux     memcheck-amd64-linux     delete_translations_in_sector_eclass
> > 31870    11.5409  memcheck-amd64-linux     memcheck-amd64-linux     vgMemCheck_helperc_MAKE_STACK_UNINIT
> > 26495     9.5945  memcheck-amd64-linux     memcheck-amd64-linux     vgPlain_search_transtab
> > 26203     9.4888  memcheck-amd64-linux     memcheck-amd64-linux     vgPlain_discard_translations
> >
> >
> > Is there any known performance trouble in transtab when running jitted
> > code?
> > Are there any knobs one could tweak to boost transtab?
> >
> > Thanks!
> > --kcc
>
>