From: Filip P. <pi...@pu...> - 2010-03-25 22:41:58
It depends on how you use the yieldpoint. If you fire a yieldpoint then it's
expensive. I haven't measured this precisely, but I'd expect a yieldpoint
that is fired for no reason (i.e. none of the yieldpoint's OSR, GC, or
thread logic goes to the slow path) to take about 5-10 microseconds on a
fast Linux/x86 box. Typically yieldpoints are used for profiling by doing
sampling; you fire them periodically. If you fire them at ~100Hz then you
shouldn't perturb performance by more than 1% (at ~10 microseconds per
firing, 100 firings per second is only about a millisecond of added work
per second of execution, i.e. ~0.1%), unless the code you've added to the
yieldpoint handler is really complex (as in, takes more than 100
microseconds or so).

It seems to me that if you want a precise profile then you really need to
insert your own IR code or machine code into the code stream generated by
the opt compiler. That IR code would then update your counters or
profiling buffers or whatever. So the only issue is deciding where to put
the counters. I'd put them in memory. Then it's just a question of
deciding where in memory, and how the inserted IR will find that memory.

Even just incrementing a counter on every iteration of a loop is likely to
perturb execution by 5%, especially if you *really* need *every*
iteration. This assumes that the counter is in memory at a location that
is readily accessible (e.g. a field in RVMThread, or a well-known place in
memory). If you're putting data into a buffer then it will be worse -
you'll be inserting more code, and that code will have to do more
indirections. For some loops your instrumentation will be doing more work
than a single loop iteration would otherwise have done, so you'll get a
lot of overhead.

I may be wrong ... others may be aware of some tricks. Perhaps if you do
some compiler analysis then you can get the loop induction variable and
update your profiling info in one step, thus eliminating the overhead. But
that might be hard. You could also try stealing a register and using that
as your counter.
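[Editor's note: the counter-in-memory scheme above can be sketched in plain
Java. Everything here - class and field names, the array layout - is
illustrative, not Jikes RVM's actual API; it only shows the work the
inserted IR would effectively do per loop iteration.]

```java
// Hypothetical sketch (not Jikes RVM API): one slot per instrumented
// loop in a plain long[] at a well-known location, so the generated
// code is a single load/add/store per iteration.
public class LoopCounters {
    // Indexed by a compile-time loop id assigned at instrumentation time.
    static final long[] COUNTS = new long[1024];

    // The opt compiler would inline the equivalent of this into the loop
    // body; it is written as a method here only for illustration.
    static void hit(int loopId) {
        COUNTS[loopId]++;
    }

    public static void main(String[] args) {
        // Simulate an instrumented loop with id 7 running 1000 iterations.
        for (int i = 0; i < 1000; i++) {
            hit(7);
        }
        System.out.println(COUNTS[7]); // prints 1000
    }
}
```

The point of the flat array is exactly the trade-off described above: the
address is fixed and readily accessible, so the per-iteration cost stays
at one increment rather than extra indirections through a buffer object.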
Again, this could be hard, and very error-prone. You might find a register
to steal from Jikes RVM, but then you'll have some work to do to make sure
that the register is saved and restored on native calls. Again, this will
be hard.

But basically I'm just throwing out random ideas, since I don't really
know what kind of profiling you want to do. :-)

-Filip

On 3/25/2010 6:16 PM, Kelvin Tian wrote:
> Hi Filip,
>
> Thanks a lot for the helpful suggestions! I've also considered the GC
> issues... It seems it's not a neat way to go: it may require much
> effort to get it to work correctly, and it may also cost some
> unexpected overhead.
> I need to collect some loop profiling information, and the
> instrumentation overhead needs to be low, as it may be executed quite
> frequently. Do you have any suggestions in mind to minimize the
> instrumentation overhead? Would yield points cause much overhead?
> Thanks again for the helpful comments.
>
> --
> Best regards,
> Kelvin
>
> On Wed, Mar 24, 2010 at 10:31 PM, Filip Pizlo <pi...@pu...> wrote:
>
> Hi Kelvin,
>
> It seems to me that what you're wanting to do should work, but may be
> tricky to get right due to interaction with GC and other subsystems.
> Also, just getting the right assembly code generated for this might
> require effort.
>
> Basically, here's the list of things that I'd immediately check for:
>
> - Check, double-check, and then triple-check the chunk of code you're
> adding for silly mistakes.
> - How are you referencing the buffer in the chunk of code? Are you
> trying to generate a field load, or are you just putting the object's
> address into the machine code? (That's what I'd do.)
> - If you're putting the buffer object's address into the machine code,
> are you making sure that the buffer is non-moving and otherwise
> pinned? You don't want it to move, since the address is a constant in
> the machine code. That part is easy - MMTk supports non-moving
> allocation.
> Then you want to make sure that the GC knows that this object is
> alive - so just make sure that you've either got a live thread that
> references it from its stack (as opposed to referencing it from the
> code), or else have a static field somewhere that references it.
> - Are you sure that you aren't accidentally instrumenting Jikes RVM's
> code? This can be a *very* tricky property to ensure. In Jikes RVM
> the lines between "client code" and "VM code" are very blurry. This
> is intentional. It makes the system (a) funner to play with and (b)
> in many cases faster (or at least "easier to make fast"), because
> arbitrary VM code can be inlined into arbitrary client code, for the
> most part.
> - Are you sure that you aren't accidentally instrumenting your
> instrumentation? :-)
>
> There are probably other things you should look out for ... basically
> what you're trying to do *sounds like it should just work*, but may
> not be easy to get right because of creepy corner cases.
>
> If you post a fuller crash dump or code snippet, then possibly some
> of us might have some free time to take a peek and help you out.
>
> -Filip
>
> On 3/24/2010 10:17 PM, Kelvin Tian wrote:
> > Hi,
> >
> > I want to collect some runtime profiling information without using
> > yield points, as I think yield points might cause some considerable
> > overhead. I just want to add instrumentation counters into the
> > client program explicitly, and then add some other instrumentation
> > to let the client program store the raw profiling information (i.e.
> > id, counter value) into an external array buffer declared in my
> > host programs in Jikes.
> > Is this doable in Jikes? Can client programs access the static
> > variables declared in Jikes? I tried to implement my idea this way,
> > but met with some difficulties, especially with how to add the
> > chunk of code in client programs that stores the profiling info
> > into an external buffer declared in Jikes.
> > Are there any suggestions? Thanks for any comments and suggestions!
> >
> > --
> > Best regards,
> > Kelvin
> >
> > ------------------------------------------------------------------------------
> > Download Intel® Parallel Studio Eval
> > Try the new software tools for yourself. Speed compiling, find bugs
> > proactively, and fine-tune applications for parallel performance.
> > See why Intel Parallel Studio got high marks during beta.
> > http://p.sf.net/sfu/intel-sw-dev
> > _______________________________________________
> > Jikesrvm-researchers mailing list
> > Jik...@li...
> > https://lists.sourceforge.net/lists/listinfo/jikesrvm-researchers
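[Editor's note: the liveness advice in Filip's checklist - pin the buffer
and keep a GC root for it - can be sketched in plain Java. All names here
are illustrative, not the actual Jikes RVM/MMTk API; on a stock JVM we can
only show the static-root half, since ordinary allocation is not pinned.]

```java
// Hypothetical sketch: a buffer whose raw address is baked into
// generated machine code must (a) never move and (b) stay reachable.
// A static field gives (b); MMTk's non-moving allocation would give (a).
public class ProfileBuffer {
    // Static reference keeps the buffer live for the whole run, so the
    // GC never reclaims it even though compiled code would only hold its
    // raw address as an immediate constant.
    static final int[] BUFFER = allocateNonMoving(4096);

    // Stand-in for a non-moving allocator; in a plain JVM we can only
    // allocate normally and rely on the static root for liveness.
    static int[] allocateNonMoving(int size) {
        return new int[size];
    }

    // What the inserted instrumentation would do: a store into a slot
    // at a known offset from the buffer's (fixed) base address.
    static void record(int slot, int value) {
        BUFFER[slot] = value;
    }

    public static void main(String[] args) {
        record(3, 42);
        System.out.println(BUFFER[3]); // prints 42
    }
}
```

Note the design point from the thread: rooting the buffer through a static
field (or a live thread's stack) is what tells the GC the object is alive,
because a raw address embedded in machine code is invisible to the
collector's reference scan.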