On Wed, May 30, 2012 at 9:56 AM, Maynard Johnson <maynardj@...> wrote:
> On 05/29/2012 03:15 PM, Xin Tong wrote:
>> On Tue, May 29, 2012 at 3:44 PM, Xin Tong <xerox.time.tech@...> wrote:
>>> On Tue, May 29, 2012 at 3:06 PM, Maynard Johnson <maynardj@...> wrote:
>>>> On 05/29/2012 01:14 PM, Xin Tong wrote:
>>>>> What is the relationship between oprofiled and the libopagent library? I
>>>>> have a binary translator that generates code on the fly. I have
>>>>> libopagent loaded into my binary translator, and on every
>>>>> translation I call the op_write_native_code function. I am also
>>>>> running oprofiled; however, oprofile still cannot
>>>>> recognize the translated code.
>>>> Xin, please read http://oprofile.sourceforge.net/doc/devel/index.html. If you still have questions, feel free to post a follow-up.
>>> I was looking at this document when I integrated libopagent into my
>>> binary translator. I do see the jitdump being created and my
>>> translated code being registered there, but how can I get opreport to
>>> process it?
>> OK, I got it working. I was not registering the dynamically generated
>> traces correctly. I am running into another problem: there is a lock
>> that is never released in op_write_native_code. The call stack looks
>> like this. Do you know which file op_write_native_code is trying to
>> hold a lock on, and who else could possibly hold the lock?
> It's locking the dump file (named "<pid>.dump") that's stored in /var/lib/oprofile/jitdump. Other than the libopagent library that's writing to that file, there is another oprofile process (opjitconv) that periodically searches for newly-updated jit dump files. When it finds one, it makes a copy and does some processing on the copy. I doubt that could be causing the problem. The only other process that accesses the jit dump files is opcontrol. Old, unused jit dump files are deleted by opcontrol when you do either "--start" or "--reset". But opcontrol has safeguards to make sure it doesn't delete files that are currently open by another process.
> I'm not the expert in this area of the code, but according to the comment in opagent.c, the flockfile API is used to make "sure that we continuously write this record, if we are called within a multi-threaded context". Does your JIT compiler spawn threads to execute dynamically generated code, and might there be multiple threads running simultaneously? This code works OK for all JAVA JIT compilers that have been tested, so there must be something unique about your VM.
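For reference, the flockfile pattern that comment describes can be sketched roughly as below. This is only an illustration, not the code in opagent.c: the function name write_record_flockfile and the record layout (a NUL-terminated name followed by the raw code bytes) are assumptions made up for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose flockfile/funlockfile */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT the opagent.c implementation: write a name and
 * a code blob as one contiguous record, even if other threads write to the
 * same stream. Returns 0 on success, -1 if any write fails. */
int write_record_flockfile(FILE *dump, const char *name,
                           const void *code, size_t size)
{
    int rc = 0;
    size_t name_len;

    assert(dump != NULL && name != NULL);
    name_len = strlen(name) + 1;   /* include the trailing NUL */

    flockfile(dump);               /* take the stream's own lock */
    if (fwrite(name, 1, name_len, dump) != name_len ||
        fwrite(code, 1, size, dump) != size)
        rc = -1;
    if (fflush(dump) != 0)
        rc = -1;
    funlockfile(dump);             /* release it on every path */
    return rc;
}
```

The lock taken by flockfile belongs to the FILE object and is recursive for the owning thread, which is why the fwrite calls inside the locked region do not deadlock against it.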
I got oprofile 0.9.7 and compiled libopagent with some
instrumentation, i.e. printing before and after taking the file
lock on the dumpfile. This is what the output looks like:
locking dumpfile for the 997 times
locked dumpfile for the 997 times
unlocking dumpfile for the 997 times
unlocked dumpfile for the 997 times
locking dumpfile for the 998 times
locked dumpfile for the 998 times
unlocking dumpfile for the 998 times
unlocked dumpfile for the 998 times
locking dumpfile for the 999 times
locked dumpfile for the 999 times
unlocking dumpfile for the 999 times
unlocked dumpfile for the 999 times
What is strange is that this time it did not hang in the
locking/unlocking of the dumpfile. Instead, it hangs in the printf
function. There is only one thread in my dynamic translator, and the
call stack looks like this:
#0 0x000000377b4f542e in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x000000377b449010 in _L_lock_790 () from /lib64/libc.so.6
#2 0x000000377b443e10 in vfprintf () from /lib64/libc.so.6
#3 0x000000377b44ea2a in printf () from /lib64/libc.so.6
#4 0x00007fc0e97e2767 in op_write_native_code (hdl=0xe64c40,
symbol_name=0x416c6d20 "translation-0x4167f370", vma=<value optimized
out>, code=0x4167f370, size=42) at opagent.c:263
I searched online a bit. It seems to be a known issue on the libc side,
but I cannot be fully sure. Anyway, I am reverting to a temporary
solution in which I use a pthread mutex to guard the critical section, as
opposed to flockfile/funlockfile. There is another function that can write to
the dumpfile, the write_debug_symbol function, but I know I am not
using it. So it is working for me now, at least.
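A minimal sketch of that kind of workaround, assuming a single process-wide mutex stands in for the flockfile/funlockfile pair. The names dump_lock and write_record_locked and the record layout are hypothetical and are not taken from opagent.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the explicit stream lock; not part of libopagent. */
static pthread_mutex_t dump_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write a name and a code blob as one record while holding the mutex.
 * Returns 0 on success, -1 if any write fails. */
int write_record_locked(FILE *dump, const char *name,
                        const void *code, size_t size)
{
    int rc = 0;
    size_t name_len;

    assert(dump != NULL && name != NULL);
    name_len = strlen(name) + 1;        /* include the trailing NUL */

    pthread_mutex_lock(&dump_lock);     /* enter the critical section */
    if (fwrite(name, 1, name_len, dump) != name_len ||
        fwrite(code, 1, size, dump) != size)
        rc = -1;
    if (fflush(dump) != 0)
        rc = -1;
    pthread_mutex_unlock(&dump_lock);   /* leave it on every path */
    return rc;
}
```

This only serializes callers that go through the same mutex, so every writer to the dumpfile would have to use it. The stdio calls inside still take the stream's internal lock on each call, so the sketch replaces only the explicit flockfile, not libc's own locking.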
Please keep me posted if you find anything interesting that might
explain what seems likely to be a libc problem.
>> #0 0x000000377b4f542e in __lll_lock_wait_private () from /lib64/libc.so.6
>> #1 0x000000377b4639e5 in _L_lock_23 () from /lib64/libc.so.6
>> #2 0x000000377b4639c6 in flockfile () from /lib64/libc.so.6
>> #3 0x00007ffff7b1d672 in op_write_native_code () from
>> #4 0x000000007106eea8 in emit_fragment_common (dcontext=0x40006a80,
>> tag=0x377b47413d "H\213\203\210", ilist=0x40097860, flags=0,
>> link_fragment=true, add_to_htable=true, replace_fragment=0x0) at
>>>>> oprofile-list mailing list