Re: [Dpcl-develop] Using 'diag' test
From: Steve C. <sl...@sg...> - 2004-05-28 23:18:41
Dave -
Thanks for the reply. I feel better already!! I'll answer
your questions below and copy the newsgroup. Is that ok? If
not, I'll go back thru the newsgroup next time.
Thanks again - SteveC
>
> Steve
> This looks like the target program is what is crashing and you are ending
> up with a core file for that process. Is that correct?
Yes. The 'mutatee', in Dyninst parlance, is getting the SEGV. And again,
the walkback changes ever so slightly every couple of attempts at running
./eut_diag, and sometimes the 'mutatee' does not SEGV at all. Very flaky.
> If not, can you
> clarify what is crashing and how you are getting the traceback?
I am in the directory where I have run the mutatee, and I do:
gdb ../build/eon core.nnnnn
where 'eon' is the SPEC eon benchmark. It is run out of ../run/run.csh
and built out of ../build/build.csh. There is nothing company private
about 'eon' AFAIK. I <think> I could just ship it to you. Can you get
a copy out of IBM's benchmark suite?
> Where are
> you placing print statements in eut_diag.C?
Actually, I was putting 'fd writes' into the shared memory routines in
~dpcl/src/daemon_RT/src/os/linux/ShmManager.C, etc. These slow things down
just enough to prevent the SEGV.
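A minimal sketch of the kind of 'fd write' trace described above, assuming it means a raw write(2) to a file descriptor rather than buffered stdio. The helper name `trace_fd` is hypothetical and not part of DPCL. Because each line goes out in a single system call, it is already on disk (or in the pipe) if the process crashes right afterwards; but it still costs a syscall per line, which is exactly the kind of perturbation that can make a timing bug disappear.

```cpp
#include <cassert>
#include <cstdio>     // std::snprintf
#include <unistd.h>   // write, ssize_t

// Hypothetical debug helper (not part of DPCL): format one short line into a
// stack buffer and emit it with a single write(2). There is no stdio
// buffering, so the line is already written if the process SEGVs right after.
// Caveat: std::snprintf is not async-signal-safe; do not call from a handler.
// Returns the byte count reported by write(2), or -1 on error.
static ssize_t trace_fd(int fd, const char *where, unsigned long value) {
    assert(where != nullptr);
    char buf[128];
    int n = std::snprintf(buf, sizeof buf, "[trace] %s: 0x%lx\n", where, value);
    if (n < 0) return -1;
    if (n >= static_cast<int>(sizeof buf)) {
        n = static_cast<int>(sizeof buf) - 1;  // message was truncated
    }
    return write(fd, buf, static_cast<size_t>(n));
}
```

Usage would be something like `trace_fd(STDERR_FILENO, "shmFObjectAllocV", (unsigned long)p_free_object);` at the point of interest.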
> The callback functions
> dcall_back and dcall_back2 or somewhere else?
From what I can tell, the 'SOT_data' code is not executed because there is
no DATA (I guess) in DEFAULT_MODULE. Only functions are processed by ./eut_diag.
>
> Assuming this is the target, nothing that eut_diag.C does should be causing
> timing problems, with the possible exception of the dcall_back and
> dcall_back2 functions, and even then I have a hard time seeing the printf
> slowing things down enough to influence target execution.
> The path from the target program through the daemon back to the client is
> pretty long anyway, and messages that arrive so fast that Ais_send slows
> down should be buffered in the daemon anyway.
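To illustrate the buffering Dave describes, here is a generic bounded-buffer sketch. It is not DPCL's daemon code, and the class name `BoundedBuffer` is hypothetical. A fast sender blocks in `send()` when the buffer is full instead of overwriting a message the receiver has not consumed yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Generic bounded message buffer (a sketch, not DPCL's daemon code).
// send() blocks while the buffer is full, so a fast producer is throttled
// instead of overwriting messages the consumer has not read yet.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);  // a zero-capacity buffer would block forever
    }

    void send(std::string msg) {
        std::unique_lock<std::mutex> lock(mu_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push_back(std::move(msg));
        not_empty_.notify_one();  // wake a waiting receiver
    }

    std::string receive() {
        std::unique_lock<std::mutex> lock(mu_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        std::string msg = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();  // wake a waiting sender
        return msg;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> q_;
    std::mutex mu_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};
```

With a capacity of 2, a third `send()` before any `receive()` would block until the receiver drains one message; that back-pressure is what keeps a burst of probe messages from corrupting the buffer.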
>
Yeah, that was my thinking too. But I have been able to just #if(0) the
body of 'func_cb' in eut_diag.C and get the thing to run sometimes. Or
adjust the INTERVAL to 20 or so and things go.
I know it dies upon doing the 'resume_cb' every time. I suppose it could
be bad instrumentation, and 'libunwind' on ia64 is somewhat marginal
(hence the bad walkbacks?).
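The crash site, `*p_free_object =` in shmFObjectAllocV (ShmManager.C:687), points at free-list handling in the shared-memory allocator. As a hypothetical illustration of the race Steve suspects (not the actual DPCL code, whose locking is not shown in this thread), here is a minimal index-based free list whose pop is guarded by a lock. Without the lock, two callers could read the same head and both claim the same slot. `ObjectPool`, `alloc` and `release` are invented for the example.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical fixed-size object pool with an index-based free list, the
// general shape of a shared-memory object allocator. The pop (read head,
// advance head) must be atomic with respect to every other alloc/release;
// without the lock, two callers can read the same head and both "own" it.
class ObjectPool {
public:
    static constexpr int kNone = -1;

    explicit ObjectPool(int n) : next_(n), head_(n > 0 ? 0 : kNone) {
        // Chain every slot to the next one; the last slot ends the list.
        for (int i = 0; i < n; ++i) next_[i] = (i + 1 < n) ? i + 1 : kNone;
    }

    // Returns a free slot index, or kNone when the pool is exhausted.
    int alloc() {
        std::lock_guard<std::mutex> lock(mu_);
        int slot = head_;
        if (slot != kNone) head_ = next_[slot];
        return slot;
    }

    // Pushes a slot back onto the front of the free list.
    void release(int slot) {
        assert(slot >= 0 && slot < static_cast<int>(next_.size()));
        std::lock_guard<std::mutex> lock(mu_);
        next_[slot] = head_;
        head_ = slot;
    }

private:
    std::vector<int> next_;
    int head_;
    std::mutex mu_;
};
```

Note that `std::mutex` only serializes threads within one process. If both the daemon and the target modify a list that lives in a segment mapped into both (the shm_key in the backtrace carries both a daemon_address and a process_address), a process-shared pthread mutex (`PTHREAD_PROCESS_SHARED`) or lock-free atomics would be needed instead.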
> Basically, what eut_diag does is loads or attaches to the application,
> suspends it, finds all the function entry and exit points, instruments
> them, then resumes target execution.
Oh yeah. I've been thru eut_diag.C and know it almost like a friend. :^)
>
> The traceback you included does appear to be from within a call to
> Ais_send within the target after the target has been resumed. What is the
> target program logic? Are there functions being called in tight loops,
> such that the entry and exit probes are being called rapidly,
Oh yeah, the "USER_CB" messages just fly by for a second or two before the
SEGV. And since the target is a SPEC benchmark, I suspect there are tight
loops everywhere; such is the nature of benchmark codes.
> or are there
> things like sleep() calls slowing down execution. Does that make a
> difference?
I've played with the 'sleep' in eut_diag.C and even the 'alarm', and
sometimes it makes a difference, sometimes it doesn't. Sorry, I'm not
trying to be evasive!!
> Are you loading or connecting to the application?
Connecting: ./eut_diag d pid 'nnnnnnn'
> Is the
> target application single threaded?
>
Yes, single threaded.
> Those questions will help get us started. I will find some time next week
> to look at this further.
That would be terrific, Dave. I'll try my best to support you with information.
I have spent a lot of cpu/human time trying to finger this one.
Thanks Again - SteveC
>
>
> Hello, DPCL'ers. I am using DPCL/Dyninst on ia64 and, for some
> time now, have been trying to get the 'diag' (eut_diag) sample DPCL
> test running. It seems to (non-deterministically) be dying in the
> daemon_RT shared memory code, to wit:
>
> #0 shmFObjectAllocV (buffer=0x2000000000f15000, shm_key=
> {daemon_address = 0x2000000000e15000, process_address
> = 0x2000000000f15000},
> object_number=1, object_holder=0x2000000000e263c8,
> rc=0x60000fffffff8100)
> at ../src/os/linux/ShmManager.C:687
> 687 *p_free_object =
> (gdb) where
> #0 shmFObjectAllocV (buffer=0x2000000000f15000, shm_key=
> {daemon_address = 0x2000000000e15000, process_address
> = 0x2000000000f15000},
> object_number=1, object_holder=0x2000000000e263c8,
> rc=0x60000fffffff8100)
> at ../src/os/linux/ShmManager.C:687
> #1 0x2000000000e11090 in shm_processObjectAllocV (shm_key=
> {daemon_address = 0x2000000000e15000, process_address
> = 0x2000000000f15000},
> object_number=0, object_holder=0x2000000000e263c8,
> rc=0x60000fffffff8100)
> at ../src/os/linux/ShmManagerAPI_app.C:48
> #2 0x2000000000e119d0 in Ais_send
> (msg_handle_id=0x2000000000f225a4 "",
> message=0x200000000083e320, message_size=30) at
> ../src/os/linux/ShmMessageAPI_app.C:319
> #3 0x2000000000e11670 in Ais_send_int
> (msg_handle_id=0x2000000000f225a4 "",
> message=0x200000000083e320, message_size=30) at
> ../src/os/linux/ShmMessageAPI_app.C:85
> #4 0x200000000083e3e0 in DYNINSTstaticHeap_4M_anyHeap_1 ()
> from /scratch/slc/dpcl-install/lib/libdyninstAPI_RT.so.1
>
> This walkback from gdb is, itself, suspicious, but about all I have
> available.
> Again, the failure doesn't happen every time. If I tweak 'diag'
> (eut_diag.C) in a small, seemingly innocuous way, then things work. If I
> put any sort of debug prints in place, then the SEGV doesn't appear. It
> really seems like a timing thing, but I can't finger the culprit.
>
> This is admittedly a fishing expedition. Does 'diag' have some sort of
> known race condition, or flaky behaviour? Just fishing....
>
> SteveC - SGI
>
>
> Steve Collins <sl...@sg...>
> Sent by: dpc...@ww...
> 05/28/2004 02:08 PM
>
> To
> dpc...@ww...
> cc
> sl...@sg...
> Subject
> [Dpcl-develop] Using 'diag' test
>
> SteveC - SGI
> Tools/Compilers
> _______________________________________________
> Dpcl-develop mailing list
> Dpc...@ww...
> http://www-124.ibm.com/developerworks/oss/mailman/listinfo/dpcl-develop
>
>