Re: [Dpcl-develop] Re: some DPCL questions
From: Dave W. <dwo...@us...> - 2004-03-03 17:32:16
Steve
I looked at the source file and I don't see anything obviously wrong.
Since you are using printf statements to try and track this down, I'm
guessing this is the target application side which you are looking at. Is
this correct?
If so, then I suspect some sort of sign extension bug somewhere when
casting from int to pointer types, unsigned long or long. It looks like
the person who wrote this code was pretty careful about casting to
unsigned long, so the problem is not obvious. This code does work with 64
bit target applications on AIX, so this is rather puzzling.
I would suggest a couple of things.
First, recompile the dpcl/src/daemon_RT directory with the -Wall compiler
option set. You can modify the dpcl/src/rules.mk.aix file, adding this
flag to the GLOBAL_CFLAGS and GLOBAL_CXX_FLAGS definitions. Once compiled,
look at the compiler diagnostics to see if there are any hints about
problems with improper sign conversions or truncations.
You will need to add more printf statements to the code to try to identify
where things are going wrong. I would start with parameters passed to all
of the functions in this file, paying particular attention to anything
that is a pointer or an integer with a negative value. If you see pointers
which suddenly have zeroes or 0xffffffff in the upper 4 bytes, then that
is an indicator of a truncation or sign extension problem.
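To illustrate what such a symptom looks like, here is a minimal sketch. It is not DPCL code, and every function name in it is made up for illustration. It assumes a two's complement machine and the fixed-width types from <cstdint>. The sample address is the one from the 'unaligned access' message quoted below.

```cpp
#include <cassert>
#include <cstdint>

// Widening a negative 32-bit int straight to 64 bits sign-extends it:
// the upper 4 bytes become 0xffffffff.
std::uint64_t widen_signed(std::int32_t v) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Going through the unsigned 32-bit type first zero-extends instead:
// the upper 4 bytes become zero.
std::uint64_t widen_unsigned(std::int32_t v) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(v));
}

// Squeezing a 64-bit address through a 32-bit integer drops the upper 4 bytes.
std::uint64_t truncate_to_32(std::uint64_t addr) {
    return static_cast<std::uint32_t>(addr);
}

// Rough heuristic only: an upper half that is all zeroes or all ones is
// worth a second look. A legitimately small address also has a zero upper half.
bool upper_half_suspicious(std::uint64_t addr) {
    const std::uint32_t hi = static_cast<std::uint32_t>(addr >> 32);
    return hi == 0u || hi == 0xffffffffu;
}

// The behaviour described above, written down as checks.
void check_widening_examples() {
    assert(widen_signed(-1) == 0xffffffffffffffffULL);
    assert(widen_unsigned(-1) == 0x00000000ffffffffULL);
    assert(truncate_to_32(0x600000000000aa36ULL) == 0x000000000000aa36ULL);
}
```

A printf of each pointer parameter next to a check like upper_half_suspicious() would flag the first call where a value goes bad.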
If looking at parameters does not help, then you need to start putting
printf statements in the code at significant points to try to track this
down further.
An alternative to printf statements is to attach to the target (assuming
the target is the problem) with a debugger after DPCL has inserted probes
and started the application. Some debuggers allow you to use a 'force
attach' option to force the debugger to steal ptrace control away from the
other program, in this case DPCL, which already has ptrace control. Once
attached, set breakpoints and trace through execution of the application.
As an aside, it looks like the problem here is that the storage at
*free_object_tail is zero, which is the basis for returning the error
status. Note that you have coded your printf statement with what looks
like intent to print out a 64-bit pointer, by use of the 0x%16x, but that
is not what you are really getting. In order to print a 64 bit pointer you
need to use either 0x%16llx or 0x%016llx, where the second zero pads the
data to be displayed. In this case, since what you are dereferencing by
*free_object_tail is an integer, you are actually printing the value of
the integer, although with 16 hex digits instead of 8.
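As a rough sketch of the difference (helper names are invented here, and the PRIx64/PRIx32 macros from <cinttypes> are swapped in for the ll length modifier so the width does not depend on the platform's long):

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a pointer as 16 zero-padded hex digits. The value is passed as a
// 64-bit integer so that the conversion specifier matches the argument.
std::string format_pointer(const void *p) {
    char buf[32];
    const int n = std::snprintf(
        buf, sizeof buf, "0x%016" PRIx64,
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p)));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return buf;
}

// Format a 32-bit word, such as the value stored at *free_object_tail,
// as 8 zero-padded hex digits.
std::string format_word(std::uint32_t v) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "0x%08" PRIx32, v);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return buf;
}

// What "0x%16x" does: it still prints an unsigned int, only right-justified
// with spaces in a 16-character field.
std::string format_space_padded(unsigned int v) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "0x%16x", v);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return buf;
}
```

To see where the tail pointer itself points, print format_pointer(free_object_tail). To see the word stored there, print format_word(*free_object_tail).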
Dave
Steve Collins <sl...@sg...>
Sent by: dpc...@ww...
03/02/2004 09:48 AM
To: dpc...@ww...
cc: sl...@sg...
Subject: [Dpcl-develop] Re: some DPCL questions
Once again many thanks to DaveW for his continued support. He
has solved the 'soft vs. hard' external puzzle and I have a
rather hefty clue as to why the supposed 'limit of 45' problem
exists. See below.
Thanks to all - SteveC
Original 3 DPCL questions:
1. Why does there seem to be a 'limit of 45 callbacks' in my
'sleep mutator' testcase?
Analysis:
The following code in
~dpcl/src/daemon_RT/src/os/linux/ShmManger.C,
routine shmFObjectAllocV, is shutting things down after 45
callbacks:
    if ((*free_object_tail != FREE_OBJECT_MAGIC_PATTERN) ||
        (p_free_object [i]->mask != FREE_OBJECT_MAGIC_PATTERN)) {
        .....
        *rc = MEM_BAD_FREE_LIST;
        return NULL;
    }
I inserted a printf as follows:
    if ((*free_object_tail != FREE_OBJECT_MAGIC_PATTERN) ) {
        printf("MEM_BAD_FREE_object tail bad 0x%16x 0x%16x\n",
               *free_object_tail, FREE_OBJECT_MAGIC_PATTERN);
    }
and got the following result:
MEM_BAD_FREE_object tail bad 0x 0 0x deadbeaf
It is not clear if the preceding pointer arithmetic is bad or the mask
is bad or just what is going wrong.
2. Code in the DPCL Library is causing 'unaligned access' errors.
Analysis:
An example of the code occurs in ~dpcl/src/lib/src/ModuleId.C
in routine 'ModuleId unpack_ModuleId(char **buffer)', to wit:
    ModuleId
    unpack_ModuleId(char **buffer)
    {
        char *data = *buffer;
        char *uniqstr = data;
        data = data + 1 + strlen(uniqstr); // don't forget the NULL character
        int *uint_p = (int *) data;
        data = data + sizeof(int);
        ModuleId new_mid = ModuleId(uniqstr, *uint_p);
        .....
This last statement, which dereferences a pointer that is not 'int'
aligned (*uint_p), seems to upset the ia64 hardware and I get something
like this:
mutator(16567): unaligned access to 0x600000000000aa36, ip=0x2000000000277bb0
Resolution of this problem appears to require finding all the dubious
code which might cause such 'unaligned access' errors and rewriting it.
This is a future project for now, since it is just a performance issue,
not a functionality issue.
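One common way to rewrite such a read is to copy the bytes out with memcpy instead of dereferencing a cast pointer. The following is only a sketch of that pattern under my own assumptions. It is not the DPCL fix, and read_cstring and read_int_unaligned are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>

// Return the NUL-terminated string at the cursor and advance the cursor
// past the string and its terminator.
const char *read_cstring(const char **cursor) {
    assert(cursor != nullptr && *cursor != nullptr);
    const char *s = *cursor;
    *cursor += std::strlen(s) + 1;
    return s;
}

// Read an int from a position that may not be int-aligned and advance the
// cursor past it. memcpy has no alignment requirement on its arguments, so
// this avoids the misaligned load that "*(int *) data" can produce.
int read_int_unaligned(const char **cursor) {
    assert(cursor != nullptr && *cursor != nullptr);
    int value;
    std::memcpy(&value, *cursor, sizeof value);
    *cursor += sizeof value;
    return value;
}
```

In unpack_ModuleId, the string/int pair could then be pulled out with one call to each helper, and 'data' would end up at the same position as in the original code.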
3. Why can Dyninst find 'sleep' (the soft external) but DPCL cannot?
Answer: (DaveW's analysis)
The Hybrid uses a version of Dyninst that can find symbols in
dynamically shared/linked objects. DPCL has no such ability.
Mystery solved. Extending DPCL to handle dynamic/shared objects
has been on the 'list' from the beginning. Future project I
guess.
_______________________________________________
Dpcl-develop mailing list
Dpc...@ww...
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/dpcl-develop