Re: [Dpcl-develop] Re: some DPCL questions
From: Dave W. <dwo...@us...> - 2004-03-03 17:32:16
Steve,

I looked at the source file and I don't see anything obviously wrong. Since you are using printf statements to track this down, I'm guessing that you are looking at the target application side. Is this correct? If so, I suspect a sign extension bug somewhere when casting from int to pointer types, unsigned long, or long. It looks like the person who wrote this code was pretty careful about casting to unsigned long, so the problem is not obvious. This code does work with 64-bit target applications on AIX, so this is rather puzzling.

I would suggest a couple of things.

First, recompile the dpcl/src/daemon_RT directory with the -Wall compiler option set. You can modify the dpcl/src/rules.mk.aix file, adding this flag to the GLOBAL_CFLAGS and GLOBAL_CXX_FLAGS definitions. Once compiled, look at the compiler diagnostics to see if there are any hints about problems with improper sign conversions or truncations.

You will also need to add more printf statements to the code to identify where things are going wrong. I would start with the parameters passed to all of the functions in this file, paying particular attention to anything that is a pointer or an integer with a negative value. If you see pointers which suddenly have zeroes or 0xffffffff in the upper 4 bytes, that is an indicator of a truncation or sign extension problem. If looking at parameters does not help, then you need to start putting printf statements in the code at significant points to track this down further.

An alternative to printf statements is to attach to the target (assuming the target is the problem) with a debugger after DPCL has inserted probes and started the application. Some debuggers offer a 'force attach' option that lets the debugger steal ptrace control away from the other program, in this case DPCL, which already has ptrace control. Once attached, set breakpoints and trace through execution of the application.
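To illustrate the symptom described above, here is a minimal sketch in C of how a negative 32-bit value ends up with 0xffffffff or zeroes in its upper 4 bytes, depending on whether it is widened through a signed or an unsigned type. The helper names (widen_signed, widen_unsigned, hex64, dump_widening) are hypothetical and are not part of DPCL; the sketch only assumes a C99 compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Widen a 32-bit int the way a cast through a signed type does:
 * a negative value fills the upper 32 bits with ones. */
static uint64_t widen_signed(int32_t v) {
    return (uint64_t)(int64_t)v;
}

/* Widen through an unsigned 32-bit type: the upper 32 bits become zero. */
static uint64_t widen_unsigned(int32_t v) {
    return (uint64_t)(uint32_t)v;
}

/* Format a 64-bit value as "0x" plus 16 zero-padded hex digits.
 * The buffer must hold 19 bytes (2 + 16 + terminating NUL). */
static const char *hex64(char *buf, size_t n, uint64_t v) {
    assert(n >= 19);
    snprintf(buf, n, "0x%016llx", (unsigned long long)v);
    return buf;
}

/* Print both widenings of one suspicious 32-bit value. */
static void dump_widening(int32_t v) {
    char a[19], b[19];
    printf("signed:   %s\nunsigned: %s\n",
           hex64(a, sizeof a, widen_signed(v)),
           hex64(b, sizeof b, widen_unsigned(v)));
}
```

Calling dump_widening on a negative value, for example dump_widening(-1), shows the two patterns side by side. A pointer that has passed through an int somewhere on its way to a function parameter would show one of them in its upper half.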
As an aside, it looks like the problem here is that the storage at *free_object_tail is zero, which is the basis for returning the error status. Note that your printf statement looks like it was meant to print a 64-bit pointer, judging by the 0x%16x, but that is not what you are really getting. In order to print a 64-bit pointer you need to use either 0x%16llx or 0x%016llx, where the 0 following the % pads the displayed value with leading zeros. In this case, since what you are dereferencing with *free_object_tail is an integer, you are actually printing the value of the integer, although in a field 16 characters wide instead of 8.

Dave

Steve Collins <sl...@sg...>
Sent by: dpc...@ww...
03/02/2004 09:48 AM

To: dpc...@ww...
cc: sl...@sg...
Subject: [Dpcl-develop] Re: some DPCL questions

Once again many thanks to DaveW for his continued support. He has solved the 'soft vs. hard' external puzzle and I have a rather hefty clue as to why the supposed 'limit of 45' problem exists. See below.

Thanks to all - SteveC

Original 3 DPCL questions:

1. Why does there seem to be a 'limit of 45 callbacks' in my 'sleep mutator' testcase?

Analysis: The following code in ~dpcl/src/daemon_RT/src/os/linux/ShmManger.C, routine shmFObjectAllocV, is shutting things down after 45 callbacks:

    if ((*free_object_tail != FREE_OBJECT_MAGIC_PATTERN) ||
        (p_free_object[i]->mask != FREE_OBJECT_MAGIC_PATTERN)) {
        .....
        *rc = MEM_BAD_FREE_LIST;
        return NULL;
    }

I inserted a printf as follows:

    if ((*free_object_tail != FREE_OBJECT_MAGIC_PATTERN) ) {
        printf("MEM_BAD_FREE_object tail bad 0x%16x 0x%16x\n",
               *free_object_tail, FREE_OBJECT_MAGIC_PATTERN);
    }

and got the following result:

    MEM_BAD_FREE_object tail bad 0x 0 0x deadbeaf

It is not clear whether the preceding pointer arithmetic is bad, the mask is bad, or something else is going wrong.

2. Code in the DPCL Library is causing 'unaligned access' errors.

Analysis: An example of the code occurs in ~dpcl/src/lib/src/ModuleId.C in routine 'ModuleId unpack_ModuleId(char **buffer)', to wit:

    ModuleId unpack_ModuleId(char **buffer) {
        char *data = *buffer;
        char *uniqstr = data;
        data = data + 1 + strlen(uniqstr); // don't forget the NULL character
        int *uint_p = (int *) data;
        data = data + sizeof(int);
        ModuleId new_mid = ModuleId(uniqstr, *uint_p);
        .....

This last statement, which dereferences *uint_p at an address that is not 'int'-aligned, seems to upset the ia64 hardware, and I get something like this:

    mutator(16567): unaligned access to 0x600000000000aa36, ip=0x2000000000277bb0

Resolving this problem appears to require finding all the dubious code which might cause such 'unaligned access' errors and rewriting it. This is a future project for now, since it is just a performance issue, not a functionality issue.

3. Why can Dyninst find 'sleep' (the soft external) but DPCL cannot?

Answer (DaveW's analysis): The Hybrid uses a version of Dyninst that can find symbols in dynamically shared/linked objects. DPCL has no such ability. Mystery solved. Extending DPCL to handle dynamic/shared objects has been on the 'list' from the beginning. Future project, I guess.
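For the 'unaligned access' pattern in question 2, one common way to avoid dereferencing a misaligned int pointer is to copy the bytes out with memcpy, which has no alignment requirement. The following is a minimal sketch in C under that assumption. The helpers read_cstring and read_unaligned_int are hypothetical and are not part of DPCL, and the sketch does not reproduce the rest of unpack_ModuleId.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the NUL-terminated string at the cursor
 * and advance the cursor past it. */
static char *read_cstring(char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    char *s = *cursor;
    *cursor += strlen(s) + 1;  /* + 1 for the terminating NUL */
    return s;
}

/* Hypothetical helper: read an int from a buffer position that may not
 * be int-aligned, then advance the cursor past it. */
static int read_unaligned_int(char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    int value;
    memcpy(&value, *cursor, sizeof value);  /* byte copy, alignment-safe */
    *cursor += sizeof value;
    return value;
}
```

With helpers like these, the string and the integer that follows it could be read in sequence from the same cursor, without casting the char pointer to int * at any point. Compilers typically turn a small fixed-size memcpy into a plain load on hardware that tolerates unaligned access, so the cost on other platforms should be low, though that is worth measuring rather than assuming.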