dpcl-develop Mailing List for Dynamic Probe Class Library (Page 5)
Brought to you by:
dpcl-admin,
dwootton
From: Steve C. <sl...@sg...> - 2003-12-09 20:10:26
I decided that surfing for obvious 32-bit problems was not working. DaveW suggested long ago that I just go in and find out EXACTLY what was failing. Sorry, Dave, it's taken me a while to get this straight!! So, I went into ~dpcl/src/daemon_RT/src/os/linux/ShmManager.C, routine shmFObjectAlloc, and did the local file descriptor trick. The failure is occurring thusly:

    if ((page->page_lock != 1) || (page->is_first != 1) || (page->object_size == 0)) {
        <failure>

And it is failing because:

    shmFObjectAlloc enter
    shmFObjectAllocV page_lock != 1
    shmFObjectAlloc VERIFY FAIL

Why this is failing in 64-bit mode is not obvious to me.

SteveC
SGI Compilers/Tools
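Below is a minimal C sketch of the kind of instrumentation Steve describes. It reports which of the three verification conditions failed, instead of a single VERIFY FAIL. The struct name `page_hdr`, its field types, and the helpers `page_verify_failure` and `page_verify` are hypothetical stand-ins, not the actual ShmManager.C definitions. Only the three field names and the three conditions come from the message.

```c
#include <stdio.h>

/* Hypothetical stand-in for the page header tested in shmFObjectAlloc.
 * Field names come from the failing test; the types are assumptions. */
struct page_hdr {
    unsigned int page_lock;
    unsigned int is_first;
    unsigned int object_size;
};

/* Returns a description of the first failing condition, or NULL if the
 * header passes all three checks. */
const char *page_verify_failure(const struct page_hdr *page)
{
    if (page->page_lock != 1)
        return "page_lock != 1";
    if (page->is_first != 1)
        return "is_first != 1";
    if (page->object_size == 0)
        return "object_size == 0";
    return NULL;
}

/* Same test as the original condition, but writes the specific reason to
 * a debug stream (for example a FILE * obtained from fopen).
 * Returns 1 if the header passes, 0 otherwise. */
int page_verify(const struct page_hdr *page, FILE *dbg)
{
    const char *why = page_verify_failure(page);

    if (why != NULL) {
        fprintf(dbg, "shmFObjectAllocV %s\n", why);
        return 0;
    }
    return 1;
}
```

Naming the first failing condition is what narrows the problem to a single field, here page_lock.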
From: Dave W. <dwo...@us...> - 2003-12-09 18:27:21
Steve

I took a look at the shared memory code to check whether we were properly allocating the shared memory segments on Linux. It appears we are correctly dealing with the 16MB shared memory allocation by calling shm_initialize with a value of 16MB for Linux and 256MB for AIX. This is called from daemon_shm_init in dpcl/src/daemon/src/os/linux/ShmUsage.C. We may want to reconsider the size of the allocation in the future, maybe as a user option or based on some percentage of the shared memory allocation limit, but it should work for now.

There is a second shared memory allocation, made by a call to ibmBPatchShmManager::ibmBPatchShmManager in dpcl/src/dyninstAPI/src/os/linux/ibmBPatchShmManager.C, which also allocates 16MB and works correctly. I'm not sure whether the hybrid code allocates shared memory, or how it does so.

At this point it looks like we are back to either debugging the daemon or adding log calls to try to get an idea where the shared memory code is failing.

Finally, I noticed that the file dpcl/src/daemon/src/os/linux/ShmmanagerAPI.C and the corresponding AIX files are unused and can be deleted. I have deleted them from our CVS library.

Dave

_______________________________________________
Dpcl-develop mailing list
Dpc...@ww...
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/dpcl-develop
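Dave suggests basing the segment size on the kernel limit. That could look roughly like the C sketch below, which reads /proc/sys/kernel/shmmax and takes a percentage of it, capped at a preferred size. The function names and the percentage policy are assumptions for illustration, not DPCL code.

```c
#include <stdio.h>

/* Reads the per-segment limit from /proc/sys/kernel/shmmax (Linux).
 * Returns 0 if the file cannot be opened or parsed. */
unsigned long long read_shmmax(void)
{
    unsigned long long limit = 0;
    FILE *f = fopen("/proc/sys/kernel/shmmax", "r");

    if (f == NULL)
        return 0;
    if (fscanf(f, "%llu", &limit) != 1)
        limit = 0;
    fclose(f);
    return limit;
}

/* Hypothetical policy: use `percent` (0..100) of the kernel limit, capped
 * at `preferred`; fall back to `preferred` when the limit is unknown (0). */
unsigned long long choose_shm_size(unsigned long long preferred,
                                   unsigned long long shmmax,
                                   unsigned int percent)
{
    unsigned long long budget;

    if (shmmax == 0)
        return preferred;
    /* Split shmmax into quotient and remainder so that shmmax * percent
     * cannot overflow when the kernel limit is very large. */
    budget = (shmmax / 100) * percent + (shmmax % 100) * percent / 100;
    return budget < preferred ? budget : preferred;
}
```

For example, `choose_shm_size(268435456ULL, read_shmmax(), 50)` asks for 256MB. It settles for half the kernel limit when that is smaller, such as 8MB under a 16MB shmmax.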
From: Dave W. <dwo...@us...> - 2003-12-09 00:27:09
Steve

MAGIC_HEADER_PATTERN looks OK. This is just an integer that is used as a marker for determining whether shared memory is already initialized.

BITS_IN_A_WORD looks OK. The shared memory allocation code keeps track of 4096-byte memory pages with a bitmap, 1 bit per 4096-byte page. This constant is used in the calculation to determine the size of the bitmap array needed to hold 65536 (64K) 4096-byte pages, which covers 256MB of shared memory. It is also used in filling in the bitmap array when pages are allocated. The bitmap is defined as an array of unsigned int, so the value 32 is correct.

FULL_ONE_MASK also looks correct. This is used in the loop marking shared memory pages allocated. The loop marks 32 shared memory bitmap entries as allocated at one time. There is code following the loop to handle the leftover pages which are not a multiple of 32 pages.

WORD_WITH_LEFMOST_BIT_ON looks correct. This is used in a calculation defining which bit to look at in an unsigned int when determining shared memory allocations.

Now that we are trying to debug shared memory code, I remember a problem with shared memory allocation in the ia32 Linux implementation when we first did the port. On AIX, we can get access to a 256MB shared memory segment except in cases where the application uses large amounts of memory for things like the heap (malloc'ed memory) segment. On ia32 Linux, the default shared memory allocation limit was 16MB, at least on the system we were using for the port. We fixed the code allocating the shared memory, but apparently did not address the smaller allocation here. I doubt this has anything to do with your problem, but it might be worth investigating. If possible, I would like you to try setting the limit to 256MB.

I believe you can check the system shared memory allocation limit by issuing 'cat /proc/sys/kernel/shmmax', which will display a single number. I also believe you can set this limit by issuing (as root) the command 'echo nnn > /proc/sys/kernel/shmmax', where nnn is the maximum allocation in bytes. For 256MB use 268435456. Whatever you set will be the new limit until changed again or the system is rebooted.

If I get a chance to look at code tomorrow, I will see if I can find the code where we originally fixed this and see what possible problems exist by assuming 256MB shared memory exists.

Dave
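The bitmap scheme described above can be sketched in C as follows. The four constant values are the ones quoted in the thread. The array name, the helper functions, and the loop that handles a leading partial word are assumptions, as is the ordering where the most significant bit represents the lowest page (inferred from WORD_WITH_LEFMOST_BIT_ON). The `_Static_assert` records why the 32-bit constants stay valid on ia64: `unsigned int` remains 32 bits under the LP64 model.

```c
#include <limits.h>

#define SHM_PAGE_SIZE            4096u
#define SHM_NUM_PAGES            65536u      /* 65536 * 4096 bytes = 256MB */
#define BITS_IN_A_WORD           32
#define FULL_ONE_MASK            0xffffffffu
#define WORD_WITH_LEFMOST_BIT_ON 0x80000000u

/* unsigned int stays 32 bits under both ILP32 (ia32) and LP64 (ia64),
 * so the 32-bit constants above remain correct for the bitmap words. */
_Static_assert(sizeof(unsigned int) * CHAR_BIT == BITS_IN_A_WORD,
               "bitmap words must be 32 bits wide");

/* One bit per page: 65536 / 32 = 2048 words. */
static unsigned int page_bitmap[SHM_NUM_PAGES / BITS_IN_A_WORD];

/* Mask selecting page p within its word; the leftmost bit is the lowest page. */
static unsigned int page_bit(unsigned int p)
{
    return WORD_WITH_LEFMOST_BIT_ON >> (p % BITS_IN_A_WORD);
}

/* Marks pages [first, first + count) allocated; the caller keeps the
 * range within SHM_NUM_PAGES. */
void mark_pages(unsigned int first, unsigned int count)
{
    unsigned int p = first;
    unsigned int end = first + count;

    /* Leading pages up to the next word boundary, one bit at a time. */
    while (p < end && p % BITS_IN_A_WORD != 0) {
        page_bitmap[p / BITS_IN_A_WORD] |= page_bit(p);
        p++;
    }
    /* Whole words: 32 pages at once with FULL_ONE_MASK. */
    while (end - p >= BITS_IN_A_WORD) {
        page_bitmap[p / BITS_IN_A_WORD] = FULL_ONE_MASK;
        p += BITS_IN_A_WORD;
    }
    /* Leftover pages that do not fill a whole word. */
    while (p < end) {
        page_bitmap[p / BITS_IN_A_WORD] |= page_bit(p);
        p++;
    }
}

/* Returns nonzero if page p is marked allocated. */
int page_is_allocated(unsigned int p)
{
    return (page_bitmap[p / BITS_IN_A_WORD] & page_bit(p)) != 0;
}
```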
From: <sl...@sg...> - 2003-12-08 23:27:05
Tried DaveW's suggestion re: XoredPtrs, and this did not move my problem. I will definitely put it on my running list of 64-bit considerations, though. Looking through some #defines in ~dpcl/src/daemon_RT/include/os/linux/ShmManager.h, I see a number of 32-bit problems, to wit:

    #define MAGIC_HEADER_PATTERN 0x88888888
    #define BITS_IN_A_WORD 32
    #define FULL_ONE_MASK 0xffffffff
    #define WORD_WITH_LEFMOST_BIT_ON 0x80000000

Are these actually used, or are they also innocuous like some of my earlier reported suspects??

Thanks again to DaveW - SteveC
SGI Compilers/Tools
From: Dave W. <dwo...@us...> - 2003-12-08 19:21:03
Steve

I looked at the shared memory management headers a bit this morning to see if there were any obvious errors. One error I found was the definition of XoredPtrs in dpcl/src/daemon_RT/include/os/linux/ShmMessage.h (our AIX version has the same problem). This variable is used to exclusive-or two pointers and store the result. The idea apparently is to use the exclusive-or trick to quickly swap two variables. The problem is that since this is declared as int, the exclusive-or results will not be as expected. We probably luck out if the two addresses are within the same 4GB range, but I don't think it is wise to assume this. The fix is to change the declaration of XoredPtrs to 'long long'. I doubt this has anything to do with your current problems, but this looks like a difficult-to-track bug if it ever breaks.

Dave
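The exclusive-or swap Dave describes is sketched in C below, using a pointer-sized integer. Dave proposes 'long long', which is 64 bits on ia64 Linux. This sketch uses `uintptr_t` from <stdint.h> instead, the C99 type intended for round-tripping a pointer through an integer. The function names are hypothetical. `xor_kept_by_32bit` only models what an int-sized XoredPtrs would retain.

```c
#include <stdint.h>

/* XOR-swap two pointers through a pointer-sized integer. uintptr_t can
 * hold any object pointer converted from void *, so no bits are lost. */
void xor_swap_ptrs(void **a, void **b)
{
    uintptr_t xored = (uintptr_t)*a ^ (uintptr_t)*b;   /* the "XoredPtrs" value */

    *a = (void *)((uintptr_t)*a ^ xored);   /* old a ^ old a ^ old b == old b */
    *b = (void *)((uintptr_t)*b ^ xored);   /* old b ^ old a ^ old b == old a */
}

/* Models an int-sized XoredPtrs on a 64-bit system: only the low 32 bits
 * of the XOR survive, so any difference in the high halves is lost. */
uint64_t xor_kept_by_32bit(uint64_t a, uint64_t b)
{
    return (uint32_t)(a ^ b);
}
```

With addresses such as 0x2000000000001000 and 0x6000000000002000, the 32-bit copy keeps only 0x3000. The difference in the upper bits disappears, which is the failure mode Dave warns about.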
From: Dave W. <dwo...@us...> - 2003-12-05 22:23:19
Steve

I'm not sure what you mean about shmData.header being 32 bits at its inception. If I look at the definition of the ShmData structure in dpcl/src/daemon/include/os/linux/ShmUsage.h, I see that header is defined as char *, which should be OK.

One thing I do find questionable is the line of code

    if ((int) shmData.header != -1) {

at approximately line 277 of dpcl/src/daemon/src/os/linux/ShmUsage.C. This should probably read

    if ((long) shmData.header != -1L) {

Looking at a cscope database of the DPCL source, it appears that every other use of shmData.header is either as a pointer of some type or as type uint64_t, which is typedefed as unsigned long long. So it appears usage of shmData.header should be 64-bit clean with this one exception.

Looking at usage of FREE_OBJECT_MAGIC_PATTERN, it appears this is only used in assignments to an integer field (32 bits) in shared memory headers, as a tag to indicate to the shared memory manager that the block in question is available for allocation. I don't see any usages involving either a variable of pointer data type or any of the 64-bit integer data types long, unsigned long, long long or unsigned long long. Any of those would be an error. It looks like the same considerations apply to ALLOC_OBJECT_MAGIC_PATTERN.

My cscope database shows that AIS_SHM_MASK is not used anywhere in the DPCL code and could probably be deleted entirely.

If you found some code where these are not being treated properly, can you provide a reference (source file and code snippet) and I will follow up? Also, if you are seeing somewhere where these are being printed as 0xFFFFFFFFDEADBEAF or 0xFFFFFFFFA5A55A5A, then that indicates that they got assigned to a signed 64-bit integer somewhere that is not obvious to me.

Dave
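Below is a short C sketch of a safer form of the header check. It assumes shmData.header holds the result of shmat(), which reports failure by returning `(void *) -1`. Comparing the pointer itself then avoids any integer cast. The second helper models what `(int) shmData.header != -1` effectively tests on a 64-bit system: only the low 32 bits. All helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/shm.h>

/* shmat() signals failure with (void *) -1; compare the pointer directly. */
int shm_attach_failed(const void *addr)
{
    return addr == (const void *) -1;
}

/* Models `(int) header != -1` on a 64-bit system: only the low 32 bits
 * are examined, so a valid address ending in 0xFFFFFFFF looks like -1. */
int low32_looks_like_failure(uint64_t addr)
{
    return (uint32_t)addr == UINT32_MAX;
}

/* Attaches an existing segment; returns NULL instead of (void *) -1. */
void *attach_segment(int shmid)
{
    void *p = shmat(shmid, NULL, 0);

    return shm_attach_failed(p) ? NULL : p;
}
```

Under that assumption, an address such as 0x20000000FFFFFFFF passes the pointer comparison. The int-cast version would treat the same address as a failure.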
From: Steve C. <sl...@sg...> - 2003-12-05 19:10:19
The mask

    const unsigned long AIS_SHM_MASK = 0x4337696;

in ~dpcl/src/daemon/src/os/linux/ShmUsage.c is also clearly not ready for 64 bits.
From: Steve C. <sl...@sg...> - 2003-12-05 19:05:15
The shared memory 'header' shmData.header is 32-bit at its inception. I took DaveW's advice and put my own file-descriptor 'fprintf' debug output into the runtime daemon code for 'shmblockAlloc', and the buffers being passed in as parameters are all 32-bit. The 32-bit masks:

    ShmManager.h:97:#define FREE_OBJECT_MAGIC_PATTERN 0xdeadbeaf
    ShmManager.h:98:#define ALLOC_OBJECT_MAGIC_PATTERN 0xa5a55a5a

clearly are not ready for 64 bits.
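Whether these 32-bit patterns cause trouble depends on the type of the field that holds them, the distinction Dave draws in his reply. The C sketch below uses hypothetical helper names. It shows that widening the pattern from an unsigned 32-bit field zero-extends it. Widening the same bits from a signed 32-bit field sign-extends them to the 0xFFFFFFFFDEADBEAF form Dave mentions.

```c
#include <stdint.h>
#include <string.h>

#define FREE_OBJECT_MAGIC_PATTERN  0xdeadbeaf
#define ALLOC_OBJECT_MAGIC_PATTERN 0xa5a55a5a

/* Widening an unsigned 32-bit tag zero-extends: 0x00000000DEADBEAF. */
uint64_t widen_unsigned_tag(uint32_t tag)
{
    return tag;
}

/* Widening the same bits held in a signed 32-bit field sign-extends:
 * 0xFFFFFFFFDEADBEAF. memcpy reinterprets the bits without relying on
 * implementation-defined unsigned-to-signed conversion. */
uint64_t widen_signed_tag(uint32_t bits)
{
    int32_t tag;

    memcpy(&tag, &bits, sizeof tag);
    return (uint64_t)(int64_t)tag;
}
```

Both patterns have their top bit set, so both change value when they pass through a signed 32-bit field into a 64-bit one.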
From: Dave W. <dwo...@us...> - 2003-12-03 15:23:55
Steve

I re-read your note this morning and saw the reference to a 64big version of the hello sample. My understanding from browsing my ia64 Linux kernel book from HP was that there are just two process models. One is ia64, which compiles to native ia64 machine code; the address space is a full 64 bits, with 64-bit pointers and data type long being 64 bits wide. The other is ia32 mode, where the ia64 hardware enters an i386 emulation mode and processes are in the 32-bit model, with 32-bit pointers and data type long being 32 bits. Does your reference to 64big mean there is a third process model? If so, what are the definitions of the process models for ia64 mode? I can't find a reference to a 64big mode on Google.

If there are multiple ia64 process models, say 64 and 64big, then multiple copies of libdpclRT would be required. The requirement is that the library be compiled to the same process address space model as the target executable (hello). There is no relationship between the model that the DPCL daemon is compiled with and the model the target application is compiled with. The one exception is that on ia64 you would not be able to invoke an executable that was running in ia32 mode, since the instruction sets of the two processes differ. Today on AIX, a 32-bit DPCL daemon can successfully control execution of both 32-bit and 64-bit AIX processes.

The other thing I noticed in browsing this book is that for ia64 processes, the usable address range is 0x2000000000000000 and up. The range 0x0000000000000000 through 0x1FFFFFFFFFFFFFFF is reserved for ia32-mode processes. So if you have a pointer with a value less than 0x2000000000000000, I would be suspicious of the value of the pointer.

Dave
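Below is a rough C version of the sanity check suggested above. IA-64 uses the top three bits of a virtual address as the region number, so everything below 0x2000000000000000 falls in region 0. The helper names are hypothetical. Treating region 0 as suspect for a native ia64 process follows Dave's reading of the HP book, not any DPCL code.

```c
#include <stdint.h>

/* IA-64 uses bits 63..61 of a virtual address as the region number. */
unsigned int ia64_region(uint64_t addr)
{
    return (unsigned int)(addr >> 61);
}

/* Heuristic from the note above: native ia64 user addresses start at
 * 0x2000000000000000, so anything lower looks suspicious. */
int suspicious_for_native_ia64(uint64_t addr)
{
    return addr < 0x2000000000000000ULL;
}
```

The daemon_address value 0x1012000d reported in Steve's 2003-12-01 message lands in region 0. By contrast, the buffer address 0x20000000006ac2d2 in the 2003-11-21 strace output lands in region 1.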
From: Dave W. <dwo...@us...> - 2003-12-03 01:55:06
Steve

What I was most suspicious of with the daemon address was that the address was referring to an odd byte boundary. My expectation was that memory allocation would result in an address aligned to an 8-byte boundary, with the last digit being 0 or 8. What you have could be correct, but an address ending with 'd' seems a bit unusual.

The daemon_RT directory builds (for AIX) a 32-bit and a 64-bit library. The reason this is done is that AIX supports both 32-bit and 64-bit address space models for user applications. Depending on the address model, the appropriate matching library object is used, either the 32-bit or 64-bit mode library. The Linux case is a bit simpler, at least right now, since on i386 everything is 32-bit mode and on ia64 everything is 64-bit mode. So when building on ia64, even though you are building in the 32-bit directory, you are in fact getting a 64-bit address space library module, and everything should work, in principle. The 64-bit directory is irrelevant in the Linux case.

Note that this does not mean the library code is clean. There still could be cases in the library where pointers are being cast to 32-bit integers and getting truncated. I'm not convinced we are getting that far yet, though.

I think where we are now is that the daemon code in the path you are debugging needs to have more log_write calls added, at least temporarily, to try to follow the flow and see where the code is going and what the critical values are. Either that, or use a debugger to attach to the DPCL daemon, step through the code, and get a better idea of what is going on.

Dave
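Dave's alignment expectation can be checked mechanically. In this C sketch (helper names hypothetical), an address is aligned to a power-of-two boundary when its low bits are zero. 0x1012000d ends in binary 1101, so it is not even 2-byte aligned. A pointer returned by malloc() is expected to be at least 8-byte aligned on typical Linux systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if addr is a multiple of align; align must be a power of two. */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Allocates n bytes, reports whether malloc returned an address with the
 * requested alignment, and frees the block again. */
int malloc_result_aligned(size_t n, uintptr_t align)
{
    void *p = malloc(n);
    int ok = p != NULL && is_aligned((uintptr_t)p, align);

    free(p);
    return ok;
}
```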
From: Steve C. <sl...@cl...> - 2003-12-02 23:44:44
Many, many thanks to DaveW for his suggestions. I can respond to some of them.

First, the 'odd number of bytes' in the address is because <moi> did the formatting as %11x in the 'log_write' addition I made earlier. Yes, I had already figured out that 'log_write' was the way to go in the daemon code. My declaration for 'daemon_address' (this is JamesW's hybrid DPCL/Dyninst, recall) is (void *), so that should not present truncation problems.

I did notice that the daemon_RT/64bit directory was being ignored (except for AIX), so I built that directory and made a 64big version of the sample/hello/eut_hello program, which directly linked to the '.a' in ~dpcl/src/daemon_RT/64bit. No difference in the result, so I guess we can rule out the RT library?

SteveC
SGI Compilers/Tools

ps: changing %11x to %16x in the log_write just adds 5 more blanks. Obviously a 32-bit address resides in 'daemon_address', but it could still be a valid address. I don't think the ia64 machine we are using has all that much memory. Oh well....
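If the value was printed with a %x conversion, the width change may explain the 5 extra blanks. %x expects an unsigned int, which is 32 bits on Linux even on ia64, so widening it from %11x to %16x can only add padding. It cannot show the upper half of a 64-bit pointer, and passing a pointer to %x is undefined behavior in any case. This C sketch, with a hypothetical helper name, prints the full address through `uintptr_t` and the standard PRIxPTR macro. %p would also work, with implementation-defined formatting.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "0x" plus 16 hex digits for addr into buf and returns the length
 * snprintf reports. Call as format_address(buf, sizeof buf, (uintptr_t)ptr). */
int format_address(char *buf, size_t len, uintptr_t addr)
{
    return snprintf(buf, len, "0x%016" PRIxPTR, addr);
}
```

With this format, the value in Steve's log would appear as 0x000000001012000d. The leading zeros would show directly that the upper 32 bits are empty.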
From: Dave W. <dwo...@us...> - 2003-12-02 20:42:27
Steve

I looked some more at the code. Without code or a system to test on, these are just guesses and observations on my part. Hopefully they might provide clues to what is going wrong.

If you are unable to attach to the DPCL daemon with a debugger using the sleep() trick I mentioned, another approach is to place additional logging code into the DPCL daemon source and rebuild. The logging call is the log_write function, which works similarly to printf. The difference is that log_write has a first parameter specifying the log level required to activate it. This corresponds to the log detail level specified in the AIS_blog_on call. The remaining parameters are the format string and its arguments. If you just use printf in the DPCL daemon code, your output will disappear, since the daemon redirects stdout. You need to use the log_write function call to see your output.

Another definition which looks suspect is in dyninstAPI/include/BPatch_threadInt.h, for the daemon_address member of the BPatch_shm_key_str structure. This is declared as type BPatch_Address32, which is a typedef of a 32-bit integer and will truncate 64-bit addresses. It looks like it should be BPatch_Address type. This probably doesn't affect you, since this is part of our BPatch code. However, the daemon_address member is used in our shared memory support code, such as daemon/src/os/linux/ShmUsage.C, so it is probably worthwhile to check the definition of daemon_address in your code. There's also a declaration and use of the BPatch_Address32 type in dyninst_RT/src/os/aix/RTInit.C which looks incorrect but which I don't think affects you.

I looked at the shmFObjectAlloc code and don't see anything obvious where an address gets truncated. One thing that does concern me, though, is the value of daemon_address. The value printed is an odd byte value. I would expect to see an address that is aligned to at least a 4-byte boundary and probably an 8-byte boundary, since I think this is a base address for shared memory in the daemon process. Memory allocators typically allocate storage aligned at 8-byte boundaries so that variables defined as integers and such are aligned at proper boundaries. I'm assuming ia64 has similar preferences for boundary alignments.

The other thing to consider is how Linux on ia64 defines the application address space. In 64-bit mode on AIX, all addresses I have seen in user process space are greater than 0x100000000. If ia64 Linux works similarly, then the daemon_address is also suspect, as it is less than that value.

If you haven't done this already, I would compile the code with options enabled to flag lines where pointers are possibly being truncated, or where the compiler warns of possible truncation due to casting a pointer to an integer. In order for the code to be 64-bit safe, any variable that is used in pointer manipulation must be defined as some type of pointer, long, signed long, or unsigned long. If a variable used in a pointer operation is defined as type int, unsigned int or simply unsigned, then the possibility of truncation exists. I think this is the most likely reason for problems with ia64. We should have cleared up any problems with byte ordering in our initial port to Linux, since AIX is big-endian and Linux on ia32 is little-endian.

If you need more help, let me know and I will see what I can do.

Dave
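The truncation Dave is worried about is easy to demonstrate. In this C sketch, the two structs are hypothetical mirrors of two key declarations. In the first, daemon_address is a 32-bit integer, as with BPatch_Address32. In the second, the field is pointer-sized. Storing a 64-bit address in the first keeps only its low 32 bits.

```c
#include <stdint.h>

/* Hypothetical mirrors of the two kinds of declaration discussed. */
struct shm_key32 { uint32_t  daemon_address; };  /* like BPatch_Address32 or 'unsigned' */
struct shm_key64 { uintptr_t daemon_address; };  /* pointer-sized, like 'unsigned long' on Linux */

/* Stores addr in the 32-bit field and reads it back. */
uintptr_t round_trip32(uintptr_t addr)
{
    struct shm_key32 k;

    k.daemon_address = (uint32_t)addr;   /* the high 32 bits are discarded here */
    return k.daemon_address;
}

/* Stores addr in the pointer-sized field and reads it back unchanged. */
uintptr_t round_trip64(uintptr_t addr)
{
    struct shm_key64 k;

    k.daemon_address = addr;
    return k.daemon_address;
}
```

On a 64-bit build, an address such as 0x2000000012345678 comes back from `round_trip32` as 0x12345678. That is the kind of small, plausible-looking value that is hard to recognize as damaged.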
From: Dave W. <dwo...@us...> - 2003-12-02 00:22:48
Steve

Since you are connecting to a running process, any printf, which normally goes to stdout, should be sent to whatever stdout is redirected to in this environment. If the process was started from a command line, I would expect output to appear in that command line window. If this process is being forked from something else, then stdout would be sent to wherever the invoking application is sending stdout.

The way I usually deal with this problem is to specifically open a new file descriptor and write any debug output to that descriptor. Something like 'fileHandle = fopen("/tmp/debuginfo", "w")' at a point where I know I have control, such as right above where I want the debug output to be written, then write to that file using fprintf, not printf.

I also noticed something suspicious in looking at how daemon_address is handled. It looks like it is correctly defined in the DPCL daemon and BPatch code as a 64-bit value. However, if I look at the definition in dyninst_RT/include/RTInit.h, it is defined as 'unsigned', which gets you an unsigned int (32 bits). This will cause problems with a 64-bit shared memory address from the daemon, so I think this should be declared as 'unsigned long'. I don't know if this is your problem or not, but this is one area where I would not be surprised that an address is being truncated, which is what I suspect is the problem here.

Dave
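The file-descriptor trick Dave describes can be wrapped as a small printf-style C helper. `debug_to_file` is hypothetical, not part of DPCL. The message suggests a single fopen(..., "w"); this sketch instead opens the file in append mode on every call and closes it afterwards. Each line is therefore handed to the operating system before the call returns, even if the daemon later crashes or redirects its standard streams.

```c
#include <stdarg.h>
#include <stdio.h>

/* Appends one printf-style message to path. Opening and closing on each
 * call means every message is flushed out of the stdio buffer before
 * returning. Returns the character count, or -1 if fopen fails. */
int debug_to_file(const char *path, const char *fmt, ...)
{
    va_list ap;
    int n;
    FILE *f = fopen(path, "a");

    if (f == NULL)
        return -1;
    va_start(ap, fmt);
    n = vfprintf(f, fmt, ap);
    va_end(ap);
    fclose(f);
    return n;
}
```

For example, `debug_to_file("/tmp/debuginfo", "enter shmFObjectAlloc\n");` at the top of the routine leaves a trace in /tmp/debuginfo regardless of where the daemon's stdout goes. Reopening on every call is slower, which should not matter for temporary debugging.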
From: Steve C. <sl...@cl...> - 2003-12-01 22:11:39
Verified that the 'key->daemon_address' is reasonable: 0x 1012000d. This is what is being passed to 'shmFObjectAlloc' as the 1st param. I now see that the routine 'shmFObjectAlloc' is in the daemon_RT directory. I did build a new copy of the daemon runtime with DEBUG_DAEMON_RT defined, but the 'printf' info that is supposedly generated is being lost AFAICT.
From: Steve C. <sl...@cl...> - 2003-12-01 19:18:58
The daemon is failing to allocate some sort of shared memory in the routine 'prepare_new_process_shm' of ~dpcl/src/daemon/src/os/linux/ShmUsage.C. The code in question:

    Pid_Entry *next_entry;
    if ((next_entry = (Pid_Entry *)shm_daemonObjectAlloc(pid_header->key, 0, &rc)) == NULL) {
        return AisStatus(ASC_insufficient_memory, ASC_error);
    } else {

Providing 'zero' for the 2nd parameter forces the routine 'shm_daemonObjectAlloc' to do:

    return shmFObjectAlloc(shm_key.daemon_address, shm_key, rc);

At this time I am unable to locate any reference to 'shmFObjectAlloc' beyond its use here. But I'm still searching.
From: Steve C. <sl...@sg...> - 2003-11-22 01:43:46
Last post for today. I'm on vacation all next week and I'm sure DaveW is 'thankful' for that.

Anyway, Dave correctly figured out that <somehow> the 'strace' and 'LD_DEBUG' messages were interfering with the communications between superdaemon/daemon/client. After adjusting for a 'full path' problem when I light off the 'hello' pid, I am getting a memory error in the daemon trace file, to wit:

    Fri Nov 21 16:35:39 2003: opened log /tmp/dpclsd.11001
    @Timing started connect:10997
    enter connect_cb: key(60) connect_stopped(0) pid(10997) client(0)
    connect_cb(): create a new ProcessD for pid 10997
    PModEntryInt (0x 240b0)
    enter ProcessD::connect()
    cannot find the specified client socket 0, maybe it is a new one
    get_full_path(): upon entry
    ps --cols 128 10997 | grep 10997 | awk '{print $5}'
    bpatch attach: /home/tulip28/slc/dyninst/hybrid/hybrid_071603/dpcl/src/samples/hello/hello 10997
    enter prepare_new_process_shm()
    enter locate_process_shm()
    failed to prepare a shm entry for process 10997
    connect_cb(): ERROR, cannot connect to pid 10997:ASC_insufficient_memory. line 1065 in "callbacks.C"
    enter default_cb()
    default_cb(): EOF on client socket 0
    enter disconnect_client( 0 )
    enter client_table_remove_entry( 0 )
    client_table_remove_entry(): client 0 is removed
    enter log_reset()
    default_cb(): send AIS_DAEMON_TERMINATE_MSG to SD, client_id = 0
    enter terminate_ack_cb()
    terminate_ack_cb(): SD = 9, *** The End ***
    enter daemon_shm_free()
    DPCL daemon exited

I'm done for today. Will get back to this when I return from vacation. Thanks, Dave!!!!
From: Steve C. <sl...@sg...> - 2003-11-21 23:02:49
|
Kudos to DaveW for suggesting an xinetd.d config file for the super daemon that runs it under strace. This let me see the fork of the daemon child and its eventual execl as well. Looking through the trace file, I see the daemon child doing a bunch of:

    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"memset", 6}, {"; lookup in file=", 18}, {"/lib/libpthread.so.0", 20}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"memset", 6}, {"; lookup in file=", 18}, {"/usr/lib/libstdc++-libc6.2-2.so."..., 33}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"memset", 6}, {"; lookup in file=", 18}, {"/lib/libm.so.6.1", 16}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"memset", 6}, {"; lookup in file=", 18}, {"/lib/libc.so.6.1", 16}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"binding file ", 13}, {"/usr/lib/libdpcl.so.1", 21}, {" to ", 4}, {"/lib/libc.so.6.1", 16}, {": ", 2}, {"normal", 6}, {" symbol `", 9}, {"memset", 6}, {"\'", 1}], 10) = -9223372036854775808
    8673 writev(11, [{" [", 2}, {"GLIBC_2.2", 9}, {"]\n", 2}], 3) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"_S_refill__t24__default_alloc_te"..., 45}, {"; lookup in file=", 18}, {"/opt/dpcl/bin/dpcld", 19}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"_S_refill__t24__default_alloc_te"..., 45}, {"; lookup in file=", 18}, {"/usr/lib/libdpcl.so.1", 21}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"_S_refill__t24__default_alloc_te"..., 45}, {"; lookup in file=", 18}, {"/usr/lib/libdpclRT.so.1", 23}, {"\n", 1}], 6) = -9223372036854775808
    8673 getpid() = -9223372036854775808
    8673 writev(11, [{"08673:\t", 7}, {"symbol=", 7}, {"_S_refill__t24__default_alloc_te"..., 45}, {"; lookup in file=", 18}, {"/home/tulip28/slc/dyninst/hybrid"..., 112}, {"\n", 1}], 6) = -9223372036854775808

and finally ending up in an endless loop of:

    8673 writev(11, [{"\n", 1}], 1) = -9223372036854775808
    8673 gettimeofday({1069443874, 29547}, NULL) = -9223372036854775808
    8673 select(14, [0 12], [], NULL, NULL <unfinished ...>
    8673 --- SIGALRM (Alarm clock) ---
    8673 <... select resumed> ) = -1 (in [0 12]])
    8673 select(14, [0 12], [], NULL, NULL) = -1 (in [0 12]])
    8673 gettimeofday({1069443874, 523210}, NULL) = -9223372036854775808
    8673 gettimeofday({1069443874, 523294}, NULL) = -9223372036854775808
    8673 select(14, [0 12], [], NULL, NULL <unfinished ...>
    8673 --- SIGALRM (Alarm clock) ---
    8673 <... select resumed> ) = -1 (in [0 12]])
    8673 select(14, [0 12], [], NULL, NULL) = -1 (in [0 12]])
    8673 gettimeofday({1069443875, 24273}, NULL) = -9223372036854775808
    8673 gettimeofday({1069443875, 24356}, NULL) = -9223372036854775808
    8673 select(14, [0 12], [], NULL, NULL <unfinished ...>
    8673 --- SIGALRM (Alarm clock) ---
    8673 <... select resumed> ) = -1 (in [0 12]])
    8673 select(14, [0 12], [], NULL, NULL) = -1 (in [0 12]])
    8673 gettimeofday({1069443875, 525319}, NULL) = -9223372036854775808
    8673 gettimeofday({1069443875, 525405}, NULL) = -9223372036854775808
    8673 select(14, [0 12], [], NULL, NULL <unfinished ...>
    8673 --- SIGALRM (Alarm clock) ---
    8673 <... select resumed> ) = -1 (in [0 12]])
    8673 select(14, [0 12], [], NULL, NULL) = -1 (in [0 12]])
    8673 gettimeofday({1069443876, 26374}, NULL) = -9223372036854775808
    8673 gettimeofday({1069443876, 26459}, NULL) = -9223372036854775808
    8673 select(14, [0 12], [], NULL, NULL <unfinished ...>

until I terminated the daemon, at which point the trace ends with:

    8673 select(14, [0 12], [], NULL, NULL) = -1 (in [0 12]])
    8673 gettimeofday({1069443932, 645484}, NULL) = -9223372036854775808
    8673 gettimeofday({1069443932, 645569}, NULL) = -9223372036854775808
    8673 select(14, [0 12], [], NULL, NULL <unfinished ...>
    8673 --- SIGTERM (Terminated) ---
    8672 <... read resumed> 0x20000000006ac2d2, 1952538797) = ? ERESTARTSYS (To be restarted)
    8672 --- SIGCHLD (Child exited) ---
    8672 read(15, "", 1952538797) = 0
    8672 write(8, "DefaultCD EOF on socket 15\n", 27) = 27

What this all means is a mystery to me, but I'm working on it. SteveC |
From: Dave W. <dwo...@us...> - 2003-11-21 22:44:16
|
Steve

I looked at the trace file. Can you remove the putenv calls you added to the super daemon code to get loader debug info? These result in many lines of output in the trace file, since strace records a trace record for each line of loader debug output. They may also be interfering with the communication between the DPCL super daemon and the DPCL daemon.

What I see in the trace is the DPCL super daemon running for a while as pid 7596. The super daemon issues a clone() system call to create the process for the DPCL daemon (pid 7597). The trace for the DPCL daemon shows all of the loader debug output being recorded. The last entry for the daemon process is a call to sigaction to ignore SIGCHLD. I'm not sure whether this call is in the loader or somewhere in the libc startup for the daemon. However, I would expect the daemon to get at least as far as the sleep() call. It looks like the daemon process has just stopped, which might be what the '?' status means in the ps output. If you can get this situation to occur again, what does 'ps -lu <userid>' show?

When I look at the trace for the DPCL super daemon, processing following the clone call appears OK for a while. The super daemon issues a few more system calls, then issues a select system call to find a socket that is ready to read or write. This is normal, since the super daemon is waiting for more connections from clients or messages from the daemon. Then there are a few read system calls where the logged data looks like the loader debug output. My guess is that the debug output is somehow interfering with the connection between the super daemon and the daemon, since the super daemon should definitely not be receiving this kind of data.

Once you remove the putenv calls, try the test again. If it still fails, try once more with the /etc/xinetd.d/dpclSD file restored and xinetd reconfigured, in case the output from strace is also interfering.
Note that if the sleep in the DPCL daemon is invoked, the daemon will suspend for about 1 minute and then attempt to resume normal operation.

I don't understand why the DPCL super daemon log file always gets the same suffix. That suggests a super daemon process is still running on the system, since the suffix is the result of a getpid call in the super daemon.

The /tmp/dpclsd.lock file should not have any effect on the super daemon or on the suffix of the super daemon log file. Its purpose is to ensure that only a single super daemon is running on the node. When the first super daemon starts, it obtains a lock on that file by calling flock, and holds that lock for as long as it is running. A second connect request to the same node results in a second DPCL super daemon process being created. The second process also attempts to obtain the same lock but cannot, so it transfers the connection to the first super daemon and exits.

Note there should be no /dpclsd.lock file. Something odd is going on if you have such a file.

Dave |
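The single-instance check and the pid-based log suffix described above can be illustrated with a minimal C sketch, assuming a lock file opened with open() and locked with flock(LOCK_EX | LOCK_NB). The function names make_log_name() and acquire_single_instance(), the "<prefix>.<pid>" log name format and the -2 return code are hypothetical, not taken from the DPCL sources; the hand-off of the connection to the first super daemon is not shown.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build a per-process log name "<prefix>.<pid>".
 * Two runs that produce the same suffix came from the same pid.
 * Returns 0 on success, -1 if the name does not fit in buf. */
int make_log_name(char *buf, size_t len, const char *prefix, pid_t pid)
{
    int n;

    assert(buf != NULL && prefix != NULL);
    n = snprintf(buf, len, "%s.%d", prefix, (int)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: try to become the only super daemon on the node.
 * Returns a descriptor that holds the lock (keep it open while running),
 * -2 if another open file already holds the lock, or -1 on other errors.
 * The lock is released automatically when the holder closes the
 * descriptor or exits; the file itself may stay on disk. */
int acquire_single_instance(const char *lock_path)
{
    int fd, saved;

    fd = open(lock_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;                      /* could not open or create it */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return fd;                      /* first instance: keep fd open */
    saved = errno;
    close(fd);                          /* someone else holds the lock */
    return (saved == EWOULDBLOCK) ? -2 : -1;
}
```

Because an flock() lock belongs to the open file rather than to the path name, deleting the lock file while a super daemon holds it lets the next process create and lock a brand-new file, so two super daemons could then run side by side. That may be worth keeping in mind given the observation in this thread that deleting the file starts a new super daemon.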
From: Steve C. <sl...@sg...> - 2003-11-21 20:21:32
|
This file hangs around and is what is causing an old superdaemon pid to be used. If I delete it, a new copy of the superdaemon fires off. I still can't get an 'strace' file from the comm daemon, though. Maybe there is a housecleaning problem with the dpclsd.lock file? |
From: Steve C. <sl...@sg...> - 2003-11-21 20:15:26
|
I have noticed that the '/tmp/dpclsdSD.nnnn' logfile always has the same number 'nnnn' (actually it's 5047) even after I remove it and rerun the client. If 'nnnn' is supposed to be the pid #, then I would think a new pid # would be showing up. It's almost like an old superdaemon pid is around somewhere (not showing up via 'ps', though). Just an observation. |
From: Dave W. <dwo...@us...> - 2003-11-21 20:08:17
|
Steve

I would expect /tmp/trace to be generated all the time or not at all. Another approach that might give a clue to what is happening is to undo the changes I gave you for strace and instead start the DPCL super daemon under strace control. To do this, I modified my /etc/xinetd.d/dpclSD file to look like the following:

    service dpclSD
    {
        socket_type = stream
        protocol = tcp
        wait = no
        user = root
        #server = /opt/dpcl/bin/dpclSD
        #server_args = /opt/dpcl/bin/dpcld /tmp/dpclSD01 /tmp/dpclsd
        server = /usr/bin/strace
        server_args = -f -o /tmp/trace /opt/dpcl/bin/dpclSD /opt/dpcl/bin/dpcld /tmp/dpclSD01 /tmp/dpclsd
        disable = no
    }

The idea is for xinetd to invoke strace, which in turn invokes /opt/dpcl/bin/dpclSD under strace control with the strace flags -f and -o /tmp/trace. Make these changes, making sure strace is actually /usr/bin/strace on your system; if not, adjust accordingly. Save the changes and find the pid for xinetd, for example with 'ps -ef | grep xinetd'. Reconfigure xinetd with 'kill -HUP <pid>', where <pid> is the xinetd pid. Check the man page for xinetd, since the specific signal to force a reconfigure may be different. Check that the reconfigure worked by looking at the system log messages, /var/log/messages by default. There should be a message that the dpclSD service has been reconfigured.

Once this is complete, rerun your DPCL test and look at the /tmp/trace output file to see if there is any indication after the fork call of why DPCL is failing. Note that running strace seems to interfere with file descriptors passed to dpclSD: on my Linux system, I get an error status when dpclSD tries to send a message back to the client. Hopefully this happens after the real cause of the failure on your system.

I'm not sure why the old dpclSD is getting run. As far as I know, Linux does not do anything to lock executables or shared libraries in storage until explicitly forced out.
I suggest using the command 'ps -ef | grep dpclSD' to check for a leftover dpclSD process before starting your test. If you want me to look at the trace file, send it to me.

Dave

Steve Collins <sl...@sg...> 11/21/2003 10:42 AM
To: Dave Wootton/Poughkeepsie/IBM@IBMUS
cc:
Subject: Re: Loader output from loading dpcld on ia64

Good morning, Dave. I put your 'strace' call in and removed the 'filtering'. Unfortunately '/tmp/trace' does not always get created when I execute the client. When it does get created, it is empty (there is NO <pid> for the strace, ever, AFAICT). Here's another weird fact: when I update dpclSD (e.g. with a new logfile entry) and do a 'make install', it is evident that the new dpclSD in /opt/dpcl/bin is not the one being executed (by xinetd?). It is an older (just slightly older, but older) copy of dpclSD that is being used. Oh well. FYI, I am on vacation all of next week. I plan a full day today, however, so if you want to send me tips, please do. Thanks for all your help, Dave!! SteveC |
From: Dave W. <dwo...@us...> - 2003-11-21 17:59:07
|
Steve

I looked at the loader output you sent and did not see any errors. It appears from the end of the file that the loader is transferring control to /opt/dpcl/bin/dpcld. So at least in this case, the daemon process appears to be loaded and invoked, and something is going wrong afterwards.

Hopefully the strace output provides some clues to what is happening here. Based on this output, I would suggest not filtering strace with the -e trace=process option, but capturing all system calls instead. If nothing appears obvious, send the output to my email address.

Dave |
From: Dave W. <dwo...@us...> - 2003-11-20 23:38:31
|
Steve

In order to get a trace, strace needs to be invoked from within the super daemon. It looks like this can be done with the following modifications to dpcl/src/SD/src/SdCreateDaemon.C, which I have not tested here.

Locate the line

    } else if ( childpid == 0 ) { // CHILD PROCESS BEGINS ***************

The code following this is the child path of the fork, where the DPCL daemon will be exec'ed. We want to suspend execution on this path until strace can start. Do this by coding 'sleep(30);'. 30 seconds is probably excessive, but it ensures the process is suspended until strace is started.

Then locate the line

    // CONTINUATION OF PARENT PROCESS ****************************************

The code following this is the continuation of the parent process for the DPCL super daemon. Invoke strace at this point with code along the lines of the following:

    char strace_cmd[100];
    snprintf(strace_cmd, sizeof(strace_cmd), "strace -p %d -e trace=process -o /tmp/trace &", (int)childpid);
    system(strace_cmd);

This code creates an strace command that attaches to the pid of the daemon process. The '-e trace=process' string is an attempt to limit the size of the trace file, although I suspect it won't be large anyway. You can include this filtering if you like. The system() invocation runs strace in the background so the main DPCL super daemon process is not hung.

Once you have made these changes, run the client. Once the client terminates, you may need to find the strace process, which will be running as root, and kill it. Don't use kill -9, since that may result in trace output buffers not being flushed before the process is terminated. Look at the resulting trace file /tmp/trace and see if there are any clues to what is failing, or send me the output. Also, if you want to send me the system loader information you previously obtained, send that directly to my email account.

Dave |
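The parent-side step suggested above can be sketched in C as a small helper that formats the strace command for the child's pid and then launches it in the background with system(). The helper names build_strace_cmd() and attach_strace(), the optional filter argument and the truncation check are assumptions of this sketch, not code from SdCreateDaemon.C. Passing a NULL filter corresponds to capturing all system calls, as Dave later recommends.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: format an strace command that attaches to `pid`,
 * e.g. "strace -p 1234 -e trace=process -o /tmp/trace &".
 * Pass filter == NULL to trace all system calls.
 * Returns 0 on success, -1 if the command does not fit in buf. */
int build_strace_cmd(char *buf, size_t len, pid_t pid,
                     const char *filter, const char *outfile)
{
    int n;

    assert(buf != NULL && outfile != NULL);
    if (filter != NULL)
        n = snprintf(buf, len, "strace -p %d -e %s -o %s &",
                     (int)pid, filter, outfile);
    else
        n = snprintf(buf, len, "strace -p %d -o %s &", (int)pid, outfile);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical parent-side call, made right after fork() returns the
 * child's pid. It assumes the child path sleeps (e.g. sleep(30)) before
 * its execl() so strace can attach first. The trailing '&' makes the
 * shell run strace in the background, so system() returns promptly and
 * the super daemon is not hung. Returns system()'s status, or -1. */
int attach_strace(pid_t childpid, const char *outfile)
{
    char cmd[128];

    if (build_strace_cmd(cmd, sizeof cmd, childpid, NULL, outfile) != 0)
        return -1;
    return system(cmd);
}
```

The bounds check matters more than it looks. A long output path can silently overflow a fixed-size command buffer. Here, snprintf() reports the full length it needed, so the helper can refuse a truncated command instead of running a mangled one.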
From: Steve C. <sl...@sg...> - 2003-11-20 18:45:40
|
DaveW suggests that if I am not seeing a pid for 'dpcld' (which I am not - even with the sleep(60) in main.C), then LD_OUTPUT_FILE might show something. It doesn't appear to be helpful - just a bunch of successful links and some symbol lookups. I tried one of DaveW's previous debugging tips and ran 'strace'. Here is what I got:

    socket(PF_UNIX, SOCK_STREAM, 0) = 7
    connect(7, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
    socket(PF_UNIX, SOCK_STREAM, 0) = 8
    connect(8, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    connect(7, {sin_family=AF_INET, sin_port=htons(7895), sin_addr=inet_addr("128.162.243.145")}}, 16) = 0
    setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    socket(PF_UNIX, SOCK_STREAM, 0) = 8
    connect(8, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    socket(PF_UNIX, SOCK_STREAM, 0) = 8
    connect(8, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 9
    bind(9, {sin_family=AF_INET, sin_port=htons(989), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
    setsockopt(9, SOL_IP, IP_RECVERR, [1], 4) = 0
    sendto(9, "\31\245\355\25\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0"..., 76, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, 16) = 76
    recvfrom(9, "\31\245\355\25\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8800, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, [16]) = 100
    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 9
    bind(9, {sin_family=AF_INET, sin_port=htons(990), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
    setsockopt(9, SOL_IP, IP_RECVERR, [1], 4) = 0
    sendto(9, "E\3129\333\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3"..., 76, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, 16) = 76
    recvfrom(9, "E\3129\333\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8800, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, [16]) = 144
    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 9
    bind(9, {sin_family=AF_INET, sin_port=htons(991), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
    setsockopt(9, SOL_IP, IP_RECVERR, [1], 4) = 0
    sendto(9, "Nj\345\211\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3"..., 76, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, 16) = 76
    recvfrom(9, "Nj\345\211\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8800, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, [16]) = 100
    socket(PF_UNIX, SOCK_STREAM, 0) = 8
    connect(8, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 8
    socket(PF_UNIX, SOCK_STREAM, 0) = 9
    connect(9, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    connect(8, {sin_family=AF_INET, sin_port=htons(7895), sin_addr=inet_addr("128.162.243.145")}}, 16) = 0
    setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    socket(PF_UNIX, SOCK_STREAM, 0) = 9
    connect(9, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ECONNREFUSED (Connection refused)
    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 10
    bind(10, {sin_family=AF_INET, sin_port=htons(992), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
    setsockopt(10, SOL_IP, IP_RECVERR, [1], 4) = 0
    sendto(10, "U/p\244\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0"..., 76, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, 16) = 76
    recvfrom(10, "U/p\244\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1"..., 8800, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, [16]) = 100
    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 10
    bind(10, {sin_family=AF_INET, sin_port=htons(993), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
    setsockopt(10, SOL_IP, IP_RECVERR, [1], 4) = 0
    sendto(10, "*\10\3\347\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3"..., 76, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, 16) = 76
    recvfrom(10, "*\10\3\347\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8800, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, [16]) = 144
    socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 10
    bind(10, {sin_family=AF_INET, sin_port=htons(994), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
    setsockopt(10, SOL_IP, IP_RECVERR, [1], 4) = 0
    sendto(10, "%w\210\275\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3"..., 76, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, 16) = 76
    recvfrom(10, "%w\210\275\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8800, 0, {sin_family=AF_INET, sin_port=htons(872), sin_addr=inet_addr("128.162.236.207")}}, [16]) = 100 |
From: Dave W. <dwo...@us...> - 2003-11-20 17:32:23
|
Steve

What's supposed to be happening here is that the DPCL super daemon has either created a DPCL daemon process or found a DPCL daemon process for your userid. The sendmsg call is intended to pass the socket connection established between the client and the DPCL super daemon to the DPCL daemon. The 'broken pipe' status here suggests that the DPCL daemon process has disappeared, implicitly closing its side of the pipe.

If you start your client, can you see the DPCL daemon process running with your userid? If you have a sleep(60) at entry to main(), the daemon should suspend for 1 minute, giving you time to attach with gdb and then continue execution to see where the daemon crashes. If you are not seeing a DPCL daemon process, then I would guess there is still a problem loading the DPCL daemon that LD_DEBUG and LD_OUTPUT_FILE might help identify.

Dave

Steve Collins <sl...@sg...> Sent by: dpc...@ww... 11/19/2003 06:09 PM
To: dpc...@ww...
cc: sl...@sg...
Subject: [Dpcl-develop] ASC_daemon_communication_error

After JamesW got me past the 'execl' problem, I get a little further and seem to have a 'broken pipe' problem with the daemon.
The /tmp logfile shows the following:

    daissd child: chdir() to dir: /home/tulip28/slc
    SdCreateDaemon.C[445]: nonzero errno before fd[1]: 2(No such file or directory), not sure from what: No such file or directory
    SdCreateDaemon.C[481]: nonzero errno after STDERR_FILENO[1]: 9(Bad file descriptor), not sure from what: Bad file descriptor
    LD_LIBRARY_PATH=/home/tulip28/slc/dyninst/hybrid/hybrid_071603/dyninst_071603/core/../lib/ia64-unknown-linux2.4: Success
    DYNINSTAPI_RT_LIB=/home/tulip28/slc/dyninst/hybrid/hybrid_071603/dyninst_071603/core/../lib/ia64-unknown-linux2.4/libdyninstAPI_RT.so.1: Success
    daisSD: createDaemonProcess - child process, just before execl
    DefaultCD EOF on socket 13
    daissd: SdRecvSocketDispatch - recv fd = 0
    recf_fd: New fd = 9
    daissd: SdRecvSocketDispatch client_auth_queued = 1
    DefaultCD EOF on socket 8
    EOF on socket 8
    daissd: Client List 1 is NOT Empty
    SD: entered SdKeyMsgCB msg size 4 handle 58
    Sec buffer = Wed Nov 19 16:44:00 2003 2 911289
    Security buffer start: 60000fffffffb2d0 end: 60000fffffffb308
    Actual security buffer end: 60000fffffffb308 size: x'60000fff00000038'
    daissd: Before SSM_send, handle 58 ReqLen2 = 56
    SdParseUnsecureCB: entered
    SdParseUnsecureCB: Argument = hope.americas.sgi.com
    SdParseUnsecureCB: Reached homename
    SdParseUnsecureCB: Argument = 0x03030300l
    SdParseUnsecureCB: Reached version 0x03030300l
    SdParseUnsecureCB: Argument = 150
    SdParseUnsecureCB: Reached userid
    SdParseUnsecureCB: Argument = slc
    SdParseUnsecureCB: Reached username
    SdParseUnsecureCB: Argument = 3F8
    SdParseUnsecureCB: Reached groupid
    SdParseUnsecureCB: Argument = compiler
    SdParseUnsecureCB: Reached groupname
    SdParseUnsecureCB: Argument = /home/tulip28/slc
    SdParseUnsecureCB: Reached home dir
    SdParseUnsecureCB: Argument = Wed Nov 19 16:44:00 2003
    SdParseUnsecureCB: Reached data
    SdParseUnsecureCB: Reached data arg2 2
    SdParseUnsecureCB: Reached data arg3 911289
    SdParseUnsecureCB: exit
    daissd: SdAuthCB, client version, 0x03030300
    daissd: SdAuthCB, before SdSecurityCheck
    SdSecurity: entered
    SdSecurityCheck: User name is: slc
    SdSecurityCheck: home name is: hope.americas.sgi.com
    SdSecurityCheck: after ruserok: rc = OK
    Authentication successful
    SdDaisCLient constructor: userID = 150
    SdDaisCLient constructor groupID= 3
    SdDaisCLient constructor: groupName = compiler
    SdDaisCLient constructor: userName = slc
    SdDaisCLient constructor: homeDir = hope.americas.sgi.com
    daissd: SSM_send error sending msg to daemon
    send_fd: Sendmsg failure - could not send file descriptor: Broken pipe
    daissd: Could not send file descriptor to child
    daissd:createDaemonProcess - exit
    daissd: SdAuthCB, after addDaisClient
    daissd: SdAuthCB - BAD STATUS
    daissd: SdAuthCB - Exit client_auth_queued=0
    daissd: SdRecvSocketDispatch - recv fd = 0
    recf_fd: New fd = 8
    daissd: SdRecvSocketDispatch client_auth_queued = 1
    SD: entered SdKeyMsgCB msg size 4 handle 58
    Sec buffer = Wed Nov 19 16:44:06 2003 3 139531

_______________________________________________
Dpcl-develop mailing list
Dpc...@ww...
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/dpcl-develop |
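To make the "Broken pipe" failure more concrete, here is a minimal C sketch of handing an open descriptor to another process over a Unix-domain socket using sendmsg() and an SCM_RIGHTS control message. This is the general mechanism Dave describes for passing the client connection to the daemon. The helpers pass_fd() and receive_fd() are hypothetical stand-ins, not the DPCL send_fd/recf_fd routines that appear in the log. Using the Linux-specific MSG_NOSIGNAL flag so that the failure shows up as an EPIPE return rather than a SIGPIPE is also an assumption of the sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send fd_to_send across the connected Unix-domain
 * socket `sock`, with one dummy data byte. Returns 0 on success or -1
 * with errno set; if the peer has exited, errno is EPIPE. */
int pass_fd(int sock, int fd_to_send)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                              /* aligned control buffer */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *c;

    assert(sock >= 0 && fd_to_send >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;           /* "this message carries fds" */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, MSG_NOSIGNAL) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by pass_fd().
 * Returns the new descriptor, or -1 on error or if none was attached. */
int receive_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *c;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;                       /* error, or peer closed */
    c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;                       /* no descriptor attached */
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

If the process holding the other end of the socket has already exited, its side is closed and sendmsg() fails with EPIPE. This is consistent with Dave's reading of the "could not send file descriptor: Broken pipe" line in the log: the daemon is gone before the super daemon tries to hand it the connection.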