From: SourceForge.net <no...@so...> - 2012-08-28 08:50:25
Bugs item #3562119, was opened at 2012-08-27 05:37
Message generated for change (Comment added) made by villettejp
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112694&aid=3562119&group_id=12694

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: agent
Group: hpux
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jean-Paul VILLETTE (villettejp)
Assigned to: Niels Baggesen (nba)
Summary: segfault in snmpd 5.6.1.1, file at.c on HPUX 11i

Initial Comment:
file : /agent/mibgroup/mibII/at.c
cause : segfault in at.c at line 754 (HPUX specific lines):

    748 #elif defined(hpux11)
    749     if (arptab_current < arptab_size) {
    750         /*
    751          * copy values
    752          */
    753         *IPAddr = at[arptab_current].NetAddr;
    754         memcpy(PhysAddr, at[arptab_current].PhysAddr.o_bytes,
    755                at[arptab_current].PhysAddr.o_length);
    756         *ifType = at[arptab_current].Type;
    757         *ifIndex = at[arptab_current].IfIndex;
    758         *PhysAddrLen = at[arptab_current].PhysAddr.o_length;

This memcpy() segfaults for one of the two usual reasons:
- a wrong address,
- or, more often, a wrong length.

The at array is filled by these lines (HPUX specific lines):

    546     if (at)
    547         free(at);
    548     at = (mib_ipNetToMediaEnt *) 0;
    549     arptab_size = 0;
    550
    551     if ((fd = open_mib("/dev/ip", O_RDONLY, 0, NM_ASYNC_OFF)) >= 0) {
    552         p.objid = ID_ipNetToMediaTableNum;
    553         p.buffer = (void *) &val;
    554         ulen = sizeof(int);
    555         p.len = &ulen;
    556         if ((ret = get_mib_info(fd, &p)) == 0)
    557             arptab_size = val;
    558
    559         if (arptab_size > 0) {
    560             ulen = (unsigned) arptab_size *sizeof(mib_ipNetToMediaEnt);
    561             at = (mib_ipNetToMediaEnt *) malloc(ulen);
    562             p.objid = ID_ipNetToMediaTable;
    563             p.buffer = (void *) at;
    564             p.len = &ulen;
    565             if ((ret = get_mib_info(fd, &p)) < 0)
    566                 arptab_size = 0;
    567         }
    568
    569         close_mib(fd);
    570     }

I solved this with the following change (HPUX specific lines):

    *** 559,569 ****
    --- 559,572 ----
              if (arptab_size > 0) {
                  ulen = (unsigned) arptab_size *sizeof(mib_ipNetToMediaEnt);
                  at = (mib_ipNetToMediaEnt *) malloc(ulen);
    +             memset(at, 0, ulen);
                  p.objid = ID_ipNetToMediaTable;
                  p.buffer = (void *) at;
                  p.len = &ulen;
                  if ((ret = get_mib_info(fd, &p)) < 0)
                      arptab_size = 0;
    +             else
    +                 arptab_size = *p.len / sizeof(mib_ipNetToMediaEnt);
              }

The problem comes from the way the get_mib_info() API is used (the API is,
at the least, not well documented).
The first call gets the number of ARP entries and the second call fetches
the entries themselves.
But on a very active machine, the number of entries may decrease between
the two calls.
In that case the second call sees a shorter ARP list than the first one did,
and get_mib_info() only writes to the part of the buffer that corresponds to
the second (smaller) number of entries, leaving the rest of the buffer
untouched.
It is the use of this untouched, and therefore uninitialized, portion of
memory that makes the memcpy() segfault.
I made two modifications:
- zero the buffer, so that memcpy() receives correct values or at least zeros,
- recompute the real length of the list of ARP entries from the length
  returned by the second call.

---------------

You can illustrate the behavior of get_mib_info() with this simple test.
First, ping your broadcast address to make all your subnet's neighbors reply.
This should increase the size of your ARP table.

The following lines are the source code of the tool I used to reproduce
this problem. For a large part, it is a copy of the snmpd code. I just added
a sleep() to artificially increase the time between the two get_mib_info()
calls, and a conditional printf() that prints the ARP entries when the two
item counts differ (that is, when the problem occurs).

    #include <stdio.h>
    #include <sys/fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>             /* sleep() */
    #include <sys/time.h>
    #include <sys/types.h>
    #include <sys/mib.h>
    #include <netinet/mib_kern.h>

    int main(void)
    {
        int             fd;
        struct nmparms  p;
        int             val;
        unsigned int    ulen;
        int             ret;
        static mib_ipNetToMediaEnt *at = (mib_ipNetToMediaEnt *) 0;
        int             arptab_size = 0;
        int             arptab_current = 0;

        if ((fd = open_mib("/dev/ip", O_RDONLY, 0, NM_ASYNC_OFF)) >= 0) {
            /* first call: number of ARP entries */
            p.objid = ID_ipNetToMediaTableNum;
            p.buffer = (void *) &val;
            ulen = sizeof(int);
            p.len = &ulen;
            if ((ret = get_mib_info(fd, &p)) == 0)
                arptab_size = val;

            /* widen the window between the two calls */
            sleep(2);

            if (arptab_size > 0) {
                ulen = (unsigned) arptab_size *sizeof(mib_ipNetToMediaEnt);
                at = (mib_ipNetToMediaEnt *) malloc(ulen);
                if (at == NULL) {
                    close_mib(fd);
                    return 1;
                }
                /* memset(at, 0, ulen); */
                /* second call: the entries themselves */
                p.objid = ID_ipNetToMediaTable;
                p.buffer = (void *) at;
                p.len = &ulen;
                if ((ret = get_mib_info(fd, &p)) < 0)
                    arptab_size = 0;
                /* else
                       arptab_size = *p.len / sizeof(mib_ipNetToMediaEnt); */
            }

            for (arptab_current = 0; arptab_current < arptab_size; arptab_current++) {
                /* print only when the two calls disagree on the entry count */
                if ((size_t) arptab_size != (*p.len / sizeof(mib_ipNetToMediaEnt))) {
                    printf("(%d,%d) ", arptab_current + 1, arptab_size);
                    printf("%d ", at[arptab_current].PhysAddr.o_length);
                    printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
                           at[arptab_current].PhysAddr.o_bytes[0],
                           at[arptab_current].PhysAddr.o_bytes[1],
                           at[arptab_current].PhysAddr.o_bytes[2],
                           at[arptab_current].PhysAddr.o_bytes[3],
                           at[arptab_current].PhysAddr.o_bytes[4],
                           at[arptab_current].PhysAddr.o_bytes[5]);
                }
            }

            free(at);
            close_mib(fd);
        }
        return 0;
    }

When the ARP table shrinks, the effects may be:
- wrong entries at the end of the list,
- in rare cases, segfaults.

For example, I just ran the test. After several minutes, the number of
items decreased from 314 to 245.
Here is the output from the tool:

    bash-3.00# while true; do ./mygetmib | tee logs ; done
    (247,318) 6 00:0f:20:2b:55:23
    (248,318) 6 00:1a:4b:06:66:7a
    (249,318) 6 01:00:5e:00:00:00
    (250,318) 3755991007 df:df:df:df:df:df
    (251,318) 3755991007 df:df:df:df:df:df
    (252,318) 3755991007 df:df:df:df:df:df
    (253,318) 3755991007 df:df:df:df:df:df
    (254,318) 3 00:00:00:05:6c:61
    (255,318) 0 00:00:00:00:00:00
    (256,318) 0 00:00:00:00:00:00
    .....

The columns are:
1/ the index in the list,
2/ the number of items returned by the first call of get_mib_info(),
3/ the physical address length of the ARP entry (also the size used for
   the memcpy()),
4/ the physical address of the current ARP entry.

This time, for example, the length would have caused a segfault.

To check the number of ARP entries at the same time, you can use the
following command:

    bash-3.00# while true ; do arp -an | wc -l; sleep 5; done
    309
    311
    311
    311
    313
    314
    314
    314
    314
    314
    245

NOTA: the number of entries differs between the two commands (the tool and
arp -an) because the two tests are not run at exactly the same time.

----------------------------------------------------------------------

>Comment By: Jean-Paul VILLETTE (villettejp)
Date: 2012-08-28 01:50

Message:
I added the memset() line to be sure the memcpy() works. When the length is
zero, memcpy() accepts it and simply does nothing.

I am still concerned by a strange behavior of the get_mib_info() API:
sometimes, the p.len returned by the second get_mib_info() call is not a
multiple of sizeof(mib_ipNetToMediaEnt), which causes arptab_size to be
rounded up. That's why the memset() line stays even though it should not be
necessary.

I have not yet succeeded in reproducing this behavior (and I don't know if
I will), but it seems related to the "unsuccessful resolutions" visible in
"arp -an" as lines marked "no entry":

    #arp -an
    ....
    (150.130.22.232) at 0:d:29:84:31:cf ether
    10.15.17.102 (10.15.17.102) -- no entry
    (150.130.22.234) at 0:d:29:cc:1a:b0 ether
    10.15.16.102 (10.15.16.102) -- no entry
    (150.130.22.227) at 0:d:29:4b:f8:5c ether
    ....

----------------------------------------------------------------------

Comment By: Magnus Fromreide (magfr)
Date: 2012-08-27 14:32

Message:
I just wanted to chime in and say that I love this bug report.
Thanks for doing such detailed work.

----------------------------------------------------------------------

Comment By: Niels Baggesen (nba)
Date: 2012-08-27 12:05

Message:
Yes, that certainly looks like a bug, not to check the actual amount of
data returned. The call to memset on the other hand should not make any
difference.

----------------------------------------------------------------------
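
The point raised in this thread (trust the length returned by the second
query, not the count obtained earlier) can be sketched in portable C. This is
only an illustration under stated assumptions: mock_get_table() is a
hypothetical stand-in for the kernel query, not the HP-UX get_mib_info() API,
and arp_entry, phys_addr, load_table() and copy_phys() are invented names that
do not appear in at.c. The sketch also clamps the copy length, which is an
extra precaution and not part of the patch proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PHYS_LEN 6              /* hypothetical: room for one MAC address */

typedef struct {
    unsigned int  length;           /* bytes used in bytes[] */
    unsigned char bytes[MAX_PHYS_LEN];
} phys_addr;

typedef struct {
    unsigned int net_addr;
    phys_addr    phys;
} arp_entry;

/*
 * Hypothetical stand-in for the kernel query. It writes at most `available`
 * entries into buf and stores the number of bytes actually written in *len,
 * the way p.len is updated in the report.
 */
static int mock_get_table(arp_entry *buf, unsigned int *len, size_t available)
{
    size_t capacity = *len / sizeof(arp_entry);
    size_t n = available < capacity ? available : capacity;
    size_t i;

    for (i = 0; i < n; i++) {
        buf[i].net_addr = 0x0a000001u + (unsigned int) i;
        buf[i].phys.length = MAX_PHYS_LEN;
        memset(buf[i].phys.bytes, (int) (i + 1), MAX_PHYS_LEN);
    }
    *len = (unsigned int) (n * sizeof(arp_entry));
    return 0;
}

/*
 * `counted` is what the first query reported; `available` is what the table
 * holds by the time the second query runs. Returns the number of valid
 * entries and hands the buffer back through *out (caller frees it).
 */
size_t load_table(size_t counted, size_t available, arp_entry **out)
{
    arp_entry    *buf;
    unsigned int  len;

    *out = NULL;
    if (counted == 0)
        return 0;
    buf = calloc(counted, sizeof *buf);     /* zeroed, like the memset() */
    if (buf == NULL)
        return 0;
    len = (unsigned int) (counted * sizeof *buf);
    if (mock_get_table(buf, &len, available) < 0) {
        free(buf);
        return 0;
    }
    *out = buf;
    return len / sizeof *buf;       /* use the returned length, not `counted` */
}

/* Copy a physical address, never reading or writing more than the arrays hold. */
size_t copy_phys(unsigned char *dst, size_t dst_size, const arp_entry *e)
{
    size_t n;

    assert(dst != NULL && e != NULL);
    n = e->phys.length;
    if (n > sizeof e->phys.bytes)
        n = sizeof e->phys.bytes;
    if (n > dst_size)
        n = dst_size;
    memcpy(dst, e->phys.bytes, n);
    return n;
}
```

With these definitions, load_table(318, 249, &table) returns 249 rather than
318, so a caller that loops up to the returned count never touches the tail
of the buffer that the second query left unwritten.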