From: SourceForge.net <no...@so...> - 2010-07-15 22:18:32
Bugs item #2821859, was opened at 2009-07-15 14:20
Message generated for change (Comment added) made by jholtom
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112694&aid=2821859&group_id=12694

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: agent
Group: solaris
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Matuschek (matuschd)
Assigned to: Nobody/Anonymous (nobody)
Summary: not starting in a Solaris 10 zone, error on subcontainer ...

Initial Comment:
If installed in a Solaris 10 zone, Net-SNMP 5.5 (tested with pre2 and pre3) does not start. snmpd logs
"error on subcontainer 'interface container' insert (-1)"

Net-SNMP 5.4 runs OK in a Solaris zone.

truss shows the following:

so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 7
so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFNUM, 0xFFBF517C) = 0
ioctl(8, SIOCGLIFCONF, 0xFFBF516C) = 0
close(8) = 0
so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFINDEX, 0xFFBF4F90) Err#6 ENXIO
close(8) = 0
so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFINDEX, 0xFFBF4F90) Err#6 ENXIO
close(8) = 0
so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFINDEX, 0xFFBF4F90) Err#6 ENXIO
close(8) = 0
so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFINDEX, 0xFFBF4F90) Err#6 ENXIO
close(8) = 0
so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFINDEX, 0xFFBF4F90) Err#6 ENXIO
close(8) = 0
so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 8
ioctl(8, SIOCGLIFINDEX, 0xFFBF4F90) Err#6 ENXIO
close(8) = 0
close(7) = 0
putmsg(6, 0xFFBFF400, 0x00000000, 0) = 0
ioctl(6, I_FLUSH, FLUSHRW) = 0
putmsg(6, 0xFFBFF3B8, 0x00000000, 0) = 0
ioctl(6, I_FLUSH, FLUSHRW) = 0
write(3, " e r r o r o n s u b".., 56) = 56

----------------------------------------------------------------------

Comment By: James Holtom (jholtom)
Date: 2010-07-15 23:18

Message:
In a non-global zone this problem persists in the released v5.5; I got
bitten by it a few days ago. Although the Solaris code appears to have
been fixed in mid-June, there doesn't seem to be a patch available yet.

All is not lost, though: you can dodge the problem by hacking the
'configure' script so that it stops checking for the presence of
'if_nameindex' among all the other functions (line 34777: just delete
that one function name).

I've applied this workaround and compiled with the SunFreeware versions
of gcc 3.4.6 and openssl 1.0.0a, and it all seems to work just fine.

----------------------------------------------------------------------

Comment By: Tim Kennedy (tck1000)
Date: 2010-04-07 23:02

Message:
rstory: I'm not really a coder, but sometimes I can figure stuff out. If
you can point me at the files you're referencing in your message of
2009-11-17, I'd be happy to help with this bug in any way I can. I've
recently run into this same bug myself, and would like a solution.

Thanks.

----------------------------------------------------------------------

Comment By: Robert Story (rstory)
Date: 2009-11-17 21:44

Message:
OK, after some debugging (see the 2009-11-17 IRC logs), it seems that

getMibstat(MIB_INTERFACES, ... GET_EXACT, &IF_cmp, &interface)

works (this is what the old 5.3 code does), but that

getMibstat(MIB_INTERFACES, ... GET_FIRST, &Get_everyone, NULL)

returns an empty (all 0) struct. Thus there is no name with which to
look up the ifIndex, and 0 is being used repeatedly.

Are interfaces sequential in Solaris? Should we loop from 1..N until one
fails to find interfaces? And is there something easy we can test to know
when GET_FIRST/GET_NEXT will work, vs. iterating with GET_EXACT?

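The "loop from 1..N until one fails" idea raised in the comment above can be modelled outside Net-SNMP. What follows is a minimal Python sketch, not the agent's C code: `probe_until_failure`, `probe_skipping_gaps` and `fake_lookup` are hypothetical names, and the zone with interfaces at indices 1 and 3 is an invented example. It only shows why stopping at the first failed lookup is safe if indices are dense, which the thread leaves as an open question.

```python
from typing import Callable, List, Tuple

Lookup = Callable[[int], str]


def probe_until_failure(lookup: Lookup, max_index: int) -> List[Tuple[int, str]]:
    """Collect (index, name) pairs from 1 upward, stopping at the first failure."""
    found = []
    for index in range(1, max_index + 1):
        try:
            found.append((index, lookup(index)))
        except OSError:
            break  # assumes dense indices; anything after a gap is missed
    return found


def probe_skipping_gaps(lookup: Lookup, max_index: int) -> List[Tuple[int, str]]:
    """Collect (index, name) pairs from 1..max_index, skipping failed lookups."""
    found = []
    for index in range(1, max_index + 1):
        try:
            found.append((index, lookup(index)))
        except OSError:
            continue  # a gap does not end the scan
    return found


def fake_lookup(index: int) -> str:
    """Hypothetical zone that only sees interfaces at indices 1 and 3."""
    names = {1: "lo0", 3: "e1000g0"}
    if index not in names:
        # errno 6 (ENXIO), the same error the truss output shows
        raise OSError(6, "No such device or address")
    return names[index]
```

With the hypothetical `fake_lookup`, the first function finds only index 1, while the second also finds index 3. On a real host, Python's `socket.if_indextoname` raises `OSError` for an unknown index and could be passed as the `lookup` argument; the price of the second variant is that an upper bound has to be chosen.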
----------------------------------------------------------------------

Comment By: Robert Story (rstory)
Date: 2009-11-17 16:32

Message:
This seems to be caused by an OS quirk/bug:

http://bugs.opensolaris.org/view_bug.do?bug_id=6640675

----------------------------------------------------------------------

Comment By: Daniel Matuschek (matuschd)
Date: 2009-07-31 08:41

Message:
I did some more tests. The latest release that is working for me is NOT 5.4
as written above, but 5.3.

----------------------------------------------------------------------

Comment By: Daniel Matuschek (matuschd)
Date: 2009-07-17 11:27

Message:
./configure --with-default-snmp-version="2" \
 --with-sys-contact="(removed)" \
 --with-sys-location="(removed)" \
 --with-logfile="/var/log/snmpd.log" \
 --sysconfdir="/etc/net-snmp" \
 --with-persistent-directory="/var/net-snmp" \
 --with-mib-modules="host ucd-snmp/diskio ucd-snmp/lmSensors" \
 --enable-mfd-rewrites \
 --prefix=/opt/net-snmp

Using --disable-mfd-rewrites instead of --enable-mfd-rewrites, or removing
the option entirely, makes no difference to the resulting binary. (The
configuration file /include/net-snmp/net-snmp-config.h does differ when I
use different options, and NETSNMP_ENABLE_MFD_REWRITES seems to be set
correctly by configure.)

----------------------------------------------------------------------

Comment By: Thomas Anders (tanders)
Date: 2009-07-17 11:22

Message:
What's your full configure line?

----------------------------------------------------------------------

Comment By: Daniel Matuschek (matuschd)
Date: 2009-07-17 10:15

Message:
Unfortunately, using an exclusive IP stack is not an option in our
environment. Omitting --enable-mfd-rewrites does not solve the problem
either. Running configure with and without --enable-mfd-rewrites results
in exactly the same snmpd binary (which does not work).
It seems that this configuration option is not working in 5.5pre3.

----------------------------------------------------------------------

Comment By: Anders Persson (apersson)
Date: 2009-07-17 00:13

Message:
This is due to if_nameindex(3) reporting physical interface names rather
than the logical ones, even though the zone does not have access to the
physical interface. So issuing subsequent SIOCGLIF* ioctls for the
physical interface will fail and things break down.

A couple of workarounds:
(1) Configure the zone to use an exclusive IP stack
(2) Do not use the --enable-mfd-rewrites flag (which will unfortunately
also disable ifXTable)

I probably need to revert to using SIOCGLIFCONF directly rather than
if_nameindex(3) to get this to work nicely on Solaris 10.

----------------------------------------------------------------------

Comment By: Daniel Matuschek (matuschd)
Date: 2009-07-16 13:51

Message:
More log output with debugging enabled:

trace: getMibstat(): kernel_sunos5.c, 655: kernel_sunos5: getMibstat (1, *, 224, 2, *, *)
trace: getMibstat(): kernel_sunos5.c, 674: kernel_sunos5: ... cache_valid 1 time 30 ttl 30 now 1247748351
trace: getentry(): kernel_sunos5.c, 780: kernel_sunos5: bad cache length -224 - not multiple of entry size 224
trace: getif(): kernel_sunos5.c, 1368: kernel_sunos5: ...... using if_nameindex
trace: getMibstat(): kernel_sunos5.c, 739: kernel_sunos5: ... result 1 rc 0
trace: getMibstat(): kernel_sunos5.c, 754: kernel_sunos5: ... getMibstat returns 0
trace: netsnmp_arch_interface_container_load(): if-mib/data_access/interface_solaris2.c, 81: access:interface:container:arch: processing ''
trace: netsnmp_access_interface_entry_create(): if-mib/data_access/interface.c, 277: access:interface:entry: create
trace: netsnmp_access_interface_index_find(): if-mib/data_access/interface.c, 197: access:interface:find: index
trace: getMibstat(): kernel_sunos5.c, 655: kernel_sunos5: getMibstat (4, *, 100, 1, *, *)
trace: getMibstat(): kernel_sunos5.c, 674: kernel_sunos5: ... cache_valid 0 time 0 ttl 60 now 1247748351
trace: getmib(): kernel_sunos5.c, 874: kernel_sunos5: ...... getmib (260, 20, ...)
trace: getmib(): kernel_sunos5.c, 1030: kernel_sunos5: ...... getmib buffer size is 2000
trace: getmib(): kernel_sunos5.c, 1046: kernel_sunos5: ...... getmib returns 2
trace: getMibstat(): kernel_sunos5.c, 739: kernel_sunos5: ... result 1 rc 2
trace: getMibstat(): kernel_sunos5.c, 754: kernel_sunos5: ... getMibstat returns 1
trace: getMibstat(): kernel_sunos5.c, 655: kernel_sunos5: getMibstat (18, *, 176, 1, *, *)
trace: getMibstat(): kernel_sunos5.c, 674: kernel_sunos5: ... cache_valid 0 time 0 ttl 30 now 1247748351
trace: getmib(): kernel_sunos5.c, 874: kernel_sunos5: ...... getmib (268, 25, ...)
trace: getmib(): kernel_sunos5.c, 1030: kernel_sunos5: ...... getmib buffer size is 3520
trace: getmib(): kernel_sunos5.c, 1046: kernel_sunos5: ...... getmib returns 2
trace: getMibstat(): kernel_sunos5.c, 739: kernel_sunos5: ... result 1 rc 2
trace: getMibstat(): kernel_sunos5.c, 754: kernel_sunos5: ... getMibstat returns 1
trace: netsnmp_arch_interface_container_load(): if-mib/data_access/interface_solaris2.c, 171: access:interface:container:arch: interface '' have 32-bit stat counters
compare:index: compare .0 to .0
compare:index: result was 0
trace: netsnmp_binary_array_insert(): container_binary_array.c, 348: container: not inserting duplicate key
error on subcontainer 'interface container' insert (-1)

----------------------------------------------------------------------
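
The end of the debug trace above (an interface named '', a compare of .0 to .0, then "not inserting duplicate key") fits the explanation given elsewhere in the thread that index 0 gets reused. The following is a toy Python model of that failure mode under stated assumptions, not Net-SNMP's container code: `InterfaceContainer`, `resolve_index`, `load_naive` and `load_guarded` are hypothetical names, and treating an unresolved name as index 0 is an assumption made for illustration.

```python
class InterfaceContainer:
    """Toy container keyed by ifIndex that refuses duplicate keys."""

    def __init__(self):
        self._entries = {}

    def insert(self, if_index, name):
        """Return 0 on success, -1 if the key is already present."""
        if if_index in self._entries:
            return -1
        self._entries[if_index] = name
        return 0

    def __len__(self):
        return len(self._entries)


def resolve_index(name, known):
    """Hypothetical lookup: names that cannot be resolved map to index 0."""
    return known.get(name, 0)


def load_naive(names, known):
    """Insert every entry, even when its index could not be resolved."""
    container = InterfaceContainer()
    for name in names:
        rc = container.insert(resolve_index(name, known), name)
        if rc != 0:
            return container, rc  # the second index-0 entry is a duplicate key
    return container, 0


def load_guarded(names, known):
    """Skip entries whose index could not be resolved instead of inserting them."""
    container = InterfaceContainer()
    skipped = []
    for name in names:
        if_index = resolve_index(name, known)
        if if_index == 0:
            skipped.append(name)
            continue
        if container.insert(if_index, name) != 0:
            return container, skipped, -1
    return container, skipped, 0
```

In this model, two unnamed entries make `load_naive` fail with -1 on the second insert, while `load_guarded` loads the resolvable interfaces and reports the rest as skipped. Whether skipping is the right behaviour for the agent is a design decision the thread does not settle; the sketch only makes the duplicate-key mechanism concrete.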