I work on the DNX project (also on sf.net). Recently, one of my testers reported a crash in my code due to a null pointer dereference. It appears that a call to getifaddrs was returning a list of interface structures wherein at least one of the structures in the list contained a NULL address pointer.
I've tested my code on 8 or 10 platforms, and this is the first time I've ever heard of this situation. When I asked him about his network config, he told me it was probably because he was running DNX on a RHEL3 server configured for bonding. He passed me the output of ifconfig on his machine, and sure enough, the first interface was listed as bond0, bonded to the second interface, eth0.
He tells me it works intermittently after reboots. It appears that sometimes eth0 will come up first in the list, followed by the bonded interface. In this situation there is no crash (because DNX is looking for the eth0 interface, and stops walking the list when it finds it).
Does anyone know if this might be a bug in the bonding driver? I don't think the interface list should contain NULL address pointers, but ifconfig has no trouble interpreting the data, so I suppose it's okay. Any thoughts?
I don't know the specifics of your situation, but it is common for the slaves of a bonding master to have no IP address information. The man page for getifaddrs() notes specifically that the ifa_addr field of struct ifaddrs may be NULL if no address exists (and has similar language for the other address-related fields), so callers need to check it before dereferencing.
Also, FWIW, it is possible to send traffic through a bonding slave (e.g., ping -I eth0), but replies coming in to the slave device will be redirected to the bonding master. This may or may not affect your use of the slave device. Generally speaking, it's incorrect to utilize a bonding slave directly; instead, activity should pass through the master device.
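For what it's worth, guarding against the NULL ifa_addr case could look roughly like the sketch below. This is not DNX's actual code: the helper names find_ipv4_entry and lookup_ipv4 are made up for illustration, and it assumes the caller wants the IPv4 address of a named interface such as eth0.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <ifaddrs.h>
#include <string.h>

/* Hypothetical helper: return the first entry named ifname that carries
 * an IPv4 address, or NULL if there is none. Entries whose ifa_addr is
 * NULL (e.g. an interface with no address assigned) are skipped rather
 * than dereferenced. */
const struct ifaddrs *
find_ipv4_entry(const struct ifaddrs *list, const char *ifname)
{
    const struct ifaddrs *ifa;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue;                       /* no address on this entry */
        if (ifa->ifa_addr->sa_family != AF_INET)
            continue;                       /* not an IPv4 entry */
        if (ifa->ifa_name != NULL && strcmp(ifa->ifa_name, ifname) == 0)
            return ifa;
    }
    return NULL;
}

/* Hypothetical wrapper: fetch the interface list with getifaddrs() and
 * copy the IPv4 address of ifname into *out.
 * Returns 0 on success, -1 if the call fails or no match is found. */
int
lookup_ipv4(const char *ifname, struct in_addr *out)
{
    struct ifaddrs *list;
    const struct ifaddrs *hit;
    int rc = -1;

    if (getifaddrs(&list) != 0)
        return -1;

    hit = find_ipv4_entry(list, ifname);
    if (hit != NULL) {
        *out = ((const struct sockaddr_in *)hit->ifa_addr)->sin_addr;
        rc = 0;
    }

    freeifaddrs(list);
    return rc;
}
```

Separating the list walk from the getifaddrs() call is only a convenience here: it lets the NULL handling be exercised against a hand-built list, without needing a machine that is actually configured for bonding.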