steve - 2011-07-15

JGroups config in our application:

<DefaultDistributionChannel groupName="WorkManagerGroup"
                            multicastAddress="239.193.4.78"
                            multicastPort="45555"
                            cachingMulticastAddress="239.193.4.79"
                            cachingMulticastPort="45556"
                            cachingListeningPort="45557"/>

Background


We have two physical servers hosting WebSphere Application Server. These WAS
instances host a bespoke application that uses JGroups for multicast comms
between the two app nodes.

The two physical Linux servers each have two NICs - bond0 and bond1. Multicast
traffic cannot flow on the VLAN that bond0 is connected to, so we have created
a new VLAN for bond1 that does support multicast. We therefore want to force
all multicast traffic to travel via the bond1 NIC, and have set up the routing
tables on the servers accordingly. However, when we start the apps up, we
observe that multicast subscriptions for the cachingMulticastAddress correctly
bind on bond1, but multicast subscriptions for the multicastAddress
incorrectly bind on bond0. As a result, the app doesn't work correctly because
all of the multicast comms sent to bond0 go nowhere.
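
Our suspicion is that the difference comes down to whether the socket joining
the group names an interface explicitly, or leaves the choice to the stack. In
plain java.net terms, the contrast would look something like the sketch below
(the port and interface name are from our setup; we have not confirmed this is
what JGroups does internally):

import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;

public class JoinSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.193.4.78");
        MulticastSocket sock = new MulticastSocket(45555);

        // Implicit join: the interface is chosen by the JVM/OS, which on
        // our servers appears to end up on bond0.
        // sock.joinGroup(group);

        // Explicit join: pin the group membership to bond1.
        sock.setNetworkInterface(NetworkInterface.getByName("bond1"));
        sock.joinGroup(group);
    }
}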

We have run org.jgroups.tests.McastReceiverTest/McastSenderTest and see the
bindings always correctly attaching to bond1 and multicast traffic flowing, so
we are wondering what the subtlety is that means both the test harness and the
cachingMulticastAddress bindings in our app work correctly, but the
multicastAddress binding does not. Why is this connection ignoring the OS
routing table and always binding to bond0? How can we force it to bind to
bond1? Is there some flag or parameter we could/should specify when we
instantiate the JGroups library in our app?
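
To make that last question concrete: we have seen references to the
jgroups.bind_addr system property and to a bind_addr option on the UDP
protocol, and we wonder whether pinning the channel is as simple as something
like the sketch below (JGroups 2.x-style property string; the address is a
placeholder for bond1's IP and the protocol stack is abbreviated):

import org.jgroups.JChannel;

public class ChannelSketch {
    public static void main(String[] args) throws Exception {
        // Option 1: set the global system property before any channel is
        // created (equivalent to -Djgroups.bind_addr on the JVM command line).
        System.setProperty("jgroups.bind_addr", "10.1.2.3"); // bond1's IP (placeholder)

        // Option 2: name the bind address directly in the UDP protocol config.
        String props = "UDP(bind_addr=10.1.2.3;"
                + "mcast_addr=239.193.4.78;mcast_port=45555):"
                + "PING:FD:pbcast.NAKACK:UNICAST:pbcast.STABLE:pbcast.GMS";
        JChannel channel = new JChannel(props);
        channel.connect("WorkManagerGroup");
        // ... application messaging ...
        channel.close();
    }
}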

The results of our test using McastReceiver/SenderTest are shown below.

Any ideas would be much appreciated!

Cheers,

Steve

=====================

Results from multicast testing on hosts gbnlu0340s and gbnlu0341s.


Scope


Test multicast routing between hosts gbnlu0340s and gbnlu0341s using the
JGroups org.jgroups.tests.McastReceiverTest and org.jgroups.tests.McastSenderTest.

Test for multicast subscriptions on each host and identify multicast traffic
on the network interfaces.

Results


The JGroups receiver process was started on gbnlu0341s and the sender process
on gbnlu0340s, using multicast group 239.193.4.78 and port 5555.

Group subscriptions on gbnlu0340s were as shown:

IPv6/IPv4 Group Memberships
Interface       RefCnt  Group
lo              1       224.0.0.1
bond0           1       224.0.0.1
bond1           1       239.193.4.78
bond1           1       224.0.0.1

Tcpdump was started on the bond1 interface on gbnlu0341s. Output was filtered
to multicast traffic for group 239.193.4.78.

Data was sent between gbnlu0341s and gbnlu0340s and was received as expected
on gbnlu0340s. Multicast traffic was visible via tcpdump on bond1.

The test was run again with tcpdump snooping on bond0; no multicast traffic
was visible on bond0.

The test was repeated using multicast group 239.193.4.79, with the same
results; subscriptions shown:

IPv6/IPv4 Group Memberships
Interface       RefCnt  Group
lo              1       224.0.0.1
bond0           1       224.0.0.1
bond1           1       239.193.4.79

The test was repeated with the receiver and sender hosts swapped around, with
the same results.

Conclusions


Our test showed that, with multicast routing in place for bond1, the JGroups
test sent all multicast traffic out via the correct interface, as expected.

The most interesting part was that when the receiver process was started, it
subscribed on the correct interface, as shown in the netstat -g output above.

This is not how the application itself is behaving. We need to look closer at
how the application starts up its subscription for the non-caching multicast
part, which may be forcing it to use bond0, and at whether there is anything
different between the two multicast setups within the application. The JGroups
test shows the multicast routing is working as expected.
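
As a first diagnostic step, we could dump the interfaces the JVM actually sees
when the application starts on each box and compare them against the netstat
-g output above. A quick sketch (the class name and output format are our own):

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class IfaceDump {
    public static void main(String[] args) throws Exception {
        // List every interface the JVM can see, whether it supports
        // multicast, and the addresses bound to it.
        for (NetworkInterface ni : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            System.out.println(ni.getName() + " (multicast=" + ni.supportsMulticast() + ")");
            for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                System.out.println("    " + addr.getHostAddress());
            }
        }
    }
}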