Re: [jgroups-users] Cant recover from ifdown; ifup when node uses a linux bridge
From: Bram K. G. <br...@sh...> - 2014-03-04 09:07:39
On 03/03/2014 05:44 PM, Bela Ban wrote:
> The solution would be to recreate the socket on which the send() fails.
> However, as you can see in UDP:201, I catch only selected exceptions,
> e.g. NoRouteToHostException.
>
> In your case, I can see only the very generic IOException. There's no
> NoSuchDevice exception... Hmm, I'm not sure if I want to catch the
> generic IOException and simply set the interface again...
Correct. The exception that send() triggers is a generic
java.io.IOException with the message "Network is unreachable" (when I
do not set -Djava.net.preferIPv4Stack=true) or a java.io.IOException
with the message "No such device" (when I do set the preferIPv4Stack
flag to true).
>
> Can you confirm that calling
> mcast_sock.setInterface(mcast_sock.getInterface()) works ?
Rebinding this way is not possible. If I do
mcast_sock.setInterface(mcast_sock.getInterface()); after I catch the
IOException I get a SocketException since the interface bound to the
mcast socket is invalid at that point:
java.net.SocketException: IPV6_MULTICAST_IF returned index to unrecognized interface: 5
    at java.net.PlainDatagramSocketImpl.socketGetOption(Native Method)
    at java.net.AbstractPlainDatagramSocketImpl.getOption(AbstractPlainDatagramSocketImpl.java:343)
    at java.net.MulticastSocket.getInterface(MulticastSocket.java:491)
    at org.jgroups.protocols.UDP._send(UDP.java:208)
    at org.jgroups.protocols.UDP.sendMulticast(UDP.java:181)
    at org.jgroups.protocols.TP.doSend(TP.java:1667)
    at org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2462)
    at org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2391)
    at org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2382)
    at java.lang.Thread.run(Thread.java:744)
> Or, you
> probably need to grab the interface again and then call
> sock.setInterface()...
Yes, that seems to be the only way to recover, and it works in the
tests I did.
Another note: *only* multicast sending/receiving is broken at that
point. Sending unicast messages works perfectly fine after the device is
back up again. It seems to me that only mcast_sock needs to be
rebound.
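
A minimal sketch of that recovery, assuming the address the socket was
originally configured with (bindAddr below) is kept around. The class and
method names are hypothetical, and this is not the actual JGroups fix. On a
failed send() it looks the interface up again by address, points the
multicast socket at it, and retries once. It deliberately never calls
getInterface() on the broken socket, since that call is what throws:

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketException;

final class McastRebind {

    // Looks up the interface that currently carries bindAddr.
    // Returns null if no interface has that address, or if it is down.
    static NetworkInterface findInterface(InetAddress bindAddr) throws SocketException {
        NetworkInterface nif = NetworkInterface.getByInetAddress(bindAddr);
        return (nif != null && nif.isUp()) ? nif : null;
    }

    // Sends the packet; on an IOException, rebinds the multicast
    // interface by address and retries the send exactly once.
    static void sendWithRebind(MulticastSocket sock, DatagramPacket packet,
                               InetAddress bindAddr) throws IOException {
        try {
            sock.send(packet);
        } catch (IOException e) {
            NetworkInterface nif = findInterface(bindAddr);
            if (nif == null) {
                throw e; // device is still gone, nothing to rebind to
            }
            sock.setNetworkInterface(nif);
            sock.send(packet);
        }
    }
}
```

Catching the generic IOException here is the trade-off discussed above: it
also triggers a rebind attempt for unrelated I/O errors, so a real fix may
want to inspect the message or limit the retry rate.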
>
> If this works, I'll create a JIRA and fix this.
>
> On 03/03/14 13:48, Bram Klein Gunnewiek wrote:
>> I used McastSenderTest to debug what is happening. I modified the
>> tester to only use one address instead of all. After the ifdown;ifup
>> command the initial interface that is stored in the socket is not
>> valid/available any more. I can't get any of the original information
>> about that device back:
>>
>> McastSenderTest.send(): java.io.IOException: No such device
>> java.io.IOException: No such device
>>     at java.net.PlainDatagramSocketImpl.send(Native Method)
>>     at java.net.DatagramSocket.send(DatagramSocket.java:676)
>>     at nl.shockmedia.udptest.McastSenderTest.send(McastSenderTest.java:114)
>>     at nl.shockmedia.udptest.McastSenderTest.main(McastSenderTest.java:101)
>>
>> Getting any information about the socket results in this exception:
>>
>> java.net.SocketException: IPV6_MULTICAST_IF returned index to unrecognized interface: 13
>>     at java.net.PlainDatagramSocketImpl.socketGetOption(Native Method)
>>     at java.net.AbstractPlainDatagramSocketImpl.getOption(AbstractPlainDatagramSocketImpl.java:343)
>>     at java.net.MulticastSocket.getInterface(MulticastSocket.java:491)
>>     at nl.shockmedia.udptest.McastSenderTest.send(McastSenderTest.java:116)
>>     at nl.shockmedia.udptest.McastSenderTest.main(McastSenderTest.java:101)
>>
>> Information I tried to retrieve:
>> try {
>>     System.err.println("Interface of socket: " + socket.getInterface());
>> } catch (SocketException e1) {
>>     e1.printStackTrace();
>> }
>> try {
>>     System.err.println("Network interface: " + socket.getNetworkInterface());
>> } catch (SocketException e1) {
>>     e1.printStackTrace();
>> }
>>
>> Getting the local address of the socket results in this:
>>
>> System.err.println("Local address of socket: " + socket.getLocalAddress());
>> Local address of socket: 0.0.0.0/0.0.0.0
>>
>> So, there is no way of knowing which interface we have to rebind to. I
>> don't know whether this case can be solved inside JGroups or whether
>> we have to re-create the channel ourselves, but as it is right now
>> we have no way of detecting this error and can't act on it at all. The
>> JGroups cluster is simply in a zombie state, and we can only see that
>> through the debug logs.
>>
>> If I enumerate all available interfaces after this happens, an interface
>> with the initial binding address is available:
>>
>> Found available inet address: fe80:0:0:0:a00:27ff:fec3:fbb7%14
>> Found available inet address: 192.168.8.76
>>
>> When I rebind to the InetAddress with the IP address that I originally
>> had bound to, using socket.setInterface(addr), everything works again.
>> The (Linux) device ID seems to increase by one after doing
>> ifdown;ifup. E.g. I get an error that there is no interface with ID 15.
>> If I rebind the socket (making everything work again) and do ifdown;ifup
>> again, I get the same error except that the ID of the device it can't
>> find has increased to 16.
>>
>> Again, I don't exactly know how this can be solved. Ideas?
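
The enumeration described above can be sketched roughly as follows. This is
an illustration only, not the McastSenderTest code, and the class name
InterfaceDump is hypothetical. It lists each interface's name, OS index and
addresses, so the index that changes after ifdown;ifup can be compared
against the one in the SocketException message:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class InterfaceDump {

    // Returns one line per (interface, address) pair, in the form
    // "<name> index=<n> up=<bool> <address>".
    static List<String> describe() throws SocketException {
        List<String> lines = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return lines; // older JDKs return null when nothing is found
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                lines.add(nif.getName() + " index=" + nif.getIndex()
                        + " up=" + nif.isUp() + " " + addr.getHostAddress());
            }
        }
        return lines;
    }
}
```

Printing the result of InterfaceDump.describe() before and after the
ifdown;ifup should show the same IP address attached to a different index.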
>>
>> On 02/28/2014 11:27 AM, Bela Ban wrote:
>>> Hmm, "network unreachable" and "no such device" point to an incorrectly
>>> set up interface [1].
>>> Could be that the routing table is not set up right. The first error is
>>> caused by a missing interface or route. I've never seen the second...
>>>
>>> Re debugging: if you run the debugger on the same box, you should still
>>> be able to connect via TCP as the loopback device will be used.
>>>
>>> Can't really help you as I have no means of reproducing this on my end.
>>> What you could do is write a simple program (or copy
>>> McastSenderTest/McastReceiverTest from JGroups) which catches the
>>> exception on a send() and tries the following things:
>>> - Rebind the interface of the mcast socket (setInterface())
>>> - Re-join the multicast group
>>> - Re-create the socket
>>>
>>>
>>> [1]
>>> http://wiki.libvirt.org/page/Unable_to_add_bridge_br0_port_vnet0:_No_such_device
>>>
>>> On 28/02/14 10:29, Bram Klein Gunnewiek wrote:
>>>
>>>> Since remote debugging is hard when I put the network card down I added
>>>> ex.printStackTrace(); to line 213 of UDP.java (using version 3.4.2.Final
>>>> from GIT). This is what happens after ifdown;ifup and sending a message
>>>> every second:
>>>>
>>>> Received message from bram-ubuntuvm-3-21243: flood!
>>>> java.io.IOException: Network is unreachable
>>>>     at java.net.PlainDatagramSocketImpl.send(Native Method)
>>>>     at java.net.DatagramSocket.send(DatagramSocket.java:676)
>>>>     at org.jgroups.protocols.UDP._send(UDP.java:199)
>>>>     at org.jgroups.protocols.UDP.sendMulticast(UDP.java:181)
>>>>     at org.jgroups.protocols.TP.doSend(TP.java:1667)
>>>>     at org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2462)
>>>>     at org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2391)
>>>>     at org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2382)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>> 10:14:52.734 [TransferQueueBundler,test,bram-ubuntuvm-3-21243] ERROR
>>>> org.jgroups.protocols.UDP - JGRP000029: bram-ubuntuvm-3-21243: failed
>>>> sending message to cluster (50 bytes): java.lang.Exception:
>>>> dest=/ff0e:0:0:0:0:8:8:8:7600 (53 bytes), headers: NAKACK2: [MSG,
>>>> seqno=18], UDP: [channel_name=test]
>>>>
>>>> The message is sent as a broadcast and is delivered to the node itself
>>>> but can't be broadcast over UDP.
>>>>>> (I don't know why the destination is an IPv6 address. The address
>>>>>> JGroups was previously bound to was an internal IPv4 address)
>>>>> Interesting ! Did you run your app with -Djava.net.preferIPv4Stack=true
>>>>> ? I assume the IPv6 address points to a unicast member (port 7600)?
>>>>>
>>>> No, we don't. If I give that flag to both instances it works and it
>>>> indeed uses an IPv4 address:
>>>>
>>>> java.io.IOException: No such device
>>>>     at java.net.PlainDatagramSocketImpl.send(Native Method)
>>>>     at java.net.DatagramSocket.send(DatagramSocket.java:676)
>>>>     at org.jgroups.protocols.UDP._send(UDP.java:199)
>>>>     at org.jgroups.protocols.UDP.sendMulticast(UDP.java:181)
>>>>     at org.jgroups.protocols.TP.doSend(TP.java:1667)
>>>>     at org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2462)
>>>>     at org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2391)
>>>>     at org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2382)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>> 10:26:07.179 [TransferQueueBundler,test,bram-ubuntuvm-3-47409] ERROR
>>>> org.jgroups.protocols.UDP - JGRP000029: bram-ubuntuvm-3-47409: failed
>>>> sending message to cluster (50 bytes): java.lang.Exception:
>>>> dest=/228.8.8.8:7600 (53 bytes), headers: NAKACK2: [MSG, seqno=21], UDP:
>>>> [channel_name=test]
>>>>
>>>> However, if I set that flag on only one of the two instances they don't
>>>> see each other and no cluster is created. Strange?
>>>>
>>