Re: [Linux-decnet-user] Can only connect to three out of four of our VMS nodes


Hi,

> 
> Hi,
> 
> I am running Debian/unstable on a 2.4.20 kernel.  After resolving the
> problem described in sections 4.9 and 4.10 of the FAQ[0] by setting my
> default_device to eth0, I can now dnping or sethost to three out of four
> VMS nodes on our network.  I know at least two of these nodes run DECnet
> phase V.  I'm not sure about the other.
> 
> But I still cannot contact one DECnet phase V host on our network.  The
> symptom is that dnping or sethost will just hang and never connect.  No
> error messages are returned.  I need to ^C the client to regain control
> of the terminal.
>
Can you find out the OS and version of the problematic node?

> The host I cannot contact does appear in /proc/net/decnet_neigh after a
> failed dnping attempt, e.g.
> 
> $ cat /proc/net/decnet_neigh 
> Addr    Flags State Use Blksize Dev
> 1.1     ---   40    02  0000230 eth0    
                          ^^^^^^^ This is set to the minimum size, which usually
                                  means this entry was added due to an outgoing
                                  route request rather than an incoming hello
                                  message, i.e. no hello messages have been seen
                                  from this node.

> 1.3     ---   40    01  0001492 eth0    
> 1.4     ---   40    02  0001492 eth0    
> 1.20    ---   40    01  0001498 lo      
> 
> In the above listing, 1.1 is the host I cannot contact and 1.20 is the
> node from which I am attempting to make the connection.
> 
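If you want to check the whole table for entries like 1.1, a small script can do it. This is only a sketch: it assumes the column layout shown in the listing above and treats a Blksize of 230 as the "no hello seen" marker, which is a heuristic rather than anything the kernel promises. The names `Neighbour`, `parse_decnet_neigh` and `no_hello_seen` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumption: 230 is the minimum block size, as in the 1.1 entry quoted above.
MIN_BLKSIZE = 230


@dataclass
class Neighbour:
    addr: str
    flags: str
    state: str  # kept as text; the radix of this column is not assumed
    use: str
    blksize: int
    dev: str

    @property
    def no_hello_seen(self) -> bool:
        # Heuristic: a minimum-sized entry was probably created by an
        # outgoing route request rather than by a received hello.
        return self.blksize == MIN_BLKSIZE


def parse_decnet_neigh(text: str) -> List[Neighbour]:
    """Parse the contents of /proc/net/decnet_neigh (header row first)."""
    neighbours = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or truncated lines
        addr, flags, state, use, blksize, dev = fields[:6]
        neighbours.append(Neighbour(addr, flags, state, use, int(blksize), dev))
    return neighbours
```

On the Linux box, reading /proc/net/decnet_neigh, passing the text to `parse_decnet_neigh` and filtering on `no_hello_seen` should pick out 1.1 alone, given the table quoted above.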
> I can use telnet or LAT (llogin) to connect to this host successfully,
> however the LAT connection sometimes drops, especially after displaying
> a lot of output at once (e.g. SHOW TERM fairly reliably causes the
> connection to drop).  LAT connections to other hosts don't drop.
>
That's a bit odd, I think. The LAT connection suggests that there is MAC
level connectivity between the two machines, which would normally
indicate that the Linux box should be getting hello messages too.

I have seen problems in the past with Phase V not sending regular enough
hello messages to keep an entry in the neighbour table. This can be fixed
by forcing the information into the table with iproute2.
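Forcing an entry in by hand needs the node's Ethernet address. For a node using Phase IV-compatible addressing, that address is derived from the DECnet address: the 16-bit value area * 1024 + node is appended, low byte first, to the AA-00-04-00 prefix. The sketch below assumes 1.1 really does use that prefix on this interface; a Phase V node need not, so if in doubt check the source MAC of its LAT traffic in a capture. The function name is hypothetical.

```python
def decnet_address_to_mac(address: str) -> str:
    """Return the Phase IV style MAC address for a DECnet 'area.node' address.

    Assumes the node uses the AA-00-04-00 prefix; the 16-bit DECnet address
    (area << 10) | node is appended in little-endian byte order.
    """
    area_str, node_str = address.split(".")
    area, node = int(area_str), int(node_str)
    if not 1 <= area <= 63:
        raise ValueError(f"area out of range: {area}")
    if not 1 <= node <= 1023:
        raise ValueError(f"node out of range: {node}")
    value = (area << 10) | node
    low, high = value & 0xFF, value >> 8
    return ":".join(f"{b:02x}" for b in (0xAA, 0x00, 0x04, 0x00, low, high))
```

For 1.1 this gives aa:00:04:00:01:04. That address could then be handed to iproute2's neighbour command for the DECnet family, something along the lines of `ip -f dnet neigh add 1.1 lladdr aa:00:04:00:01:04 dev eth0`, but please check the exact syntax against the ip(8) page for your iproute2 version.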

[snip] 
> 
> I know there are a number of switches deployed in our building, but I am
> not familiar with how they are configured.  I am guessing that since all
> of the servers live in the machine room on the same bench, they all talk
> to the same switch, whereas my desktop Linux workstation, which is in
> an adjacent room along with a few dozen other workstations could be
> connected via a different switch.  I wonder if the network topology has
> a bearing on my problem.
>
Might well do. I wonder if there isn't some filtering on the network
affecting the hello messages. Not that this should prevent you from
setting up a connection with 1.1; it would only explain the lack of an
automatically generated entry in the neighbour table.
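To test the filtering theory, the question is whether any hello frames from 1.1 reach the Linux box at all. Hellos are multicast (AB-00-00-03-00-00 for all routers, AB-00-00-04-00-00 for all end nodes), so a switch could treat them differently from unicast LAT sessions. The function below is a sketch that classifies a raw Ethernet frame. The flag values it uses (pad bit 0x80 with the pad length in the low 7 bits, control bit 0x01, type 0x0A for an Ethernet router hello, 0x0C for an Ethernet endnode hello) are taken from the Phase IV routing-layer layout as I understand it and should be checked against the spec; the function name is made up.

```python
import struct

DECNET_ETHERTYPE = 0x6003  # DECnet Phase IV routing

PAD_FLAG = 0x80       # first byte is padding; low 7 bits = pad length
CTRL_FLAG = 0x01      # routing-layer control message
TYPE_MASK = 0x0E      # control message type bits
ROUTER_HELLO = 0x0A   # Ethernet router hello
ENDNODE_HELLO = 0x0C  # Ethernet endnode hello


def classify_decnet_frame(frame: bytes) -> str:
    """Return 'router hello', 'endnode hello', 'other decnet' or 'not decnet'."""
    if len(frame) < 17:
        return "not decnet"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != DECNET_ETHERTYPE:
        return "not decnet"
    offset = 16  # skip dst(6), src(6), type(2) and the 2-byte length field
    flags = frame[offset]
    if flags & PAD_FLAG:
        # The pad length counts the pad byte itself.
        offset += flags & 0x7F
        if offset >= len(frame):
            return "other decnet"
        flags = frame[offset]
    if flags & CTRL_FLAG:
        kind = flags & TYPE_MASK
        if kind == ROUTER_HELLO:
            return "router hello"
        if kind == ENDNODE_HELLO:
            return "endnode hello"
    return "other decnet"
```

Seeing LAT frames from 1.1 but never a hello from it would point towards filtering somewhere between the two switches.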

> 
> Any pointers as to how to proceed with debugging this problem?
>
Could you send a tcpdump of the Linux box trying to connect to 1.1?
The first thing to establish is whether 1.1 replies at all, and if it
does, to find out whether it's sending something unexpected. Make sure
that you get a full hex dump of each packet, as I don't trust tcpdump
to decode the packets correctly.
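Something like `tcpdump -i eth0 -e -XX ether proto 0x6003` should print every DECnet routing frame in hex, Ethernet header included, with tcpdump's own decoding alongside (to be taken with a pinch of salt). If you'd rather have no decoding at all, the Python sketch below does the raw part: it opens a Linux AF_PACKET socket for ethertype 0x6003 and prints each frame as an offset/hex/ASCII dump. It is Linux-only, needs root, and the function names are illustrative.

```python
import socket

ETH_P_DNA_RT = 0x6003  # DECnet Phase IV routing ethertype


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset / hex / ASCII lines (loosely like tcpdump -XX)."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start:04x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def capture_decnet(interface: str = "eth0", count: int = 20) -> None:
    """Print raw hex dumps of DECnet routing frames (Linux only, needs root)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_DNA_RT))
    try:
        # Python converts the protocol to network byte order for AF_PACKET.
        sock.bind((interface, ETH_P_DNA_RT))
        for _ in range(count):
            frame, _addr = sock.recvfrom(65535)  # includes the Ethernet header
            print(hexdump(frame))
            print()
    finally:
        sock.close()
```

Run `capture_decnet("eth0")` as root in one terminal while retrying `dnping 1.1` in another. Note that it only sees ethertype 0x6003, so LAT traffic is not included.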

Steve.