Hi,
>
> On Fri, 2003-02-21 at 16:24, Steven Whitehouse wrote:
> > Can you find out the OS and version of the problematic node?
>
> OpenVMS V7.1-2 running on an AlphaServer DS10 466. Has DECNET OSI V7.1,
> ECO 07.
>
> > That's a bit odd, I think. The LAT connection suggests that there is MAC
> > level connectivity between the two machines, which would normally
> > indicate that the Linux box should be getting hello messages too.
>
> I thought it should be too.
>
> > I have seen problems in the past with Phase V not sending regular enough
> > hello messages to keep an entry in the neighbour table. This can be fixed
> > by forcing the information into the table with iproute2.
>
> Sure, but wouldn't I eventually see a hello? Anyway, I tried playing
> with iproute2 but I'm not sure if I have the necessary stuff in my
> kernel to support it. Will look into this and recompile kernel if
> necessary.
>
Yes, you should see a hello at some stage even if it times out after
60 seconds. The Linux neighbour cache is based on a timeout system which
records the time of last update in each entry and then times out the entry
at a later time which is set per table. This means that it's rather tricky
to take proper account of the variable timeouts that are sent in the hello
messages. We could "fix" the last update times to push them back in time
a certain amount, but I'd rather do the fix properly and allow entries to
specify their own timeouts. So the iproute2 route is a bit of a hack, but
useful nonetheless.
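
To make the difference concrete, here is a rough sketch of the two schemes. It is a toy model only, not the kernel code: the `Neighbour` class, the `is_stale` helper and the 180 second hello timeout are all made up for illustration; only the 60 second figure comes from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Neighbour:
    updated: float                   # time of the last hello, in seconds
    timeout: Optional[float] = None  # hypothetical per-entry override


def is_stale(entry: Neighbour, now: float, table_timeout: float) -> bool:
    """Return True once the entry has outlived its allowed lifetime."""
    # Today only the per-table value exists; the per-entry value models
    # the "proper" fix where each hello supplies its own timeout.
    limit = entry.timeout if entry.timeout is not None else table_timeout
    return now - entry.updated > limit


# A neighbour last heard from at t=0 that sends hellos only rarely.
slow = Neighbour(updated=0.0)
stale_with_table_timeout = is_stale(slow, now=90.0, table_timeout=60.0)

# Same entry, but honouring an (assumed) 180 s timeout from its hello.
slow.timeout = 180.0
stale_with_entry_timeout = is_stale(slow, now=90.0, table_timeout=60.0)

print(stale_with_table_timeout, stale_with_entry_timeout)
```

With the per-table value alone the entry is dropped at 90 seconds; with its own timeout it survives, which is the behaviour a per-entry fix would give.
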
> > Might well do. I wonder if there isn't some filtering on the network
> > affecting the hello messages. Not that this should prevent you from
> > setting up a connection with 1.1, it would only explain the lack of an
> > automatically generated entry in the neighbour table.
>
> Since I'm just toying around to see if I can get this working, getting
> information about what filtering, if any, might be in place on the
> switches may prove to be difficult to extract from the net admins. It's
> not that they're unreasonable people, but it's very difficult to get any
> of their time.
>
Yes, I guessed that might be the case - hence my request for the tcpdump.
> > Could you send a tcpdump of the Linux box trying to connect to 1.1 ?
> > The first thing to establish is whether 1.1 replies at all, and if it
> > does, to find out whether it's sending something unexpected. Make sure
> > that you get a full hex dump of each packet as I don't trust tcpdump
> > to decode the packets correctly.
>
> Sure thing.
>
> # tcpdump -s0 -x decnet host 1.20
> tcpdump: listening on eth0
> 21:10:41.495812 1.20 > 1.1 50 conn-initiate 8213>0 ver 4.1 segsize 1450
> 3200 8126 0000 aa00 0400 0104 0000 aa00
> 0400 1404 0000 0000 1800 0015 2001 03aa
> 0500 1902 0000 0000 0005 4c49 4e55 5803
> 0000 0000
> 21:10:43.489676 1.20 > 1.1 50 retrans-conn-initiate 8213>0 ver 4.1 segsize 1450
> 3200 8126 0000 aa00 0400 0104 0000 aa00
> 0400 1404 0000 0000 6800 0015 2001 03aa
> 0500 1902 0000 0000 0005 4c49 4e55 5803
> 0000 0000
> ...
>
> Then lots of "retrans-conn-initiate" identical to the second packet. I
> eventually ^C'd.
>
Ah. So unless you can tell by looking on the remote end that it has seen
these packets and is responding, then I very much suspect that you don't
have communication in either direction between 1.1 and 1.20 since no
reply is being received. It looks like a network problem to me, I'm afraid.
Steve.
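
P.S. Since the point of the hex dump is not to rely on tcpdump's decoding, the node addresses can be checked by hand. The sketch below assumes the usual Phase IV convention that the 16-bit node address follows the aa:00:04:00 prefix in little-endian order, with the area in the top 6 bits and the node in the low 10 bits. The `decnet_addresses` helper is just an illustration and does not parse the rest of the header.

```python
HIORD = bytes.fromhex("aa000400")  # fixed prefix of a Phase IV MAC address


def decnet_addresses(hexdump: str):
    """Return (area, node) pairs for every aa:00:04:00:xx:yy in the dump."""
    data = bytes.fromhex("".join(hexdump.split()))
    found = []
    i = data.find(HIORD)
    while i != -1 and i + 6 <= len(data):
        # The two bytes after the prefix hold the address, low byte first.
        value = int.from_bytes(data[i + 4:i + 6], "little")
        found.append((value >> 10, value & 0x3FF))
        i = data.find(HIORD, i + 6)
    return found


# First packet from the tcpdump output quoted above.
dump = """
3200 8126 0000 aa00 0400 0104 0000 aa00
0400 1404 0000 0000 1800 0015 2001 03aa
0500 1902 0000 0000 0005 4c49 4e55 5803
0000 0000
"""
addrs = decnet_addresses(dump)
print(addrs)
```

For this packet it gives 1.1 as the first (destination) address and 1.20 as the second (source), so tcpdump's decode of the addresses agrees with the raw bytes.
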