
Issues connecting to SFB using sipe 1.21.1 & pidgin 2.11.0... IPv6 related?

Help
hay jumper
2016-09-16
2016-09-29
  • hay jumper

    hay jumper - 2016-09-16

    Hello,

    Since pidgin 2.11.0, connection to Skype for Business/Lync/whatever-MS-calls-it-tomorrow is failing. The same connection settings work well when using pidgin 2.10.2 & below.

    While collecting the debug log, I noticed in 2.10.2 pidgin attempts to connect to sipdir.online.lync.com via IPv4. In 2.11.0 the connection attempt is via IPv6. In both cases the connection succeeds, but under 2.11.0 the connection is severed shortly after, with a generic error.

    I've tried hard-coding the IPv4 address of Microsoft's server into the connection settings, and this gets a little further (thru cert exchange), but fails after that... looks like MS redirects to a different URL, which is then resolved into IPv6 and fails there.

    I am hoping this will make sense & the issue can be resolved. I'll try to attach a log run with debug symbols installed below.

    Thanks in advance!

     
  • hay jumper

    hay jumper - 2016-09-16

    Attached is a debug log of connecting thru Sipe 1.21.1 on Pidgin 2.11.0 to Skype for Business. The username & domain have been changed to protect the guilty.

     
  • Stefan Becker

    Stefan Becker - 2016-09-16

    SIPE doesn't handle any IP addresses in connection handling. It only requests a connection to be set up from the backend. I doubt that even libpurple handles IP addresses in connections. It just calls the lower network layers, i.e. on Windows that would be winsock2, and gets back a socket.

    But libpurple (at least 2.x.x) does not know how to handle IPv6 addresses. SIPE calls purple_network_get_my_ip() to get the IP address to insert in the SIP message and that only returns IPv4 addresses:

    In your case it returns an unroutable IPv4 class C address 192.168.x.x. If the server uses that for anything while trying to answer, then this would lead to a failure, because it can't be connected to from the Internet.

    Things to try that come to mind:

    • disable IPv6 on your Windows machine.
    • if possible tell Windows to disable IPv6 for the Pidgin application.
    • set the correct IPv4 address in the Pidgin settings or set up a STUN server from the Internet (that will discover the Internet IP address for your machine)
     
  • Stefan Becker

    Stefan Becker - 2016-09-19

    I checked my own logs and see 192.168.x.x in Via: lines. So the private IP address can't be the problem, i.e. setting an IP address or STUN server in the Pidgin settings won't help.

    My guess is that the server which answers on the IPv6 address is a SIP server, but not OCS/Lync. It therefore doesn't know what to do with our registration request and errors out.

     
  • hay jumper

    hay jumper - 2016-09-19

    Pidgin 2.11.0 with the latest SIPE plugin works properly if I disable IPv6 on my machine. Something must have changed in the latest Pidgin when an IPv6 address is present.

    The function defs you posted don't seem to handle 6 addresses at all, maybe the problem is further up the calling stack. I'll try to isolate the changes made in 2.11.0 to see if I can find any candidates.

    Thanks for your help in debugging the issue. I can't leave IPv6 disabled on this machine, so I'll be hanging onto Pidgin 2.10.2 (which works) for now.

     
  • hay jumper

    hay jumper - 2016-09-19

    Update: I think I found the smoking gun:

    https://developer.pidgin.im/ticket/1075

    Appears they've changed DNS lookups to include IPv6 results, but haven't done anything to actually make it work in the code. At this point I'll look into submitting a Pidgin bug... definitely will be holding my breath on that being fixed.

     
  • Stefan Becker

    Stefan Becker - 2016-09-20

    Yes, that definitely explains the change in behaviour.

    But IMHO this is not a bug in libpurple, so please do not report one to them. They will close it anyway if they see "third party plugin" in the description :-)

    IMHO your test with IPv4 shows that the error lies elsewhere:

    1. there is no OCS/Lync server running at that IPv6 address/port,
    2. OCS/Lync clients are supposed to only use an IPv4 connection, or
    3. OCS/Lync client messages need to be different when using an IPv6 connection.

    ad (1): that would be a bug on M$ side, because if they do not support IPv6 then they should not put an IPv6 address into the DNS. This could be worked around by adding a local host name entry that maps the name sipdir.online.lync.com to one or more of its IPv4 addresses (AKA in UNIX world: add a line to /etc/hosts). The local entry would override the DNS response and make sure you get an IPv4 connection.

    ad (2): can you take a log from the Lync client on the machine? Then we could see how it behaves. If that log shows an IPv4 address then I assume that the Lync code enforces an IPv4 connection. That is something SIPE can't do, because libpurple connection handling doesn't offer such a feature. You would need to use the solution from (1).

    ad (3): again a log from Lync client would show what those differences are and we could try to implement them for SIPE, if possible.
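
    For illustration, options (1) and (2) above both come down to the client only ever seeing IPv4 addresses for the server name. A minimal Python sketch of that idea follows. It is not SIPE or libpurple code; `ipv4_only` and `resolve_ipv4` are hypothetical helper names chosen for this example.

```python
import socket


def ipv4_only(addrinfos):
    """Keep only the IPv4 entries of a getaddrinfo()-style result list."""
    return [info for info in addrinfos if info[0] == socket.AF_INET]


def resolve_ipv4(host, port):
    """Resolve host, asking the resolver for IPv4 results only.

    Passing family=AF_INET means IPv6 (AAAA) results are never returned,
    which has a similar effect to a hosts entry that pins an IPv4 address.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (ip, port).
    return [info[4][0] for info in infos]
```

    Calling `resolve_ipv4("sipdir.online.lync.com", 443)` would then yield only the IPv4 addresses, whatever else the DNS publishes for that name.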

     
  • hay jumper

    hay jumper - 2016-09-21

    Yes, I've tried to fake it using hard-coded IPv4 in my hosts file. This gets me further, and I can see the certificate exchange & handshake apparently succeeding, however afterwards I get a TCP reset from the server. I haven't had a chance to look into that further.

     
  • Stefan Becker

    Stefan Becker - 2016-09-21

    You might have to repeat the exercise with the home server you are probably redirected to (look for a 301 response).
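
    As a rough aid for spotting that redirect, here is a small Python sketch that pulls the target host out of a SIP 301 response. It assumes the response carries the new server in a `Contact:` header with a `sip:` or `sips:` URI; `find_redirect` is a hypothetical helper, and it is not tied to the exact layout of the Pidgin debug log.

```python
import re

# Status line of a SIP response, e.g. "SIP/2.0 301 ..."
STATUS_RE = re.compile(r"^SIP/2\.0 (\d{3})")
# Contact header with a sip:/sips: URI, optional user part and optional port.
CONTACT_RE = re.compile(
    r"^Contact:\s*<sips?:(?:[^@>;]+@)?([^;>:]+)(?::(\d+))?",
    re.IGNORECASE | re.MULTILINE,
)


def find_redirect(message):
    """Return (host, port) from a 301 response, or None if it is not one.

    port is None when the Contact URI does not name a port.
    """
    status = STATUS_RE.match(message)
    if not status or status.group(1) != "301":
        return None
    contact = CONTACT_RE.search(message)
    if not contact:
        return None
    host, port = contact.group(1), contact.group(2)
    return host, int(port) if port else None
```

    The host name returned this way is the one that would need its own hosts entry.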

    How about the log from Lync client? That would help the most.

    I've already refactored the IP address query code in the backend. If we have to generate IPv6 addresses in SIP messages headers for IPv6 connections, then we can remove the call to libpurple, simply ask the socket for its address with getsockname() and handle IPv4/IPv6 ourselves.
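
    The getsockname() approach can be sketched roughly as follows, in Python rather than SIPE's C. This is only an illustration of the idea, not the code in the SIPE backend; `local_address_for_sip` and `format_sip_host` are hypothetical names.

```python
import socket


def local_address_for_sip(sock):
    """Return (ip, is_ipv6) for the local end of a bound or connected socket."""
    # getsockname() returns (ip, port) for IPv4 and
    # (ip, port, flowinfo, scope_id) for IPv6; the IP is first in both.
    ip = sock.getsockname()[0]
    return ip, sock.family == socket.AF_INET6


def format_sip_host(ip, is_ipv6):
    """Format the address for a SIP header or URI.

    RFC 3261 requires IPv6 literals to be wrapped in square brackets.
    """
    return f"[{ip}]" if is_ipv6 else ip
```

    Because the address comes from the socket itself, it always matches the address family of the connection actually in use.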

     
  • hay jumper

    hay jumper - 2016-09-22

    Unfortunately there are multiple secondary servers that it uses, and they use dynamic pooling so the host name changes frequently.

    Attached is a log from Skype for Business client (this is basically still Lync, thanks MS marketing). From the log it seems like this client does resolve the IPv6 addresses like Pidgin 2.11.0, but it ignores them (or times out) in favor of the IPv4 addresses, which are what it actually seems to use.

    Hopefully the log will be useful to you... I appreciate the continued interest in making things work.

     
  • Stefan Becker

    Stefan Becker - 2016-09-22

    Are you sure you attached the correct log? As far as I can tell this Lync client is hard-coded to one SIP server sippooldm12a03.infra.lync.com which according to DNS has only one IPv4 address and no IPv6 addresses. I assume that that is your home server.

    It does not go through the usual chain of redirects, i.e. via DNS SRV-discovered sipdir.online.lync.com. This could be an effect of caching the detected home server from an earlier successful login.

    BTW: git HEAD has a replacement for the libpurple call and will now return an IPv6 address from a connected IPv6 socket.

     
  • hay jumper

    hay jumper - 2016-09-22

    The attached log shows only one or two connections, and it's likely the DNS call is cached (as you suggest). But if I ran it again today, the lookup would likely return a different server. Consistent within a single connection, but not consistent across multiple connections/days/etc.

    If desired I could produce a new log after flushing my lookup cache. Thanks for the commit, I'm more than happy to try out the next release when it drops.

     
  • Stefan Becker

    Stefan Becker - 2016-09-23

    No, the problem is not the DNS call or caching. The log only shows the end result of the home server discovery, i.e. for the SIP connection setup it only uses the home server name. Those seem only to have IPv4 addresses and therefore avoid the issue altogether, i.e. SIP is only used over an IPv4 connection.

    That would mean that the Lync client is doing something different, again, for server discovery, which does not involve DNS SRV discovery, DNS A discovery and SIP connections like previously.

    The Lync log unfortunately only shows the SIP communication. You would need to use a network analyzer like wireshark to discover if it creates other network connections before connecting to the SIP server. My guess is there is a new HTTP(S) based protocol behind it.

    After a little searching I'm sure this is the likely candidate [MS-OCDISCWS].

    I've created [feature-requests:#93] for this.

     

    Related

    Feature Requests: #93

  • Stefan Becker

    Stefan Becker - 2016-09-25

    Implemented in git commit 025c055

    I provided an updated DLL, compiled against Pidgin 2.11.0, in the download area. Can you please try this?

     
  • hay jumper

    hay jumper - 2016-09-26

    Debug log indicates this time it gets further in than I've seen with previous methods. It still eventually hits a point where it times out, prior to full login.

    I've tried to clean up the debug log as much as possible. Let me know if you need more info:

     
    • Stefan Becker

      Stefan Becker - 2016-09-27

      You forgot to clear the server field in the account settings:

      (16:24:50) sipe: sipe_core_connect: user specified SIP server sipdir.online.lync.com:443
      (16:24:50) sipe: transport_connect - hostname: sipdir.online.lync.com port: 443
      

      Therefore no autodiscover is attempted at all...

       
      • Stefan Becker

        Stefan Becker - 2016-09-27

        Or you attached the wrong log?

         
  • hay jumper

    hay jumper - 2016-09-28

    Indeed, patched DLL works well with Pidgin 2.11.0 if I set the server name to <blank>.

    I didn't realize autodiscover was for the initial server address, I thought it was only active on the redirects after initial connect. I've been using hard-coded server name for years. Thanks!

     
  • hay jumper

    hay jumper - 2016-09-28

    In case it is useful, I am adding a debug log from the latest (successful) login attempt. FWIW, it appears the initial server name is discovered as webdir.online.lync.com.

    Thanks again, and let me know if I can provide any other info.

     
    • Stefan Becker

      Stefan Becker - 2016-09-28

      The debug log is truncated, i.e. it doesn't contain the end of the Lync Autodiscover protocol. Hence it doesn't show what home server was discovered.

       
      • hay jumper

        hay jumper - 2016-09-29

        Sorry... sanitizing these logs is time consuming, so I went the lazy route.

        Attached is a more complete debug log, up to right before the contact list is synced.

         
  • Stefan Becker

    Stefan Becker - 2016-09-29

    The new log shows it works correctly for your account too, i.e. Lync Autodiscover delivers the name of your home server straight away:

     <SipClientInternalAccess fqdn="sippooldm12a03.infra.lync.com" port="443" />
      ...
      <SipClientExternalAccess fqdn="sipfed2a.online.lync.com" port="443" />
     ...
    (11:26:55) sipe: lync_autodiscover_cb: got server list
    (11:26:55) sipe: transport_connect - hostname: sippooldm12a03.infra.lync.com port: 443
    ...
    (11:26:55) dnsquery: IP resolved for sippooldm12a03.infra.lync.com
    (11:26:55) proxy: Attempting connection to 131.253.161.209
    

    As that server has only an IPv4 address the whole issue of IPv6 not working is avoided altogether.
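
    For readers who want to pull the server names out of such a response themselves, a minimal Python sketch follows. The two access elements are copied from the log above; the `AutodiscoverResponse` wrapper element and the `sip_servers` helper are assumptions made for this example, and a real response contains more elements and may use XML namespaces.

```python
import xml.etree.ElementTree as ET

# The wrapper element name is hypothetical; the two child elements
# are the ones shown in the debug log.
SAMPLE = """<AutodiscoverResponse>
  <SipClientInternalAccess fqdn="sippooldm12a03.infra.lync.com" port="443" />
  <SipClientExternalAccess fqdn="sipfed2a.online.lync.com" port="443" />
</AutodiscoverResponse>"""

WANTED = ("SipClientInternalAccess", "SipClientExternalAccess")


def sip_servers(xml_text):
    """Map each SIP access element name to its (fqdn, port) pair."""
    servers = {}
    for elem in ET.fromstring(xml_text).iter():
        # Drop a "{namespace}" prefix if the document uses one.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag in WANTED:
            servers[tag] = (elem.get("fqdn"), int(elem.get("port")))
    return servers
```

    In the log above, the host that SIPE connects to is the one from the `SipClientInternalAccess` element.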

    Thanks for providing the logs so that I could root-cause the issue and implement the correct solution.

     
