#8 DNS failure (race condition?)

Status: closed-fixed
Owner: nobody
Labels: None
Priority: 7
Updated: 2011-07-21
Created: 2011-03-16
Creator: aditsu
Private: No

I have pydns-2.3.4 and noticed some DNS failures when using an SPF mail filter.
I debugged the problem and found that this statement:
print DNS.DnsRequest("imsavscan.netvigator.com", qtype="A", server=['8.8.8.8'], protocol='tcp', timeout=300).req().answers
throws DNS.Base.DNSError: EOF, occasionally throws DNS.Base.DNSError: no working nameservers found,
and occasionally (rarely) even works.
It always seems to work when I step through it in the debugger.
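The sample call asks pydns for an A record over TCP. For reference, here is a minimal Python 3 sketch of the bytes such a query puts on the wire: a 12-byte header, the question section, and, because the transport is TCP, a two-byte length prefix (RFC 1035, section 4.2.2). The helper names are hypothetical and not part of pydns. The query ID and flags shown are illustrative only; pydns chooses its own.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("invalid label: %r" % label)
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_tcp_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a recursive query (QTYPE A by default, QCLASS IN) framed for TCP."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    message = header + question
    # Over TCP every DNS message is preceded by its 2-byte length
    return struct.pack("!H", len(message)) + message

query = build_tcp_query("imsavscan.netvigator.com", qid=0x1234)
```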

Discussion

  • aditsu

    aditsu - 2011-03-16

    I'm raising the priority because resolution failures and the resulting email rejections (from SPF) are serious issues.

     
  • aditsu

    aditsu - 2011-03-16
    • priority: 5 --> 7
     
  • aditsu

    aditsu - 2011-03-16

    Even with server=['127.0.0.1'], it sometimes throws DNS.Base.DNSError: no working nameservers found
    (I have dnsmasq running on my computer)

     
  • Scott Kitterman

    Scott Kitterman - 2011-03-16

    What OS and version, and what Python version, are you using? If it's distributed through a Linux distro, which one?

     
  • aditsu

    aditsu - 2011-03-16

    Gentoo Linux, Python 2.6.6
    I'm pretty sure the problem is due to the non-blocking TCP connection and the program not waiting for the reply. There are even some comments in the source:
    # FIXME: Since we are non-blocking, it could just be a large reply
    # that we need to loop and wait for.
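The FIXME points at the underlying issue. Over TCP, each DNS message is preceded by a two-byte length (RFC 1035, section 4.2.2), and a large reply can arrive split across several reads. Below is a minimal Python 3 sketch of reassembling such a reply from whatever bytes have arrived so far. take_message is a hypothetical helper, not a pydns function.

```python
import struct
from typing import Optional, Tuple

def take_message(buf: bytes) -> Tuple[Optional[bytes], bytes]:
    """Split one length-prefixed DNS message off the front of buf.

    Returns (message, remaining_bytes), or (None, buf) when buf does not yet
    hold a complete message and the caller must read more from the socket.
    """
    if len(buf) < 2:
        return None, buf          # length prefix not complete yet
    (length,) = struct.unpack("!H", buf[:2])
    if len(buf) < 2 + length:
        return None, buf          # body not complete yet: keep reading
    return buf[2:2 + length], buf[2 + length:]

# Simulate a 300-byte reply arriving in three pieces.
reply = b"x" * 300
framed = struct.pack("!H", len(reply)) + reply
buf = b""
complete = []
for chunk in (framed[:1], framed[1:100], framed[100:]):
    buf += chunk
    msg, buf = take_message(buf)
    complete.append(msg is not None)
```

Only the third piece completes the message. A reader that gives up after the first or second read reports an error even though the reply is still on its way.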

     
  • Scott Kitterman

    Scott Kitterman - 2011-03-16

    Does DNS.DiscoverNameservers find a nameserver? I'm guessing it does, but I'd like to know for sure.

     
  • Stuart D. Gathman

    I have a fix for this in CVS. It hasn't been tested to my satisfaction yet, but it Works For Me on our production mail server (I just haven't verified whether it has actually hit any large TCP responses yet). The fix is in pydns-2.3.5; I am trying to get more patches into 2.3.5. There is a 2.3.5 tag that you could grab and use (tag r235).

     
  • aditsu

    aditsu - 2011-03-16

    kitterma: yes, it parses resolv.conf correctly, but my sample code doesn't use that; it specifies the nameserver directly (8.8.8.8).
    customdesigned: thanks. You can test it with my sample code, which produces a large TCP response. I'll try to test the CVS code too.

     
  • aditsu

    aditsu - 2011-03-16

    I'm afraid the current code still fails with the same errors. In particular, it often ends up in this branch of processTCPReply:
    if len(header) < 2:
        raise DNSError,'EOF'
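That check treats any short read as end of file. On a stream socket, recv(2) may legitimately return a single byte when only part of the header has arrived. Only an empty result means the peer closed the connection. The Python 3 demonstration below uses a local socket pair as a stand-in for the connection to the nameserver.

```python
import socket

client, server = socket.socketpair()
try:
    server.sendall(b"\x01")        # only the first byte of a 2-byte length header
    first = client.recv(2)         # asks for 2 bytes, returns the 1 that has arrived
    server.close()                 # now the peer really closes the connection
    after_close = client.recv(2)   # only this read returns b"" (end of file)
finally:
    client.close()
    server.close()                 # closing twice is harmless

print(len(first), after_close)     # 1 b''
```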

     
  • Stuart D. Gathman

    Added the same logic to reading the header (actually, made a _readall() method) and committed it to CVS. Try again when you get a chance. I'll keep working on the remaining patches for 2.3.5 when I can.
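The _readall() code itself isn't shown in this thread. Below is a minimal Python 3 sketch of the idea for a blocking socket: keep calling recv() until the requested number of bytes has arrived, and treat only an empty read as end of file. The names read_exact and read_tcp_message are hypothetical, not pydns functions.

```python
import socket
import struct

def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a blocking socket, or raise EOFError."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection before n bytes arrived
            raise EOFError("connection closed after %d of %d bytes" % (n - remaining, n))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_tcp_message(sock: socket.socket) -> bytes:
    """Read one DNS-over-TCP message: a 2-byte length, then that many bytes."""
    (length,) = struct.unpack("!H", read_exact(sock, 2))
    return read_exact(sock, length)

# A complete 1000-byte message is read back intact.
client, server = socket.socketpair()
payload = b"y" * 1000
server.sendall(struct.pack("!H", len(payload)) + payload)
received = read_tcp_message(client)
client.close()
server.close()

# A peer that closes mid-message yields a clear error instead of a short read.
a, b = socket.socketpair()
b.sendall(b"\x00\x05ab")   # announces 5 bytes but sends only 2
b.close()
try:
    read_tcp_message(a)
    eof_raised = False
except EOFError:
    eof_raised = True
a.close()
```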

     
  • aditsu

    aditsu - 2011-03-16

    Tag r235 is still the same; where can I find the new changes?

     
  • aditsu

    aditsu - 2011-03-17

    Never mind, I switched to HEAD and back, and it got updated.
    Now I don't seem to get the EOF anymore, but most of the time I get DNS.Base.DNSError: no working nameservers found
    and sometimes DNS.Base.DNSError: incomplete reply

     
  • aditsu

    aditsu - 2011-03-17

    Apparently the reason for "no working nameservers found" is:
    socket.error: [Errno 11] Resource temporarily unavailable
    which is EAGAIN (on Linux, the same value as EWOULDBLOCK), and it is not handled in _readall.
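On a non-blocking socket, that error only means no data is ready yet, so a reader should wait and retry rather than give up. Below is a minimal Python 3 sketch that assumes a deadline-based timeout and uses select() to wait. In Python 3 the condition surfaces as BlockingIOError. Python 2 code such as pydns would instead catch socket.error and compare its errno with errno.EAGAIN or errno.EWOULDBLOCK. read_exact_nonblocking is a hypothetical name.

```python
import select
import socket
import threading
import time

def read_exact_nonblocking(sock: socket.socket, n: int, timeout: float) -> bytes:
    """Read exactly n bytes from a non-blocking socket within timeout seconds."""
    deadline = time.monotonic() + timeout
    buf = b""
    while len(buf) < n:
        try:
            chunk = sock.recv(n - len(buf))
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing to read *yet*, so wait instead of failing
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("timed out after %d of %d bytes" % (len(buf), n))
            select.select([sock], [], [], remaining)  # until readable or deadline
            continue
        if not chunk:
            raise EOFError("connection closed after %d of %d bytes" % (len(buf), n))
        buf += chunk
    return buf

# Deliver 8 bytes in two pieces; the second arrives later, so recv() hits EAGAIN.
client, server = socket.socketpair()
client.setblocking(False)
server.sendall(b"abc")
timer = threading.Timer(0.2, server.sendall, args=(b"defgh",))
timer.start()
data = read_exact_nonblocking(client, 8, timeout=5.0)
timer.join()
client.close()
server.close()

# With no data at all, the deadline produces a TimeoutError.
idle_a, idle_b = socket.socketpair()
idle_a.setblocking(False)
try:
    read_exact_nonblocking(idle_a, 2, timeout=0.1)
    timed_out = False
except TimeoutError:
    timed_out = True
idle_a.close()
idle_b.close()
```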

     
  • Nobody/Anonymous

    This only happens to me when I specify an invalid nameserver IP. If I supply a valid nameserver with server=, it works fine. I guess that with TCP, when no connection can be made, the failure happens before the timeout (which is why it fails immediately).

     
  • aditsu

    aditsu - 2011-03-17

    8.8.8.8 is a valid nameserver

     
  • Nobody/Anonymous

    Switched processTCPReply() to blocking I/O with a timeout. This breaks with Google DNS (8.8.8.8) unless we comment out the SHUT_WR. Committed and tagged as r235.
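Below is a minimal Python 3 sketch of the blocking approach. It assumes the usual TCP framing (a two-byte length before each message) and uses hypothetical helper names. The fake server thread is only a stand-in for a real nameserver. settimeout() bounds each individual send or recv call, not the exchange as a whole. Following the observation above, the sketch never calls shutdown(socket.SHUT_WR) after sending the query.

```python
import socket
import struct
import threading

def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping over short reads; raise EOFError on close."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed after %d of %d bytes" % (len(buf), n))
        buf += chunk
    return buf

def tcp_exchange(sock: socket.socket, query: bytes, timeout: float) -> bytes:
    """Send one framed query and read one framed reply using blocking I/O.

    Each blocking call waits at most `timeout` seconds, then raises
    socket.timeout (an alias of TimeoutError since Python 3.10).
    """
    sock.settimeout(timeout)
    sock.sendall(struct.pack("!H", len(query)) + query)
    # No sock.shutdown(socket.SHUT_WR) here: the write side stays open.
    (length,) = struct.unpack("!H", read_exact(sock, 2))
    return read_exact(sock, length)

def fake_server(conn: socket.socket) -> None:
    """Stand-in for a DNS server: send the query body back, reversed."""
    (length,) = struct.unpack("!H", read_exact(conn, 2))
    body = read_exact(conn, length)
    conn.sendall(struct.pack("!H", len(body)) + body[::-1])

client, server = socket.socketpair()
worker = threading.Thread(target=fake_server, args=(server,))
worker.start()
reply = tcp_exchange(client, b"query-bytes", timeout=5.0)
worker.join()
client.close()
server.close()
```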

     
  • Stuart D. Gathman

    So, is this problem fixed in 235? I still don't have permission to update the bug status. :-(

     
  • aditsu

    aditsu - 2011-07-21

    It's fixed for me in 2.3.5

     
  • aditsu

    aditsu - 2011-07-21
    • status: open --> closed-fixed
     
  • aditsu

    aditsu - 2011-07-21

    And some more comments:
    - I discussed this bug on IRC with a pydns developer (sorry, I don't remember his username; for all I know it might be you). He worked out the solution and let me try a patch, but this comment thread doesn't reflect all the details.
    - Using Google's DNS for a mail server was a dumb idea in hindsight; since then I have done the right thing and installed BIND. With a local DNS server this bug would probably be hard to reproduce, but I think it's still very good that it got fixed.

     
