#44 New thread for WIN crashes in 0.92.3+

v0.93 show-stopper
open-accepted
None
9
2012-04-22
2012-03-18
No

Please redirect your posts from the discussion forum here, to the bug tracker. This is now the official thread for that issue.

I found one crashing bug (thread race condition) in 0.93.1, so unless there's another one, this should be fixed in 0.92.2+.

If nobody replies in 14 days, I'll close the topic as fixed.

Discussion

  • Robin Parker

    Robin Parker - 2012-03-21

    Using 0.92.3 on Windows XP Pro SP 3:
    Event Type: Error
    Event Source: cntlm
    Event Category: None
    Event ID: 0
    Date: 21/03/2012
    Time: 14:06:54
    User: NT AUTHORITY\SYSTEM
    Computer: NGT2664939
    Description:
    The description for Event ID ( 0 ) in Source ( cntlm ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: cntlm: PID 5324: service `cntlm' failed: signal 11 raised.

    I have a 38 MB dump file created by the Microsoft User Mode Process Dumper, if that's useful...

     
  • Robin Parker

    Robin Parker - 2012-03-21

    Also found this in system32 directory

    cntlm.stackdump:
    Exception: STATUS_ACCESS_VIOLATION at eip=0040A3D5
    eax=00000000 ebx=000000D1 ecx=00000000 edx=611883A0 esi=0000003C edi=00000000
    ebp=7E32CDD8 esp=7E30CD40 program=C:\Program Files\Cntlm\cntlm.exe, pid 5544, thread unknown (0x8AC)
    cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
    Stack trace:
    Frame Function Args
    End of stack trace

     
  • David Kubicek

    David Kubicek - 2012-03-23

    Please, get the BETA2 from http://ftp.awk.cz/pub/

    It has debugging symbols enabled and optimizations disabled. Repeat the crash and upload that dump somewhere, I'll try to learn how to work with it. Do you have any proven method of triggering the crash? Something easily reproducible I could do on my own? What's your exact command line and which URL is processing before and at the time of the crash?

    You'll get this detailed info in a file with the "-v -T cntlm.log" arguments.

    For now, I think I have found the place where the crash happens from your stackdump, but I'd like to be able to reproduce it...

    Thanks for cooperating.

     
  • Robin Parker

    Robin Parker - 2012-03-27

    The Beta crashed.

    C:\Program Files\Cntlm-0.93beta2>cntlm.exe -v -T C:\cntlm.log
    cygwin warning:
    MS-DOS style path detected: C:\cntlm.log
    Preferred POSIX equivalent is: /cygdrive/c/cntlm.log
    CYGWIN environment variable option "nodosfilewarning" turns off this warning.
    Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    Redirecting all output to C:\cntlm.log
    3 [unknown (0xC94)] cntlm 3268 exception::handle: Exception: STATUS_ACCESS
    _VIOLATION
    15479 [unknown (0x16D4)] cntlm 3268 exception::handle: Exception: STATUS_ACCES
    S_VIOLATION
    101041 [unknown (0x16D4)] cntlm 3268 exception::handle: Error while dumping sta
    te (probably corrupted stack)

    The stackdump file was empty...

     
  • Robin Parker

    Robin Parker - 2012-03-27

    And again...

    C:\Program Files\Cntlm-0.93beta2>cntlm.exe -v -T C:\cntlm.log
    cygwin warning:
    MS-DOS style path detected: C:\cntlm.log
    Preferred POSIX equivalent is: /cygdrive/c/cntlm.log
    CYGWIN environment variable option "nodosfilewarning" turns off this warning.
    Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    Redirecting all output to C:\cntlm.log
    3 [unknown (0x1474)] cntlm 3292 exception::handle: Exception: STATUS_ACCES
    S_VIOLATION
    29990 [unknown (0x1474)] cntlm 3292 open_stackdumpfile: Dumping stack trace to
    cntlm.exe.stackdump

    This time the stackdumpfile contains..

    Exception: STATUS_ACCESS_VIOLATION at eip=004090F9
    eax=00020000 ebx=204DF270 ecx=00000011 edx=00001005 esi=200ABB60 edi=047FCD70
    ebp=7A8FCD28 esp=7A8FCCE0 program=C:\Program Files\Cntlm-0.93beta2\cntlm.exe, pid 3292, thread unknown (0x1474)
    cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
    Stack trace:
    Frame Function Args
    End of stack trace

     
  • David Kubicek

    David Kubicek - 2012-03-27

    Does it crash immediately after start or after some time of successful browsing, BTW?

     
  • David Kubicek

    David Kubicek - 2012-03-27
    • status: open --> pending-accepted
     
  • David Kubicek

    David Kubicek - 2012-03-27
    • status: pending-accepted --> open-accepted
     
  • David Kubicek

    David Kubicek - 2012-03-27

    Also, please attach the debug log file from the crashed run. The stackdump is apparently useless to me. :(

    It is important to run cntlm with "-T cntlm.log" and upload the log file here after you make it crash. If you can make it crash with "-s" added too, it would be great, because we'd eliminate thread issues. -s makes cntlm run in only one thread.

    Please try "-s -T cntlm.log" and if it doesn't crash, then at least the log file without the -s.

    Crap, this issue bugs me. I can't reproduce it on my system. Do you browse some particular site when it happens?

     
  • Robin Parker

    Robin Parker - 2012-03-27

    Can't attach files here so I sent the log file to cntlm(at)awk(dot)cz

    Not produced with (-s) unfortunately.

    Crash happens after it's been running for a while.

    We have about 4 or 5 tunnels defined as well as people using it as a standard http(s) proxy...

     
  • David Kubicek

    David Kubicek - 2012-04-08

    Please use BETA3 from now on so we have the same binaries to work with, thanks! Even stackdump file contents will help. If you already have BETA3, redownload the Windows version, because there was a fluke while compiling it (debug symbols were actually missing :)). Thanks!

    Get it here: http://ftp.awk.cz/pub/

     
  • David Kubicek

    David Kubicek - 2012-04-08

    OK, I have some news guys. From the Cntlm log file provided by R. Parker, I could see the problem emerges when a large number of CONNECT (or tunnel) connections is opened.

    I wrote a simple proxy client program that makes many of these requests in parallel, and I can reproduce the crash every time the number of open connections reaches a certain threshold. Mind you, this only happens on Windows (probably on Unix too, but only with a much, much higher number of open connections).
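A test client of the kind described above could look roughly like the following. This is a minimal sketch, not the author's actual tool: the function names `open_tunnel` and `stress`, the timeout, and the choice to hold every tunnel open until all requests have been answered are assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def open_tunnel(proxy_addr, target, timeout=5.0):
    """Send one CONNECT request and return (socket, status_line).

    The socket is returned still open so the caller decides how long the
    tunnel is held. On any socket error, (None, error_text) is returned.
    """
    try:
        sock = socket.create_connection(proxy_addr, timeout=timeout)
    except OSError as exc:
        return None, str(exc)
    try:
        request = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        reply = sock.recv(4096).decode("latin-1")
        # Only the first line of the proxy's reply is of interest here.
        return sock, reply.split("\r\n", 1)[0]
    except OSError as exc:
        sock.close()
        return None, str(exc)


def stress(proxy_addr, target, count):
    """Open `count` tunnels in parallel, hold them all, then close them."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(
            pool.map(lambda _: open_tunnel(proxy_addr, target), range(count))
        )
    statuses = [status for _, status in results]
    for sock, _ in results:
        if sock is not None:
            sock.close()
    return statuses
```

Something like `stress(("127.0.0.1", 3128), "example.com:443", 200)` would then open 200 tunnels at once; the proxy address and target here are placeholders, so substitute the listen port from your own cntlm configuration.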

    To me, this looks exactly like what happens when the OS cannot provide any more connections (allocate new sockets): Cntlm crashes simply because somewhere in the code this situation is not handled, and the code carries on as if it had a valid handle, which inevitably crashes as soon as it starts working with that handle.

    - The good news is that it should be easy to find the place where we should handle failed connection better and simply return error to the client.
    - The bad news is that this limit is imposed by the OS on some level and I cannot help you scale Cntlm's capability further. I'd suggest finding what is the limit of open files and open socket connections on your Windows version and raising that value to suit your needs, as well as other limits imposed on processes.

    >>When I run the same test application on Unix, Cntlm handles 100x more load and still no crash.<<

    As mentioned, Cntlm is already very particular about using as few system resources as possible, so I won't be able to do more than handle "hitting the system's roof" more gracefully -- i.e. not crash. :) In my test cases, at the time of a crash, Cntlm consumes a mere 11MB of RAM, which is damn *little*, considering what it's doing and how many connections are being handled. Other applications of this type would easily eat 10x-100x more RAM at the same moment.

    All this fits perfectly with the fact that Unix is much better and more scalable in terms of network performance and capabilities, and we don't have this issue there. Needless to say, there *is* a bug in Cntlm that needs to be fixed: when additional connections are denied by the system, it must not crash, but return an error to the client gracefully.
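The graceful behaviour described above can be sketched as follows. This is not Cntlm's code (Cntlm is written in C, where the equivalent check is testing the return value of `socket()` or `connect()` for -1); the helper names `connect_upstream` and `handle_connect` and the choice of a 503 reply are assumptions made for illustration.

```python
import socket


def connect_upstream(addr, timeout=5.0):
    """Return a connected socket, or None if the connection cannot be made.

    OSError covers refused connections as well as the OS running out of
    sockets or file descriptors.
    """
    try:
        return socket.create_connection(addr, timeout=timeout)
    except OSError:
        return None


def handle_connect(client, upstream_addr):
    """Answer a client's CONNECT request without ever using a bad handle."""
    upstream = connect_upstream(upstream_addr)
    if upstream is None:
        # No valid upstream handle: tell the client and stop here instead of
        # carrying on as if the connection had succeeded.
        client.sendall(
            b"HTTP/1.1 503 Service Unavailable\r\nConnection: close\r\n\r\n"
        )
        return None
    client.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
    return upstream
```

The point of the sketch is only the early return: every code path after the connection attempt either holds a valid socket or has already reported the failure to the client.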

    Thanks so much for helping me pinpoint this issue!!!

     
  • David Kubicek

    David Kubicek - 2012-04-22

    I found that many versions of Windows limit the number of half-open connections to a maximum of 10.

    In all cases reported here, the crash on Windows machines happened when there were 150+ connections open at the same time. After I installed a patch that allows raising this absurd limit, I haven't been able to replicate the crash with the same testing tools that previously killed a running Cntlm every time.

    Even after I "uninstalled" this patch:
    http://www.lvllord.de/?lang=en&url=downloads

    I'm unable to replicate the crash. Can you please do the same test as me and try to raise your limit from 10 to, say, a much more appropriate 65535? If that helps, you'll have solved your issue, because despite there being an unhandled case in Cntlm, fixing it will only prevent the crash. The service for the "excess" client would be refused more gracefully, but without a proper reply nonetheless.

    I will be quite busy at my professional work for the next few weeks and won't have as much time for Cntlm anyway, so until you test with the solution/workaround and report back, there's not much I can do for a week or two.

    I'm looking forward to your feedback w.r.t. beta3 and beta4 crashes with or without the aforementioned system patch and to finalizing this issue based on those results soon after.

    Thanks for your support and cooperation,
    David

     
  • David Kubicek

    David Kubicek - 2012-04-22

    Also, when reporting continuing crashes, please include your logs and, more importantly, the exact version of your Windows build and the 32/64-bit-ness of your architecture.

    So far I have no idea whether this happens only on a particular Windows build or a particular architecture, or on a wider spectrum of Windows versions.

    Even if you don't want to do any more testing, I'd really like to hear the Windows build specs of your HW related to any older reports you've submitted so far. This will help me very much to be able to install the same testing environment matching your specs.

    Unless I can replicate the issue myself, I am not likely to find and fix this annoying bug.

     
  • David Kubicek

    David Kubicek - 2012-04-22
    • milestone: --> v0.93 show-stopper
     
  • David Kubicek

    David Kubicek - 2012-04-22

    Before you try any system patching to raise the number of allowed open connections (as mentioned below), please test BETA5!

    The BETA5 has a major change in handling cached open proxy server connections. All your logs had one thing in common, where the FD numbers of a client/server connection were very distant from each other. I began to suspect that some dysfunctional connections are not recognized under Windows as such and that's what causes the crashes.

    I applied all testing tools before the change and after the change and I was able to crash Cntlm 50% of the time with BETA3, but never with BETA5. I'm really hoping this change fixes the underlying issue in Cntlm that caused the crashes. If not, we'll continue to investigate, but it fixes an important bug anyway.

    Another issue fixed in BETA5 is the connection backlog size for all listening ports, which has been raised from 5 to MAX. Previous versions caused clients to report "proxy refused connection" when under heavy load, whereas this won't happen anymore unless the OS is actually running out of resources.
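For readers unfamiliar with the listen backlog, the change described above amounts to the second argument of `listen()`. The sketch below assumes that "MAX" refers to the system maximum exposed as `SOMAXCONN`; the helper name `make_listener` is hypothetical and this is not Cntlm's own code.

```python
import socket


def make_listener(host, port, backlog=socket.SOMAXCONN):
    """Create a listening TCP socket with the given accept backlog.

    A backlog of 5 lets only a handful of not-yet-accepted connections
    queue up; SOMAXCONN asks the OS for the largest queue it allows.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

With a small backlog, a burst of clients arriving faster than the proxy calls `accept()` can overflow the queue, which matches the "proxy refused connection" symptom under heavy load.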

    The BETA download directory now contains only BETA5 and the older BETA3 for comparison:
    http://ftp.awk.cz/pub/

    Please report back your experience with BETA5 - crashes gone?

     
  • David Kubicek

    David Kubicek - 2012-05-10

    I was hoping I found the bugger! :( Well, next BETA, hopefully... Thank you for the feedback and back to the magnifying glass with me, then... :)

    I JUST WANT TO PUT THIS OUT THERE, FYI:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    0.91rc1 and newer versions were all testing releases. However, there was no negative feedback for months after the release, so I marked the code as stable and notched it up to 0.92, because many people were pushing for the DIRECT mode and other new features, which the stable branch 0.35 doesn't have.

    After 0.92 several functional bugs were reported all of a sudden (as a result of the major code rewrite) and I couldn't mark 0.92 BETA retroactively. The Windows/Cygwin crashing bug is a *major* issue, which is still open and hard to solve, because of lack of proper debugging tools for Cygwin EXE's.

    People should use 0.35.1 whenever possible, and 0.92.x/0.93beta's only when really necessary. I use 0.92 branch as stable, because on non-Windows systems, it works without any issues (except for one POST-handling bug, which - luckily - is not triggered in common use).

    To fix these two issues and properly test all new features, the 0.93 BETAs were created for *non-production* environments and our loyal/brave beta-testers. :)

    Until 0.93 stable is released, the only *real* stable version is still 0.35.1. I'm going to mark it as such on the home page and *revoke* 0.92's stable status now, because more than 50% of our users use Windows and any crashing whatsoever is simply unacceptable. Cntlm maintains the highest standards for code quality and memory management (hence no crashing on real POSIX OSes), but there is obviously some kind of incompatibility/quirk with the Cygwin pthreads emulation on Windows which affects Cntlm when a lot of parallel connections are processed (after thorough investigation, I'm positive the issue is not primarily memory/stack related, but instead is a race condition somewhere in thread interaction; that's the hardest kind of bug to find and Cygwin+Windows POSIX emulation makes this even harder).

    The biggest problem at this moment is *time*, my free & productive time, to be exact. For details, I refer you to this thread (about the 2nd show-stopper issue before releasing the 0.93 stable) and my last few comments:
    http://sourceforge.net/tracker/?func=detail&aid=3522110&group_id=197861&atid=963162

    If you really want these issues fixed, or better put, allow me to allocate enough time to Cntlm to be able to do so any time soon, I have no other option but to ask you to consider donating enough funds so that I can work on Cntlm in my professional capacity alongside my other paying contracts, rather than just a hobby project.

    This is my last call upon you: businesses, professionals depending on Cntlm daily in corporate environments, veteran users and fans in the non-commercial sphere and anybody else who feels Cntlm deserves appropriate nourishment to be allowed to flourish, despite the change in my personal preferences and professional responsibilities.

    I offer this explanation openly to satisfy anybody who wonders why there is suddenly, after 5 years without a single donation, any need for them now (actually, I think there was one donation a couple of years back, now that I think about it, and of course now, after the changes, another 2). I AM NOT ABANDONING CNTLM. If you - like me - can work with what is currently available, because you have no issues manifesting personally, you can simply wait it out. Others can employ workarounds like auto-restarts or scheduled regular daily restarts on Windows, etc.
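The auto-restart workaround mentioned above could be as simple as a small supervisor loop. This is a minimal sketch under stated assumptions: the function name `run_with_restarts` is hypothetical, the command must keep Cntlm in the foreground so that its exit can be observed, and a non-zero exit code is taken to mean a crash.

```python
import subprocess
import time


def run_with_restarts(command, max_restarts, delay=1.0):
    """Run `command`, restarting it whenever it exits with a non-zero code.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(command)
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        if attempt < max_restarts:
            time.sleep(delay)  # brief pause before the next start
    return exit_codes
```

The `command` argument is whatever list starts Cntlm on your machine; it is deliberately left out here rather than guessed.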

    I believe I'll be able to get back to Cntlm development/support on a regular daily/weekly basis - like I have for the past years - in the not-so-distant future, 6 months tops.

    TO PROMOTE THE DONATION SUPPORT SCHEME *AND* MAINTAIN COMPLETE TRANSPARENCY, I'VE DECIDED TO PUT UP EVERY DONOR'S NAME DIRECTLY ON OUR WEB SITE, DISPLAYING THE FULL AMOUNT DONATED TO DATE. This will take effect in a couple of days.

    Donors and actual supporters of Cntlm are a very rare breed and as such they deserve - even necessitate - to be publicly known for their support. Let their names be known and forever linked with Cntlm in the search engines. :)

     
  • Anonymous

    Anonymous - 2012-05-11

    Hi David.

    Happy to send stack dumps and logs at any time.

    Version: 0.93 beta 5
    System: Windows 7 (64Bit).
    MTBF: About 40 minutes.

     

    Last edit: Anonymous 2016-01-29
