From: SourceForge.net <no...@so...> - 2005-06-15 16:47:19
Support Requests item #1183591, was opened at 2005-04-15 12:00
Message generated for change (Comment added) made by monas
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=541483&aid=1183591&group_id=74601

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Priority: 3
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Aidas Kasparas (monas)
Summary: Racoon Freezing

Initial Comment:
On kernel 2.6.8 with ipsec-tools 0.3, and on 2.6.11 with ipsec-tools
0.5, with over 250 VPN connections, racoon randomly freezes; the
connections stay working until their lifetimes expire.

Killing racoon with kill -9 is the only way to stop it; after
restarting racoon all is well until it freezes again.

There is no information in the log to indicate the reason for the
failure.

Any idea on how a watchdog would work with racoon?

----------------------------------------------------------------------

>Comment By: Aidas Kasparas (monas)
Date: 2005-06-15 19:47

Message:
Logged In: YES
user_id=39627

Oops, the old log won't help -- it will not show what was on the pfkey
queue while freezing... :-//

----------------------------------------------------------------------

Comment By: Aidas Kasparas (monas)
Date: 2005-06-15 19:40

Message:
Logged In: YES
user_id=39627

FIRST, I'm sorry to tell you, but by providing a level 2 debug log you
also provided your confidential PSK. Please treat that PSK as
compromised. In future, please do not use level 2 debug without an
explicit request, or remove the DEBUG2 lines from the log before
providing it to anyone.

Here is what I spotted in your log. When you were restarting racoon
very rapidly, one of the first things racoon did was request the SPD
from the kernel. It gets this information in a series of pfkey
messages, one policy at a time. Meanwhile, it also gets a "delete"
payload in an informational message over the ISAKMP socket.
To handle this, it calls pfkey_dump_sadb, which opens **another** pfkey
socket and, **in blocking** mode, tries to send a request for the
kernel to dump the SADB over this socket. Then it waits for this
request to be fulfilled, while the kernel is still trying to dump the
remaining SPD policies over the other socket.

If my hypothesis holds, that racoon waits on the wrong socket and
therefore freezes, then it could be fixed by rearranging the code to
use a single select for all pfkey communication (not easy to do, but
doable). But then the only explanation for why racoon also freezes
outside of startup would be that it gets a remove message from the
peer while an acquire (or similar) message is already in the queue for
the main pfkey socket.

Does that sound feasible? Do you perhaps have the last few hundred
lines of the racoon log in debug mode from when it froze after a much
longer period of time? That would help to see whether my hypothesis is
right.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-06-15 17:00

Message:
Logged In: NO

The system is in production, so I had to reboot (that stops the
problem for a few weeks).

A log file is available at
http://dev.waveworks.co.uk/racoon.log.gz
It contains 10 seconds' worth of output (at debug2), 22MB in total
(gzipped to 822k).

----------------------------------------------------------------------

Comment By: Aidas Kasparas (monas)
Date: 2005-06-15 15:29

Message:
Logged In: YES
user_id=39627

Can you try to attach to that process with gdb (gdb -p
process-number) and get a backtrace of what it is doing (bt command)?

Also, could you please provide not just the last log entry but more,
at least a few seconds' worth.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-06-15 14:43

Message:
Logged In: NO

Extra info: it is currently running, and racoon stops accepting
requests (including requests from racoonctl).

The last log event is 'call pfkey_send_dump'; each time that appears,
racoon stops accepting requests.
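The "single select" idea from the 19:40 comment can be illustrated with a small sketch. This is not racoon code and does not use real PF_KEY sockets: it is a Python model in which two `socket.socketpair()` channels stand in for the two pfkey sockets, and the names `read_all_pending`, `demo`, "spd" and "sadb" are hypothetical. It only shows the general pattern of servicing every socket from one `select()` call instead of blocking on a single one.

```python
import select
import socket


def read_all_pending(channels, timeout=0.5):
    """Service every readable socket from a single select() call.

    `channels` maps a label to a socket. Returns (label, payload) pairs
    in the order the data was read.
    """
    received = []
    open_socks = {sock: label for label, sock in channels.items()}
    while open_socks:
        readable, _, _ = select.select(list(open_socks), [], [], timeout)
        if not readable:
            break  # nothing pending on any socket within the timeout
        for sock in readable:
            data = sock.recv(4096)
            if not data:
                del open_socks[sock]  # peer closed this channel
                continue
            received.append((open_socks[sock], data))
    return received


def demo():
    # Hypothetical stand-ins for the two pfkey sockets in the thread:
    # one carrying the SPD dump, one carrying the SADB dump.
    spd_kernel, spd_daemon = socket.socketpair()
    sadb_kernel, sadb_daemon = socket.socketpair()
    try:
        # Both "kernel" ends have data queued at the same time.
        spd_kernel.sendall(b"spd-policy-1")
        sadb_kernel.sendall(b"sadb-entry-1")
        return read_all_pending({"spd": spd_daemon, "sadb": sadb_daemon})
    finally:
        for s in (spd_kernel, spd_daemon, sadb_kernel, sadb_daemon):
            s.close()
```

Calling `demo()` reads the queued message from both channels. A reader that blocked on only one of the two sockets would leave the other queue unserviced, which is the situation the hypothesis describes.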
----------------------------------------------------------------------

Comment By: Mike Robinson (sundialservices)
Date: 2005-05-12 18:15

Message:
Logged In: YES
user_id=854356

So, the racoon server goes dead and takes down all those connections
with it? That kind of problem would be extremely difficult to diagnose
under any circumstances.

The first thing I'd try is to use a system monitor (like 'top') to see
why the racoon process is frozen... is it churning away CPU time in a
100%-busy loop, or is it waiting, and if so, for what?

The next thing I'd try is to see if you can make any sort of
correlation between the number of clients and the freezes. Notice also
whether any particular client, group of clients, part of the building
and so forth appears to be the one that did it. Look to see who logged
in most recently. Things like that.

You can get a core dump when you kill a process, to see exactly what
the process was doing pre-mortem.

Anyhow, this kind of problem is definitely going to take some
forensics before the cause becomes apparent. The mere fact that it is
freezing at random, by itself, is just not sufficient to point at a
solution.

----------------------------------------------------------------------
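On the watchdog question from the initial comment: since the 14:43 comment reports that a frozen racoon also stops answering racoonctl, one possible approach is to run a probe command with a timeout and treat a hang as a freeze. The sketch below is a minimal illustration under that assumption. The function name `probe` and the choice of probe command are hypothetical, and the sketch only classifies the outcome; it does not kill or restart anything.

```python
import subprocess


def probe(probe_cmd, timeout_s=10.0):
    """Run probe_cmd and classify the result.

    Returns "ok" if the command exits with status 0, "failed" if it
    exits with a non-zero status, and "hung" if it does not finish
    within timeout_s seconds (the child is killed in that case).
    """
    try:
        result = subprocess.run(
            probe_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if result.returncode == 0 else "failed"
```

A cron job or supervising loop could call something like `probe(["racoonctl", "show-sa", "isakmp"])` (assuming that subcommand is available in the installed ipsec-tools version) and, on "hung", perform the kill -9 and restart described in the initial comment. The timeout needs to be generous enough that a busy but healthy racoon with 250+ connections is not misclassified.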