Thread: [Oserl-questions] [BUG] Race condition in gen_smsc:listener (possibly rare)
Brought to you by:
mpquique
From: Edwin F. <em...@gm...> - 2008-09-01 03:13:08
|
*Note: I submitted this bug report to the sf.net bug tracker ( http://sourceforge.net/tracker/index.php?func=detail&aid=2086029&group_id=101062&atid=628956) but the code formatting got all tangled up, so please forgive the duplicated posting.* I found a bug in gen_smsc:listener/3. This bug is present on the HEAD branch (revision 1.18). It only happens under certain conditions which may be quite rare. The bug manifests if you connect to the socket on which the listener is waiting, and then immediately disconnect. Although this possibly sounds like a stupid thing to do, it happened because some system management software was set up to make sure the SMPP port 2775 was alive by connecting to it every 5 minutes and immediately dropping the connection. The effect of the bug is to leave the socket open, linked to the listener. After enough requests, the listener eventually owns hundreds of open sockets and crashes. It is easy to see this by running a lot of fast connect/disconnects and looking at the links in the listener process, e.g. 1> proplists:get_value(links,i(0,3336,0)). [#Port<0.6828>,#Port<0.6840>,#Port<0.6847>,#Port<0.6849>, <0.3335.0>,#Port<0.6848>,#Port<0.6846>,#Port<0.6832>, #Port<0.6835>,#Port<0.6839>,#Port<0.6833>,#Port<0.6830>, #Port<0.6831>,#Port<0.6829>,#Port<0.6818>,#Port<0.6823>, #Port<0.6826>,#Port<0.6827>,#Port<0.6824>,#Port<0.6821>, #Port<0.6822>,#Port<0.6819>,#Port<0.6807>,#Port<0.6811>, #Port<0.6812>,#Port<0.6810>,#Port<0.6804>,#Port<0.6805>, #Port<0.6803>] Of course, the listener is linked to the smsc and if the smsc is left to crash and restart when the listener crashes this issue probably won't be noticed, but in my application I don't want the smsc to crash so trap_exit is used. The bug is due to a race condition in the listener that occurs under the circumstances described earlier. It is necessary to reproduce the code of the listener here to show the issue. Line numbers correspond to revision 1.18 of gen_smsc.erl. 1070:listener(ServerRef, LSocket, Count) -> 1071: case gen_tcp:accept(LSocket) of 1072: {ok, Socket} -> 1073: inet:setopts(Socket, ?CONNECT_OPTIONS), 1074: *case gen_server:call(ServerRef, {accept, Socket}, infinity) of* 1075: {ok, Pid} -> 1076: *gen_tcp:controlling_process(Socket, Pid*), 1077: if 1078: Count > 1 -> 1079: listener(ServerRef, LSocket, ?DECR(Count)); 1080: true -> 1081: gen_server:cast(ServerRef, listen_stop) 1082: end; 1083: _Error -> 1084: gen_tcp:close(Socket), 1085: listener(ServerRef, LSocket, Count) 1086: end; 1087: _Error -> 1088: gen_server:cast(ServerRef, listen_error) 1089: end. On line 1074, once the socket has been accepted, the code calls through gen_server to {accept, Socket}. If that call comes back as {ok, Pid} in line 1075, the code then hands over ownership of the socket to Pid in line 1076, by calling gen_tcp:controlling_process. However, this is vulnerable to a race condition. If the process corresponding to Pid dies between line 1075 and line 1076, gen_tcp:controlling_process will fail. This leaves the socket open and still belonging to the listener process. The fix is to check the return from gen_tcp:controlling_process and close the socket if it fails, as shown in the code below. I verified this by connecting and disconnecting 10 times a second for a few thousand times without seeing the problem reoccur. I also verified this by checking the links in the listener process as described earlier and seeing no links to socket ports. 1070:listener(ServerRef, LSocket, Count) -> 1071: case gen_tcp:accept(LSocket) of 1072: {ok, Socket} -> 1073: inet:setopts(Socket, ?CONNECT_OPTIONS), 1074: case gen_server:call(ServerRef, {accept, Socket}, infinity) of 1075: {ok, Pid} -> *1076: case gen_tcp:controlling_process(Socket, Pid) of ok -> * 1077: if 1078: Count > 1 -> 1079: listener(ServerRef, LSocket, ?DECR(Count)); 1080: true -> 1081: gen_server:cast(ServerRef, listen_stop) 1082: end; * _Error -> * * gen_tcp:close(Socket), listener(ServerRef, LSocket, Count) end; *1083: _Error -> 1084: gen_tcp:close(Socket), 1085: listener(ServerRef, LSocket, Count) 1086: end; 1087: _Error -> 1088: gen_server:cast(ServerRef, listen_error) 1089: end. -- For every expert there is an equal and opposite expert - Arthur C. Clarke |