Jamshed Kakar - 2001-12-21

Hi,

I'm having a strange problem with a threaded server I've written using Common C++, and I can't figure out whether the problem is in my code or in Common C++, although I'm slowly beginning to suspect the latter.  I thought I'd ask here to see if anyone's experienced this; I can post code if necessary, but it's not public-clean just now.  Ugh, I really want to figure out how to get paid to work on free software... that's a discussion for later. =)

My socket listener is derived from ost::TCPSocket.  It listens on the port using the default buffer size (512) and backlog (5).  It accepts connections and spawns threads with no problem.  I have a thread pool which, as far as I can tell, is thread-safe; I've even stopped using MutexLock in favour of explicit mutex.EnterMutex () / mutex.LeaveMutex () calls, to make sure the compiler doesn't destroy the lock object before the end of the method as an "optimisation".
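On the scoped-lock worry, here is a minimal sketch in standard C++ rather than Common C++, so std::lock_guard only stands in for MutexLock and TracingMutex is a made-up type that records events instead of blocking. It illustrates that a named guard object is held until the closing brace of its scope, while an unnamed temporary is released at the end of its own statement, which is one way a guard can appear to "go away" early. Whether that applies to the code described here is only a guess.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical mutex for illustration: it logs lock/unlock calls instead of
// blocking, so the ordering can be inspected from a single thread.
struct TracingMutex {
    std::vector<std::string> log;
    void lock()   { log.push_back("lock"); }
    void unlock() { log.push_back("unlock"); }
};

// Named guard: the destructor runs at the closing brace, after the body.
std::vector<std::string> namedGuard() {
    TracingMutex m;
    {
        std::lock_guard<TracingMutex> guard(m);
        m.log.push_back("body");
    }
    return m.log;
}

// Unnamed temporary: constructed and destroyed within one statement, so the
// body runs with the mutex already released.
std::vector<std::string> temporaryGuard() {
    TracingMutex m;
    {
        std::lock_guard<TracingMutex>{m};
        m.log.push_back("body");
    }
    return m.log;
}
```

With the named guard the recorded order is lock, body, unlock; with the temporary it is lock, unlock, body.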

I have a generic session derived from ost::TCPSession that has a pool of session handlers; the generic session passes control to one of them in Run () and releases it back to the pool in Final ().  When the handler pool is empty, a special session is used that sends a "Server Full" message to the client and then "hangs up".  This all works... until I bombard the server with a large number of connections at once.  It seems to be a timing issue, as it's easier to reproduce the problem if I run a whole whack of sessions that very quickly connect/disconnect on the same box as the server.  Thus far, running clients over the net (DSL at both ends) hasn't brought the server down.
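For readers who want something concrete, the pool-with-fallback arrangement described above can be sketched roughly as follows. This is standard C++, not the actual server code and not Common C++; Handler, HandlerPool, and serve are hypothetical names, and serve collapses the Run ()/Final () split into one function for brevity.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a real session handler.
struct Handler {
    std::string name;
};

// Fixed-size pool; every access to the free list is guarded by one mutex.
class HandlerPool {
public:
    explicit HandlerPool(std::size_t size) : storage_(size) {
        // storage_ is never resized, so pointers into it stay valid.
        for (std::size_t i = 0; i < size; ++i) {
            storage_[i].name = "handler-" + std::to_string(i);
            free_.push_back(&storage_[i]);
        }
    }

    // Returns a free handler, or nullptr when the pool is exhausted.
    Handler* acquire() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (free_.empty()) return nullptr;
        Handler* h = free_.back();
        free_.pop_back();
        return h;
    }

    // Hands a handler back to the pool; a null pointer is ignored.
    void release(Handler* h) {
        if (h == nullptr) return;
        std::lock_guard<std::mutex> guard(mutex_);
        free_.push_back(h);
    }

    std::size_t available() {
        std::lock_guard<std::mutex> guard(mutex_);
        return free_.size();
    }

private:
    std::mutex mutex_;
    std::vector<Handler> storage_;
    std::vector<Handler*> free_;
};

// One session's life: take a handler, or fall back to "Server Full".
std::string serve(HandlerPool& pool) {
    Handler* h = pool.acquire();
    if (h == nullptr) return "Server Full";
    std::string reply = "served by " + h->name;
    pool.release(h);  // in the design described above, this happens in Final ()
    return reply;
}
```

The point of the sketch is that a handler must be released exactly once; a double release, or a release after the session object has been destroyed, is the kind of bug that only shows up under heavy connect/disconnect load.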

To make sure that my code isn't the problem, I've cooked up a version of the server that just returns ServerFullHandler instances; these are new'd as required and delete'd when they've done their job.  With this configuration I've not been able to cause a segfault... I'm still wary though, since from what I can tell it seems to be a subtle timing problem.

The next step has been to add a read operation to the handler: each handler just reads a document from the client and then hangs up.  This is causing segfaults... sadly the core dump file is useless and I can't effectively debug this in GDB because of the threads.

I'm using Common C++ 1.9.1 and have been able to duplicate the problem on a Red Hat server (Linux 2.4.12-ac1) and a Debian system (Linux 2.4.14-k7).

Has anyone experienced this?  Any suggestions?  From scouring the archive, I'm beginning to wonder if I'm missing something re: setCancellation...?

Thanks,
Jamu.