From: David C N. <dc...@ad...> - 2004-01-09 22:55:48
Thanks for the info. I think I have figured out what the problem is,
though it is not entirely clear to me how to solve it.

Here's what I think is happening. All the action is in
pro_frame_available() in my profile, which deals with frames, not whole
messages. Due to the fragmentation bug in beepcore-c, even my small
messages often get broken into several small frames during transmission
(usually a few hundred bytes each). To deal with this, I have a linked
list tracking the status of each message by number as it comes in. But
there is a fair amount of processing involved in locking the list and
adding an entry to it, so I am not doing it immediately upon receiving
the first frame of the message. Instead, I have been doing it towards
the end of pro_frame_available(), once I have determined that this is
the first frame but not the last frame (although, come to think of it,
I know this earlier on).

So what happens is that the first frame comes in, pfa() thinks about it
for a while, and then the second frame of the message comes in and pfa()
gets called again. Both instances of pfa() think that it's a new
message, and it's a race as to which one records it first, causing great
confusion. It's possible the dropped sessions are due to this bug as
well.

To cure this, I really need to lock the linked list before I even pick
up a frame, to prevent another instance of pfa() from sneaking in with a
subsequent frame of the same message, oblivious to the fact that the
first frame has already been received. And if it is the first frame of a
new but not yet complete message, I need to create the entry for it
right then. If it isn't a new one, it'll have to wait for the lock,
which is good, because by the time it gets the lock the new entry will
be in place for it to find.

Does this make sense? Is there anything about a frame that indicates
intrinsically whether it is the first frame in a message?

DCN

On Thu, 8 Jan 2004, Lei Zhang wrote:

> Looks like your TCP connection is broken.
> revents=0x19/0x19 means
> POLLIN | POLLERR | POLLHUP.
>
> I've been fiddling with beepcore-c for a few months, haven't noticed
> frames getting lost - can you tell how to trigger that kind of error?
>
> Thanks,
> Lei
>
> David C Niemi wrote:
>
> >I am again working on my application built on beepcore-c (under Linux).
> >I occasionally have BEEP sessions (or at least channels) just drop for no
> >apparent reason. The log file shows this:
> >
> >01/08 10:33:00 beepd-re 2.core start logging
> >01/08 10:33:00 beepd-re 2.wrap loaded profile for http://www.adeptech.com/beryllium/BERYL/RECEIVE
> >01/08 10:33:00 beepd-re 1.wrap duplicate library libberyl.so, continuing...
> >01/08 10:33:00 beepd-re 2.wrap loaded profile for http://www.adeptech.com/beryllium/BERYL/SEND
> >01/08 10:33:00 beepd-re 2.wrap listening on 5 (backlog 128)
> >01/08 10:33:32 beepd-re 0.wrap wrapper created: 6
> >01/08 10:33:32 beepd-re 1.wrap wrote 218 octets
> >01/08 10:33:32 beepd-re 1.wrap wrote 15 octets
> >01/08 10:33:33 beepd-re 1.wrap read 67 octets
> >[...]
> >01/08 10:36:04 beepd-re 1.wrap read 18 octets
> >01/08 10:36:04 beepd-re 1.wrap wrote 395 octets
> >01/08 10:36:04 beepd-re 1.wrap wrote 1913 octets
> >01/08 10:36:04 beepd-re 1.wrap wrote 353 octets
> >[normal so far]
> >01/08 10:36:04 beepd-re 2.wrap id=0 fd=6 revents=0x19/0x19
> >01/08 10:36:04 beepd-re 0.wrap socket error: 6
> >01/08 10:36:04 beepd-re 0.wrap stopping iostate: 6
> >01/08 10:36:04 beepd-re 0.wrap deleting iostate: 6
> >
> >What is a socket error, and how can I prevent it or recover from it?
> >
> >I am also seeing problems with frames getting lost, I'm not quite sure
> >why yet but perhaps they are related. The traffic is over a lightly
> >loaded 100TX network.
> >
> >From the peer's perspective, I see messages like this:
> >12/30 10:49:16 runberyl 2.wrap id=0 fd=5 revents=0x11/0x1
> >12/30 10:49:16 runberyl 0.wrap socket error: 5
> >12/30 10:49:16 runberyl 0.wrap stopping iostate: 5
> >12/30 10:49:16 runberyl 0.wrap deleting iostate: 5
> >12/30 10:49:29 runberyl 2.core start logging
> >12/30 10:49:29 runberyl 0.wrap wrapper created: 5
> >12/30 10:49:29 runberyl 1.wrap wrote 150 octets
> >[...]
> >12/30 10:49:29 runberyl 1.wrap wrote 15 octets
> >12/30 10:49:29 runberyl 1.wrap read 16 octets
> >12/30 10:49:50 runberyl 0.wrap wrapper destroyed: 5
> >12/30 10:49:50 runberyl 0.wrap stopping iostate: 5
> >12/30 10:49:50 runberyl 0.wrap deleting iostate: 5
> >12/30 10:49:50 runberyl 2.core done logging
> >
> >-------------------------------------------------------
> >-- David C. Niemi            Adeptech Systems, Inc.  --
> >-- Reston, Virginia, USA   http://www.adeptech.com/ --
> >-------------------------------------------------------
> >
> >-------------------------------------------------------
> >This SF.net email is sponsored by: Perforce Software.
> >Perforce is the Fast Software Configuration Management System offering
> >advanced branching capabilities and atomic changes on 50+ platforms.
> >Free Eval! http://www.perforce.com/perforce/loadprog.html
> >_______________________________________________
> >Beepcore-c-users mailing list
> >Bee...@li...
> >https://lists.sourceforge.net/lists/listinfo/beepcore-c-users
>

--
-------------------------------------------------------
-- David C. Niemi            Adeptech Systems, Inc.  --
-- Reston, Virginia, USA   http://www.adeptech.com/ --
-------------------------------------------------------
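
The locking order proposed in the message (take the list lock first, then look up or create the entry, all in one critical section) might be sketched roughly as follows. This is only an illustration under stated assumptions: struct msg_entry, note_frame() and tracked_count() are hypothetical names and not part of beepcore-c, the real pro_frame_available() signature is not shown here, and the "more frames follow" flag is assumed to be available to the caller. A real profile would probably key entries on channel as well as message number.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracking entry for a partially received message. */
struct msg_entry {
    long msgno;              /* message number (assumed unique here) */
    size_t frames_seen;      /* frames received so far */
    struct msg_entry *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct msg_entry *list_head = NULL;

/* Find the entry for msgno, or NULL. Caller must hold list_lock. */
static struct msg_entry *find_entry(long msgno)
{
    for (struct msg_entry *e = list_head; e != NULL; e = e->next)
        if (e->msgno == msgno)
            return e;
    return NULL;
}

/* Unlink and free the entry for msgno. Caller must hold list_lock. */
static void remove_entry(long msgno)
{
    for (struct msg_entry **pp = &list_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->msgno == msgno) {
            struct msg_entry *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/*
 * Record one incoming frame. `more` is true when further frames of the
 * same message are expected. The lookup and the creation of a missing
 * entry happen under one lock, so a second frame of the same message
 * cannot be mistaken for the start of a new message.
 * Returns the number of frames seen for this message, including this
 * one, or 0 if memory allocation failed.
 */
size_t note_frame(long msgno, bool more)
{
    size_t seen;

    pthread_mutex_lock(&list_lock);
    struct msg_entry *e = find_entry(msgno);
    if (e == NULL && !more) {
        seen = 1;                       /* single-frame message: nothing to track */
    } else {
        if (e == NULL) {                /* first frame of a multi-frame message */
            e = calloc(1, sizeof *e);
            if (e == NULL) {
                pthread_mutex_unlock(&list_lock);
                return 0;
            }
            e->msgno = msgno;
            e->next = list_head;
            list_head = e;
        }
        seen = ++e->frames_seen;
        if (!more)                      /* last frame: message is complete */
            remove_entry(msgno);
    }
    pthread_mutex_unlock(&list_lock);
    return seen;
}

/* Number of messages currently being tracked. */
size_t tracked_count(void)
{
    size_t n = 0;

    pthread_mutex_lock(&list_lock);
    for (struct msg_entry *e = list_head; e != NULL; e = e->next)
        n++;
    pthread_mutex_unlock(&list_lock);
    return n;
}
```

The slower per-frame processing can then run after note_frame() returns, outside the lock, since by that point any later frame of the same message will already find the entry.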