|
From: David C N. <dc...@ad...> - 2004-01-09 22:55:48
|
Thanks for the info.

I think I have figured out what the problem is, though it is not entirely clear to me how to solve it. Here's what I think is happening:

All the action is in pro_frame_available() in my profile, which deals with frames, not whole messages. Due to the fragmentation bug in beepcore-c, even my small messages often get broken into several small frames (usually a few hundred bytes each) during transmission.

To deal with this, I have a linked list tracking the status of each message by number as it comes in. But there is a fair amount of processing involved in locking the list and adding an entry to it, so I am not doing it immediately upon receiving the first frame of the message. Instead, I have been doing it towards the end of pro_frame_available(), once I have determined that this is the first frame but not the last frame (although, come to think of it, I know this earlier on).

So what happens is that the first frame comes in, pfa() thinks about it a while, and then the second frame of the same message comes in and pfa() gets called again. Both instances of pfa() think it's a new message, and it's a race as to which one records it first, causing great confusion. It's possible the dropped sessions are due to this bug as well.

To cure this, I really need to lock the linked list before I even pick up a frame, to prevent another instance of pfa() from sneaking in with a subsequent frame of the same message, oblivious to the fact that the first frame has already been received. And if the frame starts a new message that is not yet complete, I need to create the entry for it then. If it isn't a new message, the thread will have to wait for the lock, which is good, because by the time it gets the lock the new entry will be in place for it to find. Does this make sense?

Is there anything about a frame that indicates intrinsically whether it is the first frame in a message?

DCN

On Thu, 8 Jan 2004, Lei Zhang wrote:
> Looks like your TCP connection is broken. revents=0x19/0x19 means
> POLLIN | POLLERR | POLLHUP.
>
> I've been fiddling with beepcore-c for a few months, haven't noticed
> frames getting lost - can you tell how to trigger that kind of error?
>
> Thanks,
> Lei
>
> David C Niemi wrote:
>
> >I am again working on my application built on beepcore-c (under Linux).
> >I occasionally have BEEP sessions (or at least channels) just drop for no
> >apparent reason. The log file shows this:
> >
> >01/08 10:33:00 beepd-re 2.core start logging
> >01/08 10:33:00 beepd-re 2.wrap loaded profile for http://www.adeptech.com/beryllium/BERYL/RECEIVE
> >01/08 10:33:00 beepd-re 1.wrap duplicate library libberyl.so, continuing...
> >01/08 10:33:00 beepd-re 2.wrap loaded profile for http://www.adeptech.com/beryllium/BERYL/SEND
> >01/08 10:33:00 beepd-re 2.wrap listening on 5 (backlog 128)
> >01/08 10:33:32 beepd-re 0.wrap wrapper created: 6
> >01/08 10:33:32 beepd-re 1.wrap wrote 218 octets
> >01/08 10:33:32 beepd-re 1.wrap wrote 15 octets
> >01/08 10:33:33 beepd-re 1.wrap read 67 octets
> >[...]
> >01/08 10:36:04 beepd-re 1.wrap read 18 octets
> >01/08 10:36:04 beepd-re 1.wrap wrote 395 octets
> >01/08 10:36:04 beepd-re 1.wrap wrote 1913 octets
> >01/08 10:36:04 beepd-re 1.wrap wrote 353 octets
> >[normal so far]
> >01/08 10:36:04 beepd-re 2.wrap id=0 fd=6 revents=0x19/0x19
> >01/08 10:36:04 beepd-re 0.wrap socket error: 6
> >01/08 10:36:04 beepd-re 0.wrap stopping iostate: 6
> >01/08 10:36:04 beepd-re 0.wrap deleting iostate: 6
> >
> >What is a socket error, and how can I prevent it or recover from it?
> >
> >I am also seeing problems with frames getting lost, I'm not quite sure
> >why yet but perhaps they are related. The traffic is over a lightly
> >loaded 100TX network.
> >
> >From the peer's perspective, I see messages like this:
> >12/30 10:49:16 runberyl 2.wrap id=0 fd=5 revents=0x11/0x1
> >12/30 10:49:16 runberyl 0.wrap socket error: 5
> >12/30 10:49:16 runberyl 0.wrap stopping iostate: 5
> >12/30 10:49:16 runberyl 0.wrap deleting iostate: 5
> >12/30 10:49:29 runberyl 2.core start logging
> >12/30 10:49:29 runberyl 0.wrap wrapper created: 5
> >12/30 10:49:29 runberyl 1.wrap wrote 150 octets
> >[...]
> >12/30 10:49:29 runberyl 1.wrap wrote 15 octets
> >12/30 10:49:29 runberyl 1.wrap read 16 octets
> >12/30 10:49:50 runberyl 0.wrap wrapper destroyed: 5
> >12/30 10:49:50 runberyl 0.wrap stopping iostate: 5
> >12/30 10:49:50 runberyl 0.wrap deleting iostate: 5
> >12/30 10:49:50 runberyl 2.core done logging
> >
> >-------------------------------------------------------
> >-- David C. Niemi Adeptech Systems, Inc. --
> >-- Reston, Virginia, USA http://www.adeptech.com/ --
> >-------------------------------------------------------
> >
> >-------------------------------------------------------
> >This SF.net email is sponsored by: Perforce Software.
> >Perforce is the Fast Software Configuration Management System offering
> >advanced branching capabilities and atomic changes on 50+ platforms.
> >Free Eval! http://www.perforce.com/perforce/loadprog.html
> >_______________________________________________
> >Beepcore-c-users mailing list
> >Bee...@li...
> >https://lists.sourceforge.net/lists/listinfo/beepcore-c-users

--
-------------------------------------------------------
-- David C. Niemi Adeptech Systems, Inc. --
-- Reston, Virginia, USA http://www.adeptech.com/ --
-------------------------------------------------------
|
|
From: David C N. <dc...@ad...> - 2004-01-10 18:58:32
|
On Fri, 9 Jan 2004, Lei Zhang wrote:
> I've noticed this too. I changed the threading model to one thread per
> channel a couple of months ago, that probably explains why I'm not
> having the problem that's bothering David.

As far as I can see, this is NOT what thread_per_channel does: without it, there would be only one thread allocated in total, as opposed to one per channel. Yet pfa() does seem to get called multiple times concurrently, which evidently means there are multiple threads.

-------------------------------------------------------
-- David C. Niemi Adeptech Systems, Inc. --
-- Reston, Virginia, USA http://www.adeptech.com/ --
-------------------------------------------------------
|
|
From: Lei Z. <lz...@ju...> - 2004-01-10 23:44:07
|
I did not mean thread_per_channel; I meant "thread per channel": one worker thread for each channel a session contains. It's probably a 10-20 line change to the wrapper code.

BTW, there are memory, socket, and thread leaks in the beepcore-c library (or should I say pilot error). For example, the worker threads probably should call pthread_detach() as the first thing they do...

Lei

On Sat, 10 Jan 2004, David C Niemi wrote:
> As far as I can see, this is NOT what thread_per_channel does -- without
> it, there would be only one thread allocated total, as opposed to one per
> channel. Although pfa() does seem to get called multiple times
> concurrently, evidently meaning there are multiple threads.
|
|
From: David C N. <dc...@ad...> - 2004-01-12 05:27:56
|
I believe I've solved (or at least papered over) the problems I was seeing by doing my own rigorous locking before I even start looking at a frame in pfa(). Now only one thread at a time can get anywhere useful in pfa() for a given message, and I am no longer misplacing messages or getting dropped sessions, at least not in moderate testing.

It would seem this sort of per-message locking should not even be needed, but since I've worked around it, the main thing I have left to complain about is the still-unexplained frame fragmentation that beepcore-c applies even to small (<2K) messages. I expect it hurts performance, and it should be something I can avoid. I may have time to pursue it myself in a few months, but in the meantime everyone should be aware that it is lurking. For those doing message-oriented rather than frame-oriented processing, I'd guess it is afflicting you as well, but it happens at a low enough level that you don't notice.

-------------------------------------------------------
-- David C. Niemi Adeptech Systems, Inc. --
-- Reston, Virginia, USA http://www.adeptech.com/ --
-------------------------------------------------------
|
|
From: David C N. <dc...@ad...> - 2004-01-12 19:44:27
|
I am trying to make smarter use of BEEP's config file support in my
application. Is there some easy way for a profile to find out which
"dataname" its configuration came from? I would like to make use of this.
Without knowing the "dataname", how can one reliably know what to load
for the various parameters? config_get seems to require the exact path to
the parameter you want. For example:
<beep ...>
  <profiles>
    ...
  </profiles>
  <myappname>
    <application>
      <dataname1>
        <criticalitem>foo1</criticalitem>
      </dataname1>
      <dataname2>
        <criticalitem>foo2</criticalitem>
      </dataname2>
    </application>
  </myappname>
</beep>
I'd like the profile to be able to figure out that its dataname is
"dataname1", and that its criticalitem is therefore "foo1" as opposed
to "foo2". Do I need to use the config_search_* functions? Or can I look
for a disembodied <criticalitem> and expect the tree to be pruned to
just the dataname that applies to this profile instance?
-------------------------------------------------------
-- David C. Niemi Adeptech Systems, Inc. --
-- Reston, Virginia, USA http://www.adeptech.com/ --
-------------------------------------------------------
|
|
From: William J. M. <wm...@es...> - 2004-01-09 23:12:33
|
David,

Is the threading model in the wrapper broken to the point where two pfa() calls happen on the same channel in different threads? That would be bad.

Yes, there should be header information in the frame telling you exactly what is going on, and the frames get delivered in sequence. The specifics are in the RFC; there's a single character to indicate continuation.

Why use pfa instead of pma?

-bill

I believe you get the header info for the frame as well?

On Fri, Jan 09, 2004 at 05:55:34PM -0500, David C Niemi wrote:
> [...]
> Is there anything about a frame that indicates intrinsically if it is the
> first frame in a message?
> [...]
|
|
From: Lei Z. <lz...@ju...> - 2004-01-09 23:23:30
|
William J. Mills wrote:
>David,
>
>Is the threading model in the wrapper broken to the point where
>two pfa() get called on the same channel in different threads?
>
I've noticed this too. I changed the threading model to one thread per channel a couple of months ago; that probably explains why I'm not having the problem that's bothering David.
>
>Why use pfa instead of pma?
>
I'm using pfa() for one of my profiles as well, because messages on that channel easily go way over window_size. The 'How to write profile' documentation suggests using pfa() in this situation, after all... so I gave up trying to figure out how beepcore-c's window resizing mechanism works...

Lei
|
|
From: William J. M. <wm...@es...> - 2004-01-09 23:25:14
|
Fair enough. :)

On Fri, Jan 09, 2004 at 03:22:50PM -0800, Lei Zhang wrote:
> [...]
> I'm using pfa() for one of my profiles as well, because messages on that
> channel easily goes way over window_size. The 'How to write profile'
> documentation suggests to use pfa() in this situation after all... so I
> give up figuring out how beepcore-c's window resizing mechanism works...
>
> Lei
|
|
From: Lei Z. <lz...@ju...> - 2004-01-10 02:49:22
|
Just tripped over another beepcore-c problem: race condition at session
shutdown time. I'm gonna use null-profile.c to explain the problem:
In null-profile.c:
pro_frame_available ()
{
    1. f = bpc_query_frame(...)
    2. /* do fancy processing */
    3. bpc_frame_destroy();
}
In step 2, a RPY can be sent to acknowledge a MSG; the channel 0 thread
then takes over and destroys the null-profile channel instance (because
the channel is considered QUIESCENT). Back in step 3, bummer!! Core dump
on GET_WRAPPER(inst->conn) and then grabbing wrap->core_lock!
I'm now just working around this by doing bpc_frame_destroy() before
sending RPY. Any better/real fix?
Thanks,
Lei
William J. Mills wrote:
>Fair enough. :)
>
|
|
From: Lei Z. <lz...@ju...> - 2004-01-10 03:21:00
|
Bummer, scratch that. It's my bad: I had modified
waitfor_chan_stat_quiescent() to debug another problem... sorry about
the fuss.
|