From: Brian D H. <bdh...@c4...> - 2001-07-18 06:35:25
|
Chuck, I guess my thinking is that we don't want to filter too tightly based on the current spec. As you report, there are a lot of technically non-compliant packets out there. Since aprsd is serving in a transport function of the network it shouldn't (IMHO) be the primary protocol policing method. If we filter too tightly on the spec I think we limit the experimentation that can be done with the network and cause a lot of extra work as the spec evolves.

I think we've done a good job of filtering in the current code. We make sure that the packet looks/smells like the proper general format, that it isn't long enough to overflow the buffers of a program without strenuous buffer checking, and we've made sure that the ax25Source call appears reasonable. If we go much further I think we create a continuing mess as we try to keep up with changes and new applications.

Would it be reasonable (and/or a "good idea") to operate under the concept that it's up to the software generating a packet to ensure it is compliant with the spec, up to the software parser on the client end to decide what it can deal with, and up to the transport implementation (aprsd) to filter out garbage and do very basic sanity checking? I'm thinking along the lines of TCP/IP transport server implementations (INN and Sendmail come to mind). As long as the protocol looks right at the outer layers they don't sniff into the payload to see if it's clean.

I've got an image in my head of someone coming up with the next neat'o - super cool - whiter whites - brighter brights implementation. Unfortunately in some cases it can be non-spec compliant. The mob with torches ends up at our doors (virtual of course) demanding a new aprsd release to allow the new implementation, and they want it yesterday. Of course they don't want to tweak the code themselves. <g>

These are just my thoughts. Whatever the group decides I'm happy to help implement. FWIW - I finally got my offer letter from the company in DFW. 
I'll be moving this weekend so I may be off-line for a few days. 73/N5VFF -- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941

On 2001.07.17 12:25 Chuck Byam wrote: SNIP - SNIP - SNIP
> Mmmk, I've been working in this part of the code over the past couple of
> days. What I need discussion on is how strict should we make our
> checks. For example, what I've done with message packets is tested for a
> length of 69 bytes, checked for illegal chars (|~{), preserved the id ({xxx),
> and truncated the rest of the message. I've done similar chops in other
> areas as well, eg, position reports. What I'm finding is a lot of packets
> are being truncated. I can see where some folks may get upset about
> this... but on the other hand, it's spec.
>
> Chuck
>
> _______________________________________________
> Aprsd-devel mailing list
> Apr...@li...
> http://lists.sourceforge.net/lists/listinfo/aprsd-devel
|
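Brian's "very basic sanity checking" layer — check the general shape, bound the length, sanity-check the source call, and leave the payload alone — might look roughly like the sketch below. This is an illustrative stand-in, not aprsd's actual filter: the function name, the 256-byte cap, and the callsign character rule are all assumptions.

```cpp
#include <cctype>
#include <string>

// Hypothetical transport-layer sanity check in the spirit of the mail above.
// It verifies SOURCE>PATH:payload shape, bounds the length, and checks that
// the source call uses plausible characters -- the payload is not inspected.
bool looksLikeAprsPacket(const std::string& pkt) {
    const std::size_t kMaxLen = 256;                 // assumed buffer bound
    if (pkt.empty() || pkt.size() > kMaxLen)
        return false;
    std::string::size_type gt = pkt.find('>');
    if (gt == std::string::npos || gt == 0)
        return false;                                // no source call
    if (pkt.find(':', gt) == std::string::npos)
        return false;                                // no payload separator
    for (std::string::size_type i = 0; i < gt; ++i) {
        unsigned char c = pkt[i];
        if (!std::isalnum(c) && c != '-')            // call plus optional -SSID
            return false;
    }
    return true;
}
```

Anything failing such a check would be dropped as garbage; everything else passes through untouched, leaving spec enforcement to the generating and parsing software as proposed.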
From: Hamish M. <ha...@cl...> - 2001-07-18 04:19:04
|
On Mon, Jul 16, 2001 at 10:31:18PM -0600, Brian D Heaton wrote: > We will ensure that we only delete the "*". I'm testing this now, > but wanted to make sure I understand the implementation. Can you give me an example of a packet which will be discarded by this check? I'm still on holiday in NZ, and not really thinking about APRS at all :-) Hamish -- Hamish Moffatt VK3SB <ha...@de...> <ha...@cl...> |
From: Brian D H. <bdh...@c4...> - 2001-07-18 04:14:39
|
Chuck, This email is even longer... <g> On 2001.07.17 13:30 Chuck Byam wrote:
> I've seen this happen a couple of times and the outcome is always the same.
> In other words I think I've seen the symptom of our problem. Please excuse
> the long post:
>
> Server Up Time = 1.1 hours
> Total TNC packets = 0
> TNC stream rate = 0 bytes/sec
> Msgs gated to RF = 0
> Connect count = 129
> Users = 71
> Peak Users = 78
> APRS Stream rate = 1.1 Kbps
> Server load = 31.2 Kbps
> History Items = 3348
> TAprsString Objs = 3348
> Items in InetQ = 0
> InetQ overflows = 0
> TncQ overflows = 0
> conQ overflows = 0
> charQ overflow = 0
> Hist. dump aborts = 02
>
> ....
>
> Session overrun (w5ks)
> Session overrun (KB2QHA-2)
> SNIP - SNIP - SNIP
> ... This goes on for nearly all connections
>
> ... lots of session throttles
>
> ... more overruns and disconnects

I don't suppose you've got any way of looking at traffic on the campus backbone you are connected to? MRTG might be interesting, but the default 5-minute averaging might skew things a bit. Alternatively it might be interesting to keep continuous pings (say 5 secs apart) running to each of the IGATEs that you create outbound connections to. The goal would be to discover whether it's a problem on the box (possibly kernel or TCP/IP stack related), in the network between the hosts, or on the distant host.

Since it appears to happen to all hosts at once I think we can rule out the distant host. Also since it affects all hosts simultaneously we can likely rule out Internet difficulties beyond the interface router of the provider which supplies the IP bandwidth to the university. If the university is multi-homed then we can step back even further into the network. In general since it's affecting all hosts simultaneously I would start looking towards first.aprs.net from where the network becomes highly redundant and/or multi-homed.

FOR THE NETWORK CASE: I don't recall if the backbone you are connected to is L2 switched. 
If not (or at least if there are a decent number of hosts sharing your segment) then if you've got another host available to run EtherApe then it might be interesting. Even more interesting would be some sniffer traces of the network activity at the time of the event.

FOR THE HOST (first.aprs.net) CASE: Does the Ethernet interface have anything interesting in its stats? I'm primarily thinking blocked packets and/or overruns. I'm wondering if we may have the same type of situation as Dale found with setting the socket to non-blocking, but manifesting itself on the primary stream connections in this case. Beyond what you can get from netstat and ifconfig I think ntop would have the most interesting output for looking at this.

Before we go bonkers running through the code I think we should eliminate the network as a possible cause. I've got some more notes below on the queue overflows as I've seen this in my stress testing.

> Server Up Time = 1.2 hours
> Total TNC packets = 0
> TNC stream rate = 0 bytes/sec
> Msgs gated to RF = 0
> Connect count = 158
> Users = 21
> Peak Users = 78
> APRS Stream rate = 1.7 Kbps
> Server load = 0.0 Bps
> History Items = 3156
> TAprsString Objs = 4181
> Items in InetQ = 1024
> InetQ overflows = 1797
> TncQ overflows = 0
> conQ overflows = 0
> charQ overflow = 0
> Hist. dump aborts = 0
>
> Now note the server load (0) and connections (this happens to be the number of
> igates + 1)

Even more interesting (to me at least) here is that the difference between the History Items (3156) and the TAprsString Objs (4181) is exactly the size of the InetQ. I've created the same situation on my test box and once they start diverging you can watch the InetQ fill up slowly and the History/TAprsString counts diverge at exactly the same rate. Since items are both pulled off the queue and added to the history list in the "DeQueue" thread that makes me think it's a likely place to look for suspects. 
As I read it the flow looks something like this:

1 - Loop awaiting sendqueue.ready
2 - Pop an item off the sendqueue
3 - dupcheck the item
4 - Test to see if the item should go in the history list
5 - If it should, place the item in the history list
6 - Send it out via SendToAllClients

I'm guessing that either the DeQueue thread is dying (need to figure out which it is and check for the pid in this scenario); the thread is deadlocking (possibly on a mutex lock); the socket is overflowing; or we are getting a non-reentrant case from another thread calling one of the involved functions.

Functions noted are:
SendToAllClients - Only called in the DeQueue thread
sendQueue.ready - Only called in the DeQueue thread
sendQueue.read - Only called in the DeQueue thread
AddHistoryItem - Only called in the DeQueue thread
dupFilter.check - Called in DeQueue and DeQueueTNC

Interesting, I just took a core dump on my test box and it looks like it was unable to unlock pmtxHistory at the "getPositAndUpdate" tag. I guess the short version is that I'm suspicious of both the history routines (especially how the pmtxHistory lock is handled) and the non-blocking socket.

> ...
> 24.23.210.235 has connected to port 23
> 24.23.210.235 has connected to port 23
> 199.227.86.221 has connected to port 23
> 24.23.210.235 has connected to port 23
> 24.177.214.61 has connected to port 23
> 206.159.119.88 has connected to port 10151
> 24.23.210.235 has connected to port 23
> 24.23.210.235 has connected to port 23
> 24.177.214.61 has connected to port 23
> 24.23.210.235 has connected to port 23
> ...
>
> This continues until the maxclient limit is reached and I start getting the
> "error creating new client thread"
>
> Note the multiple connects from the same host.
>
> Now it's off to see why this is happening...

Are the multiple connects from a subset of the total host table at the time of the event? I would be curious to figure out if they might all be the same type of IGATE/Client software. 
Are you still on the 2.4.2SMP kernel you started with? I'd be curious if there is any change under a newer release. Also, I don't recall if you ever told me what Ethernet board you were running. There has been some traffic on LKML lately about some problems with SMP and specific Ethernet boards. Probably enough babble. We now return you to your regularly scheduled head scratching and staring at code.. <g> 73/N5VFF -- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941 |
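The six-step DeQueue flow Brian sketches above can be modeled single-threaded to make the data flow visible. Everything here is a simplified stand-in (the dup filter is a plain set, the history rule is an invented substring test), not aprsd's code:

```cpp
#include <deque>
#include <set>
#include <string>
#include <vector>

// Toy model of the DeQueue loop: pop, dup-check, maybe add to history, send.
// In aprsd the real loop blocks on sendQueue.ready and everything is guarded
// by mutexes; both are omitted here to keep the flow itself visible.
struct DeQueueModel {
    std::deque<std::string>  sendQueue;   // steps 1/2: pending items
    std::set<std::string>    dupFilter;   // step 3: stand-in for dupFilter.check
    std::vector<std::string> history;     // steps 4/5: retained items
    int sentCount = 0;                    // step 6: stand-in for SendToAllClients

    void pump() {
        while (!sendQueue.empty()) {
            std::string item = sendQueue.front();
            sendQueue.pop_front();
            if (!dupFilter.insert(item).second)
                continue;                              // duplicate: dropped
            if (item.find("posit") != std::string::npos)
                history.push_back(item);               // assumed history rule
            ++sentCount;
        }
    }
};
```

Because steps 2 through 6 all happen in this one loop, a stall anywhere in it would let the input queue fill while the history count stops growing — consistent with the diverging counters in Chuck's stats.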
From: Chuck B. <cb...@vi...> - 2001-07-17 19:29:55
|
I've seen this happen a couple of times and the outcome is always the same. In other words I think I've seen the symptom of our problem. Please excuse the long post:

Server Up Time = 1.1 hours
Total TNC packets = 0
TNC stream rate = 0 bytes/sec
Msgs gated to RF = 0
Connect count = 129
Users = 71
Peak Users = 78
APRS Stream rate = 1.1 Kbps
Server load = 31.2 Kbps
History Items = 3348
TAprsString Objs = 3348
Items in InetQ = 0
InetQ overflows = 0
TncQ overflows = 0
conQ overflows = 0
charQ overflow = 0
Hist. dump aborts = 02

....

Session overrun (W0IBM)
Session overrun (W0IBM)
Session overrun (W0IBM)
Session overrun (w5ks)
Session overrun (W0IBM)
Session overrun (LC3VAT)
Session overrun (w5ks)
Session overrun (W8MSU-10)
Session overrun (w9da)
Session overrun (VE3ZRD)
Session overrun (W0IBM)
Session overrun (N2UTH)
Session overrun (ON4AWV-12)
Session overrun (LC3VAT)
Session overrun (w5ks)
Session overrun (KB2QHA-2)
Session overrun (W8MSU-10)
Session overrun (KF3DY-2)
Session overrun (w9da)
Session overrun (VE3ZRD)
Session overrun (W0IBM)
Session overrun (N2UTH)
Session overrun (ON4AWV-12)
Session overrun (LC3VAT)
Session overrun (w5ks)
Session overrun (KB2QHA-2)

... This goes on for nearly all connections
... lots of session throttles
... more overruns and disconnects

Server Up Time = 1.2 hours
Total TNC packets = 0
TNC stream rate = 0 bytes/sec
Msgs gated to RF = 0
Connect count = 158
Users = 21
Peak Users = 78
APRS Stream rate = 1.7 Kbps
Server load = 0.0 Bps
History Items = 3156
TAprsString Objs = 4181
Items in InetQ = 1024
InetQ overflows = 1797
TncQ overflows = 0
conQ overflows = 0
charQ overflow = 0
Hist. dump aborts = 0

Now note the server load (0) and connections (this happens to be the number of igates + 1) ... 
24.23.210.235 has connected to port 23
24.23.210.235 has connected to port 23
199.227.86.221 has connected to port 23
24.23.210.235 has connected to port 23
24.177.214.61 has connected to port 23
206.159.119.88 has connected to port 10151
24.23.210.235 has connected to port 23
24.23.210.235 has connected to port 23
24.177.214.61 has connected to port 23
24.23.210.235 has connected to port 23
...

This continues until the maxclient limit is reached and I start getting the "error creating new client thread"

Note the multiple connects from the same host.

Now it's off to see why this is happening... Chuck |
From: Chuck B. <cb...@vi...> - 2001-07-17 18:28:03
|
On Tuesday 17 July 2001 00:31, Brian D Heaton wrote:
:: As currently implemented aprsd doesn't check or comply with
:: "NOGATE" in the ax25Path. Based on the current thread on aprssig is
:: this something we want to do? I think we could implement it (with
:: the other filtering code in aprsString.cpp):
:: --------------
:: if (ax25Path.find("NOGATE") != npos) {
::     aprsType = APRSERROR;
::     return;
:: }
:: --------------

That's doable. We could add it as an option in the conf.

:: A second concern is the current way we are stripping "*" from the
:: ax25Source field of packets. At present it appears to be erasing the
:: full ax25Source of the packet and thus the other filtering code is
:: marking it as an error packet. I've got badpacket logging turned on and
:: I see everything with a "*" in the ax25Source field (mostly digi ID's)
:: being dropped. Thus we are dropping any packets directly heard by the
:: IGATE.
::
:: As currently implemented it looks like this:
::
:: -------------------
:: if (int nfind = ax25Source.find_first_of('*') <= ax25Source.length()) {
::     //cerr << "Found * in source at position: " << nfind << endl;
::     ax25Source.erase(nfind);
:: }
:: -------------------
::
:: I think if we change to:
::
:: --------------------
:: if (int nfind = ax25Source.find_first_of('*') <= ax25Source.length()) {
::     //cerr << "Found * in source at position: " << nfind << endl;
::     ax25Source.erase(nfind,1);
:: }
:: --------------------
::
:: We will ensure that we only delete the "*". I'm testing this now,
:: but wanted to make sure I understand the implementation.

Mmmk, I've been working in this part of the code over the past couple of days. What I need discussion on is how strict should we make our checks. For example, what I've done with message packets is tested for a length of 69 bytes, checked for illegal chars (|~{), preserved the id ({xxx), and truncated the rest of the message. I've done similar chops in other areas as well, eg, position reports. 
What I'm finding is a lot of packets are being truncated. I can see where some folks may get upset about this... but on the other hand, it's spec. Chuck |
From: Brian D H. <bdh...@c4...> - 2001-07-17 05:05:41
|
Actually my first thought on cleaning the "*"s or "}"s from the ax25Source didn't work either. The following does:

--------------
if (ax25Source.find_first_of("*") <= ax25Source.length()) {
    int nfind = ax25Source.find_first_of("*");
    //cerr << "Found * in source at position: " << nfind << endl;
    ax25Source.erase(nfind,1);
}
--------------

Something didn't like the initial "int nfind" portion of the IF conditional. Every time it matched, it was returning a position of "1" for either "*" or "}". I've got this running on the test machine now and it's simply deleting the character without any other adverse effects. I'll run some more and if it looks clean I'll commit it. 73/N5VFF -- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941

On 2001.07.16 22:31 Brian D Heaton wrote:
> A second concern is the current way we are stripping "*" from the
> ax25Source field of packets. At present it appears to be erasing the full
> ax25Source of the packet and thus the other filtering code is marking it as
> an error packet. I've got badpacket logging turned on and I see everything
> with a "*" in the ax25Source field (mostly digi ID's) being dropped. Thus
> we are dropping any packets directly heard by the IGATE.
---- SNIP SNIP------
> I think if we change to:
>
> --------------------
> if (int nfind = ax25Source.find_first_of('*') <= ax25Source.length()) {
>     //cerr << "Found * in source at position: " << nfind << endl;
>     ax25Source.erase(nfind,1);
> }
> --------------------
>
> We will ensure that we only delete the "*". I'm testing this now,
> but wanted to make sure I understand the implementation. 
> > 73/N5VFF > > > > -- > ============================================================ > Brian D Heaton | I fear that we have awakened > Principal Consultant | a sleeping giant and instilled > C4I2.com System Consultants | in him a terrible resolve. > bdh...@c4... | -- Admiral Isoruku Yamamoto > USA (719) 623-0381 | -- Imperial Japanese Navy > UK +44 (0)845 127-5400 | -- December 7, 1941 > > _______________________________________________ > Aprsd-devel mailing list > Apr...@li... > http://lists.sourceforge.net/lists/listinfo/aprsd-devel > |
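For the record, the reason the original one-line form misbehaved is C++ operator precedence: in `int nfind = s.find_first_of('*') <= s.length()`, the comparison runs first, so `nfind` receives the boolean result (0 or 1), never the match position. A small demonstration (the function names are mine, not aprsd's):

```cpp
#include <string>

// The declaration-in-condition form binds as: nfind = (find(...) <= length()),
// so nfind is 0 or 1. erase(nfind) with nfind == 1 then wipes the call from
// position 1 onward, which is the "erasing the full ax25Source" symptom.
int starPosBroken(const std::string& s) {
    int nfind = s.find_first_of('*') <= s.length();  // boolean, not position
    return nfind;
}

// The two-step fix from the mail above: take the position first, then test it.
int starPosFixed(const std::string& s) {
    std::string::size_type nfind = s.find_first_of('*');
    if (nfind == std::string::npos)
        return -1;                                   // no '*' present
    return static_cast<int>(nfind);
}
```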
From: Brian D H. <bdh...@c4...> - 2001-07-17 04:22:02
|
As currently implemented aprsd doesn't check or comply with "NOGATE" in the ax25Path. Based on the current thread on aprssig is this something we want to do? I think we could implement it (with the other filtering code in aprsString.cpp):

--------------
if (ax25Path.find("NOGATE") != npos) {
    aprsType = APRSERROR;
    return;
}
--------------

A second concern is the current way we are stripping "*" from the ax25Source field of packets. At present it appears to be erasing the full ax25Source of the packet and thus the other filtering code is marking it as an error packet. I've got badpacket logging turned on and I see everything with a "*" in the ax25Source field (mostly digi ID's) being dropped. Thus we are dropping any packets directly heard by the IGATE.

As currently implemented it looks like this:

-------------------
if (int nfind = ax25Source.find_first_of('*') <= ax25Source.length()) {
    //cerr << "Found * in source at position: " << nfind << endl;
    ax25Source.erase(nfind);
}
-------------------

I think if we change to:

--------------------
if (int nfind = ax25Source.find_first_of('*') <= ax25Source.length()) {
    //cerr << "Found * in source at position: " << nfind << endl;
    ax25Source.erase(nfind,1);
}
--------------------

We will ensure that we only delete the "*". I'm testing this now, but wanted to make sure I understand the implementation. 73/N5VFF -- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941 |
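A compilable form of the NOGATE sketch above, for reference: outside the aprsString class, `npos` needs qualifying as `std::string::npos`, and the type codes here are stand-in constants rather than aprsd's own.

```cpp
#include <string>

const int APRSOK = 0;
const int APRSERROR = -1;   // stand-ins for aprsd's packet-type codes

// Drop anything whose digi path carries NOGATE, per the aprssig convention
// discussed above. In aprsd this would sit with the other filtering code in
// aprsString.cpp and set aprsType rather than return a value.
int classifyPath(const std::string& ax25Path) {
    if (ax25Path.find("NOGATE") != std::string::npos)
        return APRSERROR;
    return APRSOK;
}
```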
From: Brian D H. <bdh...@c4...> - 2001-07-16 22:48:21
|
All, I've committed an update that contains updates to change "delete" to "delete[]" where required and to set variables to NULL after the delete. Let me know if I've missed any (I think there are still a "delete posit" and "delete telemetry" that I didn't get the NULLs on). I'll wait until the HTTPStats settles down a bit to try fiddling with status reporting for queue high-water marks. 73/N5VFF -- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941 |
From: Dale H. <da...@wa...> - 2001-07-16 17:51:12
|
Well, the first attempt to fix the user list truncation didn't quite work. It drops some html in a couple of places but completed the page. That's progress :-) I've put yet another aprsd.cpp up which changes the socket to BLOCKING from non-blocking mode. Since there are no locked mutexes, blocking shouldn't matter. I don't understand why the non-blocking mode didn't work. I also cleaned up the HTML generation so it now passes the w3c validator. http://validator.w3.org/ As usual the ultimate test is first.aprs.net. -- Dale Heatherington da...@wa... Web Page http://www.wa4dsy.net Sent by KMail for Linux |
From: Dale H. <da...@wa...> - 2001-07-15 19:10:32
|
I put a new version of aprsd.cpp on sourceforge. It has the changes to the html server to hopefully fix the truncated user status problem. It needs to be tested on first.aprs.net. -- Dale Heatherington da...@wa... Web Page http://www.wa4dsy.net Sent by KMail for Linux |
From: Chuck B. <cb...@vi...> - 2001-07-15 17:59:59
|
On Sunday 15 July 2001 11:47, Dale Heatherington wrote: :: Chuck, :: I dunno if two threads can change data at the same time in cpQueue. I :: think that's why the mutex is in there. But, I don't think there :: should be a wait mutex to cause the caller to pause if the queue is :: full. This will block the caller. He may have other resources tied up :: at the time. Who knows what sort of deadlocks might occur. :: << snip, snip >> Just reaching here. Deadlocks are the problem though. When I run first from the console eventually I'll see messages of not being able to create a new client thread. This occurs in TCPSessionThread and results from rc being != 0 from a call to pthread_create (either tcp server or http thread). << snip, snip >> :: Speaking of "delete[]"..... :: Actually I'm still a bit confused about delete[]. :: A char* is an array and needs the []. :: A string object is what? Internally it's an array :: but it was not declared an array so I assume :: it gets a plain "delete" ? aprsString would also get :: a plain "delete"? :: In C++ there are two kinds of pointers, pointers to a single object and pointers to an array of objects. The important thing is if your call to new uses [], so should your call to delete. Chuck |
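Chuck's rule in miniature — the form of `new` dictates the form of `delete`. A `std::string` allocated with plain `new` gets plain `delete`; its destructor frees the internal character array itself. (Toy function, for illustration only.)

```cpp
#include <string>

// Pairing rule: new[] <-> delete[], new <-> delete. Mixing them (for
// example `delete buf;` on an array) is undefined behavior.
std::size_t makeAndFree() {
    char* buf = new char[128];                // array new -> delete[]
    buf[0] = 'x';
    std::size_t n = 128;
    delete[] buf;

    std::string* s = new std::string("abc");  // single object -> plain delete
    n += s->size();                           // destructor frees internals
    delete s;
    return n;
}
```

By the same rule, an aprsString allocated with plain `new` would likewise get plain `delete`, regardless of what it holds internally.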
From: Dale H. <da...@wa...> - 2001-07-15 15:47:29
|
Chuck, I dunno if two threads can change data at the same time in cpQueue. I think that's why the mutex is in there. But, I don't think there should be a wait mutex to cause the caller to pause if the queue is full. This will block the caller. He may have other resources tied up at the time. Who knows what sort of deadlocks might occur. The basic plan was to throw away data that could not be put in the queue, not wait until space was available.

What happens to the data depends on the state of the "dyn" variable. If TRUE then the memory containing the data is freed. I should have also set the pointer to NULL but didn't. If "dyn" is false the data is simply ignored. The dyn setting is set at the time the queue is created.

Potential pitfall... If the caller puts an item on the queue then uses the data afterwards he will be in trouble if the queue was full and the memory was freed. Once data is put on the queue it should be considered gone forever by the caller if the queue has the "dyn" flag set. Only the queue reader should access it. The queue reader must free the memory or pass it to another function that does. Only one queue reader is allowed.

Stuff that needs work:
delete needs []
Pointer should be set to NULL after delete.

Speaking of "delete[]"..... Actually I'm still a bit confused about delete[]. A char* is an array and needs the []. A string object is what? Internally it's an array but it was not declared an array so I assume it gets a plain "delete" ? aprsString would also get a plain "delete"?

On Saturday 14 July 2001 15:49, Chuck Byam wrote:
> On Friday 13 July 2001 19:54, you wrote:
> :: The killer bug that will not die. (sigh)
> :: I had high hopes that fixing the
> :: unterminated string problem was gonna really help.
> ::
> :: On Friday 13 July 2001 17:33, Chuck Byam wrote:
> :: > On Friday 13 July 2001 15:55, you wrote:
> :: > :: I see it's been running 1.1 hours now. good start.
> :: >
> :: > Well... 
after about 2.5 hours first is chewing up the CPU cycles.
> :: > It's still accepting connections and handing out data, but one of the
> :: > threads is hogging the CPU (88% with top running).
>
> Your changes may very well have fixed an issue, that being the segfaults. What this is, I think, is a race condition that occurs between two (or more) threads. I've been looking at the cpqueue code and trying to figure out if it's possible for two threads to change data there at the same time. Look at the attached and let me know if it makes any sense to you. It essentially provides wait variables that have the caller wait until a condition is true, in this case whether the queue is full or empty.
>
> Chuck
>
> int cpQueue::write(char *cp, int n)
> {
>     int rc=0;
>
>     if (lock)
>         return -2;  // Lock is only set true in the destructor
>
>     if(pthread_mutex_lock(mut) != 0)
>         cerr << "Unable to lock mut - cpQueue:Write-char *cp.\n" << flush;
>
>     inWrite = 1;
>     int idx = write_p;
>
>     while (base_p[idx].full) {
>         cerr << "Queue is full... waiting" << endl;
>         pthread_cond_wait(base_p[idx].notFull, mut);
>     }
>
>     if (base_p[idx].rdy == false) {  // Be sure not to overwrite old stuff
>         base_p[idx].qcp = (void*)cp;  // put char* on queue
>         base_p[idx].qcmd = n;  // put int (cmd) on queue
>         base_p[idx].rdy = true;  // Set the ready flag
>         base_p[idx].empty = false;
>         idx++;
>         itemsQueued++;
>         if (itemsQueued > HWitemsQueued)
>             HWitemsQueued = itemsQueued;
>
>         if (idx >= size)
>             idx = 0;
>
>         write_p = idx;
>     } else {
>         overrun++;
>
>         if (dyn)
>             delete cp;
>
>         rc = -1;
>     }
>
>     inWrite = 0;
>
>     if(pthread_mutex_unlock(mut) != 0)
>         cerr << "Unable to unlock mut - cpQueue:Write - char *cp.\n" << flush;
>
>     pthread_cond_signal(base_p[idx].notEmpty);
>     return(rc);
> }
>
> void* cpQueue::read(int *ip)
> {
>     if(pthread_mutex_lock(mut) != 0)
>         cerr << "Unable to lock mut - cpQueue:read - int.\n" << flush;
>
>     while (base_p[read_p].empty) {  // wait here if the queue is empty
>         cerr << "Queue empty... waiting." << endl;
>         pthread_cond_wait(base_p[read_p].notEmpty, mut);
>     }
>
>     inRead = 1;
>     void* cp = base_p[read_p].qcp;  // Read the TAprsString*
>
>     if (ip)
>         *ip = base_p[read_p].qcmd;  // read the optional integer command
>
>     base_p[read_p].qcp = NULL;  // Set the data pointer to NULL
>     base_p[read_p].rdy = false;  // Clear ready flag
>     read_p++;
>     itemsQueued--;
>
>     if (read_p >= size)
>         read_p = 0;
>
>     inRead = 0;
>
>     if (pthread_mutex_unlock(mut) != 0)
>         cerr << "Unable to unlock mut - cpQueue:read - int.\n" << flush;
>
>     pthread_cond_signal(base_p[read_p].notFull);
>
>     return(cp);
> }
-- Dale Heatherington da...@wa... Web Page http://www.wa4dsy.net Sent by KMail for Linux |
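The wait-until-not-full / wait-until-not-empty pattern in Chuck's patch can be stated more compactly with one mutex and two queue-wide condition variables (the patch uses per-slot ones). This is a generic C++11 sketch, not a drop-in for cpQueue, and it deliberately blocks writers — the behavior Dale argues against above:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Minimal bounded queue: write() blocks while full, read() blocks while
// empty. Each side signals the opposite condition after changing the queue.
template <typename T>
class BoundedQueue {
    std::queue<T> q_;
    std::size_t cap_;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

    void write(T item) {
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [this] { return q_.size() < cap_; });
        q_.push(std::move(item));
        notEmpty_.notify_one();          // wake one blocked reader
    }

    T read() {
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();           // wake one blocked writer
        return item;
    }
};
```

The trade-off both mails circle around is visible here: blocking writers preserves data but can deadlock a caller that holds other locks, while cpQueue's original overrun-and-drop policy loses data but never blocks.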
From: Brian D H. <bdh...@c4...> - 2001-07-15 01:58:26
|
Chuck, I think Dale's on a much better track than I am here. FWIW, pmtxSend locks at the top of the loop and unlocks before the send in my current trial. It doesn't actually stay locked long at all. With my ancient P166 (on the same Enet segment unfortunately) which throttles SendHistory all the way down to 2400 bps, it doesn't even cause a blip in the pace of data transmission. I may have to map a couple ports through my NAT and see what it does from the outside.

I don't really have any good clues on the static variables/class definitions. I'll do a little more research and let you know if I find anything definitive. I've been looking at the Xastir code for some ideas. At least for a test I borrowed the idea of error-checking mutexes (debug path only). I'm still playing with it, but it looks promising. THX/BDH

On 2001.07.14 18:14 Chuck Byam wrote:
> While I haven't tried it yet, Dale's earlier comment about the socket being
> non-blocking may be the key to this problem. I wonder though, about
> locking a function as busy as the http thread at the beginning and leaving
> it locked for so long (relatively).
>
> Speaking of thread-safe and re-entrant code... As I understand it, a
> re-entrant function shouldn't have any variables declared static. Does
> this apply to class definitions as well? And speaking of classes, what are
> your thoughts on creating a couple more classes to replace the structures
> declared at the top of aprsd.cpp? For example, one to handle sessions and
> history.
-- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941 |
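On Chuck's re-entrancy question quoted above: a function-local `static` (and likewise a `static` class data member) is a single object shared by every call and every thread, which is exactly what breaks re-entrancy. A deliberately artificial demo, with invented names:

```cpp
#include <string>

// Non-re-entrant: one shared buffer for all callers. A second call (from
// any thread, or re-entrantly) overwrites the result of the first, and the
// returned pointer can dangle once buf is reassigned.
const char* formatCallUnsafe(const std::string& call) {
    static std::string buf;
    buf = call + "*";
    return buf.c_str();
}

// Re-entrant: all state lives in the caller's own copy.
std::string formatCallSafe(const std::string& call) {
    return call + "*";
}
```

Static member *functions* are not the problem; it is static *data* — whether a local or a class member — that must either go away or be guarded by a mutex.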
From: Brian D H. <bdh...@c4...> - 2001-07-15 01:58:16
|
Dale, Doh! I think you're much closer to the root cause than I am. For the transmit/wait loop would it make sense to reuse the SendHistory throttle up/down/sleep code from history.cpp? The HTTPstats won't be as long, but it might be a ready-made drop-in.

I've been testing it at up to 50 users on my development machines, but I've of course got much lower latency than across the net. I'm still able to cause a core dump. To do it I've got 50 sessions open and running a continuous loop:

connect
wait for end of history dump
wait another 1-3 minutes
disconnect
loop to beginning

After about 200-250 of these cycles I get a segfault and dump core. If I limit this same continuous cycle to 8-10 of the 50 sessions I can run without problems. THX/BDH

On 2001.07.14 18:26 Dale Heatherington wrote:
> Well Brian, I too developed a theory about the html user status
> truncation problem. Since it only seems to show up on first.aprs.net which
> has lots of users and igate connections plus heavy cpu and network load, I
> think that the send() function is returning an errno code such as EAGAIN
> due to running out of buffers somewhere down in the tcpip stack. Since the
> code does not check for errors - it just keeps feeding html ascii strings
> to send() (which, in theory, is not taking any) until it's done then closes
> the socket and exits the thread.
>
> I reworked the code so it builds the complete html page in memory (actually
> a cpQueue object) before sending anything. At this point all the mutex
> locks can be unlocked. No more latency probs! I then start sending the data
> and check for errors while doing so. If an error happens I wait 1 second
> and retry. After 5 seconds I give up and close the socket and exit. It
> works here. I'm testing to make sure there are no side effects or memory
> leaks. If all is well I'll let Chuck test it on first.
>
> If you guys see any flaws with this theory or the fix pls let me know. 
-- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941 |
From: Dale H. <da...@wa...> - 2001-07-15 00:26:15
|
On Saturday 14 July 2001 17:47, Brian D Heaton wrote:
> Dale,
>
> I would think that the result of the "unpredictable" behavior would
> most likely be a memory leak. IIRC, Hamish went through and corrected a
> bunch of them, but probably didn't hit all of them. I think it's
> definitely in the "can't hurt" category and may well help.

Ok. It doesn't sound too serious. I'll continue with what I was doing...

> I've also been looking at the truncation issue in the HTTPstats.
> I'm currently playing with pmtxSend locks inside the loop for both the
> IGATEs and USERs. I'm working from the theory that it may not be
> thread-safe to be assigning variables to the output stream while they may
> be modified by another thread. I initially started with the locks outside
> the loop, but I'm concerned about the latency involved in the real world.
>
> 73/N5VFF

Well Brian, I too developed a theory about the html user status truncation problem. Since it only seems to show up on first.aprs.net, which has lots of users and igate connections plus heavy cpu and network load, I think that the send() function is returning an errno code such as EAGAIN due to running out of buffers somewhere down in the tcpip stack. Since the code does not check for errors - it just keeps feeding html ascii strings to send() (which, in theory, is not taking any) until it's done, then closes the socket and exits the thread.

I reworked the code so it builds the complete html page in memory (actually a cpQueue object) before sending anything. At this point all the mutex locks can be unlocked. No more latency probs! I then start sending the data and check for errors while doing so. If an error happens I wait 1 second and retry. After 5 seconds I give up and close the socket and exit. It works here. I'm testing to make sure there are no side effects or memory leaks. If all is well I'll let Chuck test it on first.

If you guys see any flaws with this theory or the fix pls let me know. 
-- Dale Heatherington da...@wa... Web Page http://www.wa4dsy.net Sent by KMail for Linux |
From: Chuck B. <cb...@vi...> - 2001-07-15 00:16:43
|
On Saturday 14 July 2001 17:47, Brian D Heaton wrote:
:: Dale,
<< snip, snip >>
:: I've also been looking at the truncation issue in the HTTPstats.
:: I'm currently playing with pmtxSend locks inside the loop for both the
:: IGATEs and USERs. I'm working from the theory that it may not be
:: thread-safe to be assigning variables to the output stream while they
:: may be modified by another thread. I initially started with the locks
:: outside the loop, but I'm concerned about the latency involved in the
:: real world.

While I haven't tried it yet, Dale's earlier comment about the socket being non-blocking may be the key to this problem. I wonder, though, about locking a function as busy as the http thread at the beginning and leaving it locked for so long (relatively).

Speaking of thread-safe and re-entrant code... As I understand it, a re-entrant function shouldn't have any variables declared static. Does this apply to class definitions as well?

And speaking of classes, what are your thoughts on creating a couple more classes to replace the structures declared at the top of aprsd.cpp? For example, one to handle sessions and history.

Chuck |
From: Chuck B. <cb...@vi...> - 2001-07-15 00:05:16
|
On Saturday 14 July 2001 16:49, Dale Heatherington wrote:
:: Greetings.
:: I have just been granted access to the sourceforge aprsd project.
::
:: While working on the html status page generation code to fixup the
:: truncation problem I learned that the delete operator needs a "[]"
:: after it to work on arrays. My documentation says results are
:: unpredictable if just delete alone is used. So instead of "delete cp"
:: it should be "delete [] cp". As you would expect, aprsd is full of cases
:: where "delete" is used to free character arrays. Is this a real
:: problem? Could this be the cause of some of the stability problems?

I've been changing these as I've come across them. Strictly speaking, delete[] is required for anything allocated with new[]: plain delete has no way of knowing that cp points to an array, so it won't destroy (or correctly free) every element, and some classes need special handling in their destructors. Using the '[]' syntax consistently is good practice, as is setting the pointer to NULL afterwards.

Chuck |
From: Brian D H. <bdh...@c4...> - 2001-07-14 21:38:24
|
Dale,

I would think that the result of the "unpredictable" behavior would most likely be a memory leak. IIRC, Hamish went through and corrected a bunch of them, but probably didn't hit all of them. I think it's definitely in the "can't hurt" category and may well help.

I've also been looking at the truncation issue in the HTTPstats. I'm currently playing with pmtxSend locks inside the loop for both the IGATEs and USERs. I'm working from the theory that it may not be thread-safe to be assigning variables to the output stream while they may be modified by another thread. I initially started with the locks outside the loop, but I'm concerned about the latency involved in the real world.

73/N5VFF

-- 
============================================================
Brian D Heaton | I fear that we have awakened
Principal Consultant | a sleeping giant and instilled
C4I2.com System Consultants | in him a terrible resolve.
bdh...@c4... | -- Admiral Isoruku Yamamoto
USA (719) 623-0381 | -- Imperial Japanese Navy
UK +44 (0)845 127-5400 | -- December 7, 1941

On 2001.07.14 14:49 Dale Heatherington wrote:
> Greetings.
> I have just been granted access to the sourceforge aprsd project.
>
> While working on the html status page generation code to fixup the
> truncation problem I learned that the delete operator needs a "[]"
> after it to work on arrays. My documentation says results are
> unpredictable if just delete alone is used. So instead of "delete cp"
> it should be "delete [] cp".
> As you would expect, aprsd is full of cases where "delete" is used
> to free character arrays. Is this a real problem? Could this be the
> cause of some of the stability problems?
>
> -- 
> Dale Heatherington
> da...@wa...
> Web Page http://www.wa4dsy.net
> Sent by KMail for Linux
>
> _______________________________________________
> Aprsd-devel mailing list
> Apr...@li...
> http://lists.sourceforge.net/lists/listinfo/aprsd-devel
> |
From: Dale H. <da...@wa...> - 2001-07-14 20:50:03
|
Greetings.
I have just been granted access to the sourceforge aprsd project.

While working on the html status page generation code to fixup the truncation problem I learned that the delete operator needs a "[]" after it to work on arrays. My documentation says results are unpredictable if just delete alone is used. So instead of "delete cp" it should be "delete [] cp". As you would expect, aprsd is full of cases where "delete" is used to free character arrays. Is this a real problem? Could this be the cause of some of the stability problems?

-- 
Dale Heatherington
da...@wa...
Web Page http://www.wa4dsy.net
Sent by KMail for Linux |
From: Brian D H. <bdh...@c4...> - 2001-07-14 18:33:19
|
All, Here with random morning musings...

Hamish's thoughts:

1 - New config file - Sounds good to me.

2 - Multiple RF port support - Yes, Yes, and Yes. I think the RF portion should definitely be a separate program. It could connect like any other user. This would make multiple band IGATEs much easier to implement.

3 - Internal Restructuring - Yep, I'm also for more C++ (even though that means I have to learn more C++). I'd like to see aprsd.cpp broken up a bit too.

4 - Better Filtering - I think our best bet here would perhaps be to get "smarter filtering", but not overall more restrictive filtering. We could screw things down very tightly for compliance with aprs-spec, but I think we would greatly reduce the utility of the network for experimentation and development of new applications. I think we could take an approach of:
A - Is the packet reasonable: (contains ">" AND ":", ">" occurs before ":", origin callsign < 11 chars, destination call < 10 chars, has something in the data field)
B - "Do No Harm": length less than 256 bytes, no unprintable characters (other than MIC-E required) - Do we want to get 8-bit clean at some point??

My thoughts:

1 - Single TCP port: Desired connection type could be specified on the command line. Alpha tags would identify known defined types. Numeric values could directly specify the ECHOMASK (for experimenters). We could then register that port (14439 comes to mind) with IANA. Since this would require client software changes, the current port structure would be maintained for at least 1-2 years. In the long term I think this would give us much greater flexibility for new uses and advanced client software.

2 - Active loop detection and quenching: Yes, path headers. The current method relies very heavily on configuration and dup detection. I'll even volunteer to write some kind of general spec and implement the proof-of-concept code. Especially if we go to a single TCP port we have an easy route to implement this. 
I'm still noodling it around.

3 - Individual queues for each connection (say 50 entries). This would reduce the time spent in the DeQueue thread and in SendToAllClients. Once this is done we can also do a bunch of other nifty things in each thread.

4 - Geographic limited feeds - Give the client a means to specify that it only wants data from a specific geographic area. If we have the individual queues from above we can check each packet in the queue against the specified geographic area and send it to the client or dump it. This is going to add some processor load (perhaps a lot). I haven't thought about this one in depth yet. Perhaps the client could specify the area in an APRS object format. The thread would then pull a packet off its queue, check it against the area, send it if it matches, and dump it if not. This might be a big win for NWS and/or EOC type stations that need to serve areas outside RF range (i.e., the TX state EOC for the Houston floods was in Austin), but don't want the worldwide feed knocking at their doors. At the moment I'm thinking of just squares and circles for area limits. If we only allow one bounding box/circle then most applications will have some out-of-area traffic (the state of Texas is a good example), but it will be orders of magnitude less than it is now. This is something we probably would want to give the IGATE operator the choice of disabling in the config file since it could conceivably overload some boxes.

5 - Strong authentication support - This would need to be very optional. I think this needs to be kicked around until a good method is decided upon. It's probably something we should have on our todo list.

6 - "Contrib" files - While not directly tied to a 3.0.0 feature, it would be nice to host a "Contrib" area where scripts that do things with the UDP port could be hosted in a single place. We've got a very powerful feature here.

Probably enough babble for the moment... 
73/N5VFF -- ============================================================ Brian D Heaton | I fear that we have awakened Principal Consultant | a sleeping giant and instilled C4I2.com System Consultants | in him a terrible resolve. bdh...@c4... | -- Admiral Isoruku Yamamoto USA (719) 623-0381 | -- Imperial Japanese Navy UK +44 (0)845 127-5400 | -- December 7, 1941 On 2001.07.08 00:55 Hamish Moffatt wrote: > On Sat, Jul 07, 2001 at 04:36:58PM -0600, Brian D Heaton wrote: > > Hamish, you raise a couple interesting points. Once 2.2.0 gets > out > > the door do we want to compile a 3.0 (I'm guessing changes this major > would > > rate a new major release) wishlist and/or roadmap? The 2.2.x series > could > > continue with incremental improvements/bugfixes. The 3.0.0 release > could > > be the one with all the major rework. It would essentially become a > new > > branch under CVS. > > Sounds like a good plan. > > My biggest wish list for future versions would be.. > > 1. New configuration file format. Painful for users to upgrade, but > a perl script could do the conversion. I could write such a perl > script if needed. > > 2. Multiple RF ports. Not essential, but nice to have. I already started > work on it, but it's a major restructure of some of the code (new > classes > etc), so it won't be easy to take the other changes that have been > made > in CVS for 2.2.0 and apply them. > > Actually, it might be worth separating the RF interface into a separate > program which connects to the main APRS server like any other client.. > possibly on a special port if necessary. Then APRSD could be just a hub > program, and a separate program could handle a port. Multiple instances > of the port program gives you multiple ports. Bruno Quesnel VA2BMG > suggested this a while ago and I never gave it much thought until > recently. > > 3. Internal restructuring? > > Bruno did some work on APRSD for a university project.. 
his aim was > to remove all of the C++ code and convert it to plain C.. he wanted > to get it running on an SGI system he had which had poor C++ support. > > However I am of the opinion that we should be going for more C++ code, > not less. For the multiport version, I put all of the RF code into a > new class (with subclasses for an AX.25 interface or a serial TNC), > then had a linked list of port objects to handle multiple ports. > The history buffer also recorded what port a packet came in on. > > Similarly I reckon there's probably a lot in common between the > various TCP sockets available, so this might benefit from some > new classes. I haven't looked at this much, so I could be wrong. > > aprsd.cpp is too long. Lots of that code belongs in separate > source files, ideally (but not necessarily) as separate classes. > > 4. More/better filtering? > > > Hamish > -- > Hamish Moffatt VK3SB <ha...@de...> <ha...@cl...> > > _______________________________________________ > Aprsd-devel mailing list > Apr...@li... > http://lists.sourceforge.net/lists/listinfo/aprsd-devel > |
From: Chuck B. <cb...@vi...> - 2001-07-13 19:38:05
|
I'd like to welcome Dale Heatherington (WA4DSY) to the sourceforge project. Dale's expertise and insight in aprsd is a major asset to what we are trying to accomplish.

Chuck |
From: Brian D H. <bdh...@c4...> - 2001-07-09 05:45:17
|
Chuck, just sent my latest mods up to CVS. They include:

1 - error checking on (almost) all pthread_mutex_lock/unlocks
2 - change back to a 1ms delay in deQueue
3 - comment out the nice(-10) in deQueue
4 - comment out the lastpacket = abuff in deQueue
5 - change the name of the mutex in cpqueue.cpp/h to "pmtxQ" to jibe with the naming convention in all the other files
6 - lock the pmtxCount mutex before AddHistory and DeleteOldItems calls
7 - wrap some if(tncPresent)'s around the TNC checking and tncQueue.ready calls
8 - add the beginnings of some high-water variables to cpqueue.cpp/h. Next step will be to add reporting of these variables into the HTTPStats. This way not only can we see how many are in the queue now, but also the maximum number of items the queue has held since server start.

Items 2/3 above are the main cause of the very high CPU utilization in the deQueue thread. It's running in a continuous loop with a 1ms sleep, checking the queue and calling SendToAllClients, so it should have the highest process utilization of all the aprsd threads. Based on the way the queue is locked it won't try to read from an empty queue. The only thing reading the queue is the deQueue thread, and it uses the sendQueue.ready call to check before reading. sendQueue.ready won't be set unless there are items in the queue. Continuing to write to a queue in an overflow state is another matter. That's part of what I'm hoping the high-water marks will help us scope out.

I have found that the "lastpacket = abuff" code appears to be a part of the lock-up scenario. I've tried the current code with and without it. Without it I don't get the lockups. In the current debug context, since I'm not getting segfaults, the last packet isn't germane to the failures I'm seeing. I've left it in, but commented out for the time being.

I also highly recommend you update your kernel on first. 
In testing I ran the current aprsd code (as sent to CVS) under 2.4.5-ac24 (equivalent to a late 2.4.6pre) and was able to lock it up in the previously described manner. Based on the changelogs in 2.4.6ac1/2 I upgraded to 2.4.6ac2. Since that upgrade, *running the same aprsd code*, I've run twice as many heavy connect/disconnect cycles and haven't managed to lock it up yet. The VM changes from 2.4.2 up to 2.4.6ac2 are very significant. The latest 2.4.7pre may also work for you. I've been running the AC kernels as they are more friendly to AX.25 sockets and have some important fixes for my VIA VP6 SMP motherboard that I'm not sure have fully made it into Linus's tree.

My next steps are:

1 - flesh out the high-water marks and get the reporting into the HTTPStats. I'm going to experiment with a new layout for the top section of the report. I'll post here when it's checked in and ya'll can tell me what you think.
2 - add pid reporting to the startup stats as each server/deQueue thread is started. I'd also like to dump this information (on startup) to aprsd.log. It will be an additional 13-15 lines per start, but I think the additional data will be a big win.
3 - finish adding the error checking to the mutexes in dupCheck.cpp, rf.cpp, and utils.cpp

Guess there were some more things I could do with the code. <g>

THX/BDH

-- 
============================================================
Brian D Heaton | I fear that we have awakened
Principal Consultant | a sleeping giant and instilled
C4I2.com System Consultants | in him a terrible resolve.
bdh...@c4... 
| -- Admiral Isoruku Yamamoto
USA (719) 623-0381 | -- Imperial Japanese Navy
UK +44 (0)845 127-5400 | -- December 7, 1941

On 2001.07.08 21:31 Chuck Byam wrote:
> Current server stats:
>
> uptime: 24.0 hours
> current connections: 120
> Peak users: 127
>
> top output:
> 10:32pm up 17 days, 6:43, 7 users, load average: 2.42, 2.29, 2.13
>
> (note load)
>
> 27479 root 17 0 11492 11M 1132 R 42.7 9.1 1073m aprsd
> 1392 root 18 0 11492 11M 1132 R 42.7 9.1 162:54 aprsd
>
> These 2 threads are in a data race (my guess). Just not sure which
> threads they are. Based on their time, I'd guess the first is one of
> the initial threads created on startup.
>
> As usual nearly all segfaults occur in SendToAllClients, which we know
> is called from dequeue. Just thinking out loud here... but my guess is
> there is a problem with cpqueue where a contention is set up between
> two (or more) threads trying to access the queue. While I'm far from
> being proficient in threaded programming, I wonder if a [pthread]
> conditional variable would be in order in the queue class so that a
> thread would not try to write to a full queue or another thread
> wouldn't try to delete from an empty queue. While reading up on this
> it sounds like a classic "producer/consumer data race."
>
> In pseudocode:
>
> (thread writing data)
>
> pthread_mutex_lock(mutex);
> while (queue->full) {
>     pthread_cond_wait(queue->notfull, queue->mutex)
>     // waits until condition met - notfull
> }
> queueAdd(data)
> pthread_mutex_unlock(mutex)
> pthread_cond_signal(queue->notempty)
>
> (thread removing data)
>
> pthread_mutex_lock(mutex)
> while (queue->empty) {
>     pthread_cond_wait(queue->notempty, queue->mutex)
>     // waits until condition met - not empty
> }
> queueDel(data)
> pthread_mutex_unlock(mutex)
> pthread_cond_signal(queue->notfull)
>
> Thoughts?
>
> Chuck
>
> _______________________________________________
> Aprsd-devel mailing list
> Apr...@li... 
> http://lists.sourceforge.net/lists/listinfo/aprsd-devel > |
From: Chuck B. <cb...@vi...> - 2001-07-09 03:32:22
|
On Sunday 08 July 2001 10:47, Brian D Heaton wrote:
:: Below
:: have observed the following symptoms on all my recent lockups:
::     1 - no segfault
::     2 - TAprsString/History object counters diverge by X (TAS > History)
::     3 - stats shows X (from above) in InetQ.
::     4 - although web stats are unreachable, console stats show items in
:: InetQ continue to grow until queue overflow.
::     5 - In 90% of the cases the console interpreter is still active and
:: I can stop with "q"
::     Does this jibe with what anyone else is seeing? I can cause this

Current server stats:

uptime: 24.0 hours
current connections: 120
Peak users: 127

top output:
10:32pm up 17 days, 6:43, 7 users, load average: 2.42, 2.29, 2.13

(note load)

27479 root 17 0 11492 11M 1132 R 42.7 9.1 1073m aprsd
1392 root 18 0 11492 11M 1132 R 42.7 9.1 162:54 aprsd

These 2 threads are in a data race (my guess). Just not sure which threads they are. Based on their time, I'd guess the first is one of the initial threads created on startup.

As usual nearly all segfaults occur in SendToAllClients, which we know is called from dequeue. Just thinking out loud here... but my guess is there is a problem with cpqueue where a contention is set up between two (or more) threads trying to access the queue. While I'm far from being proficient in threaded programming, I wonder if a [pthread] conditional variable would be in order in the queue class so that a thread would not try to write to a full queue or another thread wouldn't try to delete from an empty queue. While reading up on this it sounds like a classic "producer/consumer data race." 
In pseudocode:

(thread writing data)

pthread_mutex_lock(mutex);
while (queue->full) {
    pthread_cond_wait(queue->notfull, queue->mutex)
    // waits until condition met - notfull
}
queueAdd(data)
pthread_mutex_unlock(mutex)
pthread_cond_signal(queue->notempty)

(thread removing data)

pthread_mutex_lock(mutex)
while (queue->empty) {
    pthread_cond_wait(queue->notempty, queue->mutex)
    // waits until condition met - not empty
}
queueDel(data)
pthread_mutex_unlock(mutex)
pthread_cond_signal(queue->notfull)

Thoughts?

Chuck |
From: Brian D H. <bdh...@c4...> - 2001-07-08 14:38:53
|
Below

-- 
============================================================
Brian D Heaton | I fear that we have awakened
Principal Consultant | a sleeping giant and instilled
C4I2.com System Consultants | in him a terrible resolve.
bdh...@c4... | -- Admiral Isoruku Yamamoto
USA (719) 623-0381 | -- Imperial Japanese Navy
UK +44 (0)845 127-5400 | -- December 7, 1941

On 2001.07.08 08:24 Chuck Byam wrote:
> I don't have any problem if this finds its way into 2.2. In fact I've
> just added 2 more config options; respondToIgateQueries and
> respondToAprsdQueries. What I really want to see in this release tree
> is those pesky sig 11's and threading issues resolved (if possible) ;-)
>
> :: By the way, I'm away on holidays from this Wednesday 11th until
> :: Sunday 21st.
> ::
> Have fun, I'll have to wait till September :/ with the exception of a
> couple mini getaways.
>
> Chuck

Chuck, When you get back into work Monday give the latest CVS a shot. I've been plowing through some more over the long weekend and it's getting better. I can still lock it up, but I have to really be trying hard. I have observed the following symptoms on all my recent lockups:

1 - no segfault
2 - TAprsString/History object counters diverge by X (TAS > History)
3 - stats shows X (from above) in InetQ.
4 - although web stats are unreachable, console stats show items in InetQ continue to grow until queue overflow.
5 - In 90% of the cases the console interpreter is still active and I can stop with "q"

Does this jibe with what anyone else is seeing? I can cause this by keeping 1-2 client sessions running (netcat) and running 6-8 more in continual history dumps. After about 100-120 connects it stops sending data out to the clients. It is still receiving and processing from the IGATEs though. It just stacks it up in the sendQueue. I'm thinking that it's the InetdeQueue thread that gets lost in lala-land. I'm working my way back through and adding return code checking to every mutex lock/unlock. 
I probably should have done it earlier, but it seems like one of the last things to hit. If you are still getting segfaults can you post the snips from segfault.log to the list? I'd like to see if they match up with where it was faulting earlier on my test system.

73/N5VFF |
From: Chuck B. <cb...@vi...> - 2001-07-08 14:34:04
|
On Sunday 08 July 2001 03:00, Hamish Moffatt wrote:
:: On Sat, Jul 07, 2001 at 09:20:49PM -0400, Chuck Byam wrote:
:: > I've already started on a fresh codebase that uses the nb++ class
:: > library (another sourceforge project). This is a threadsafe library
:: > that I hope will handle all of the socket issues. I've got a basic
:: > server working (that handles the client connections) and am rewriting
:: > TAprsString and the queues (including the history buffer). Once this
:: > is done I'll start working on the client side so I can test the input
:: > queues.
::
:: OK.. will this be a different CVS module?
::
:: Do you plan to get the whole lot working and implementing the same
:: features as now? Or get something basic working then bring it up
:: gradually to the same feature set?
::

New major version number a la new branch. I'd like to start by getting the basic client/server internet service working and add in the remaining functionality over time.

Chuck |