From: Steve D. <k4...@ta...> - 2001-12-27 20:59:59
|
On 12/27/01 3:47 PM KC7ZRU - Tate (kc...@ar...) wrote:

>Anyone else seeing their recent CVS versions of aprsd die after a few
>minutes?
>
>Trying to find out why this thing keeps shut'n down after anything from 30
>to 60 seconds.
>
Yes, really sucks. There are lots of holes in the findu database because it takes about 5 seconds to restart aprsd and the parser. Crashing twice a minute (not uncommon lately) means losing 15% of the data.

Yesterday I ran an experiment, hoping to figure out what was up. I made findu connect only to second.aprs.net (instead of its usual connections to all the primary and a couple AHub machines), and modified my parser to print each line as it processed it. No one else was trying or able to connect to aprsd (behind a firewall), and with data coming only from one source it should eliminate any asynchronous type of problem. It died just as often.

I opened up a terminal window to second, so by looking for the last line to get to the parser, the problem, if it is caused by input, should be on that line or one of the next few. I found nothing unusual there. I tried just using one AHub machine (claimed to be the cleanest output), and still got frequent crashes and nothing suspect or consistent in the output at and following the crash.

This has been an ongoing problem with findU and aprsd, I sure hope someone can find the cause...

Steve K4HG |
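As a rough check on the data-loss figure in the message above, the arithmetic can be sketched as follows. This assumes every crash costs the full restart time and that traffic is spread evenly over the minute; the function name `fraction_lost` is made up for this illustration.

```python
def fraction_lost(crashes_per_minute: float, restart_seconds: float) -> float:
    """Fraction of wall-clock time spent restarting, capped at 1.0."""
    # Assumption: each crash costs exactly one full restart of aprsd and the parser.
    downtime_per_minute = crashes_per_minute * restart_seconds
    return min(downtime_per_minute / 60.0, 1.0)


if __name__ == "__main__":
    # Two crashes a minute at about 5 seconds each, as described in the message.
    print(f"{fraction_lost(2, 5):.1%}")  # prints 16.7%
```

Ten seconds of downtime per minute works out to roughly one sixth of the stream, which is in line with the "15%" quoted above.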
From: KC7ZRU - T. <kc...@ar...> - 2001-12-27 21:50:52
|
Similar attempts here Steve,

Using ethereal - watching the feed from either first, second or third.aprs.net - when the data flow stops, aprsd has died. Nothing I can find in the capture would indicate what's going on. It all looks good. Also running xastir at the same time connected to the same server - it keeps on going just fine.

Can't get any data from third, FWIW...

This all started happening in the last couple of days. aprsd V2.2.1 (CVS from about a week or so ago) had been running fine up to then. I snagged a new copy from the sourceforge CVS server today - same results.

73

Steve Dimse wrote:
>
> Yes, really sucks, there are lots of holes in the findu database because
> it takes about 5 seconds to restart aprsd and the parser. Crashing twice
> a minute (not uncommon lately) means losing 15% of the data.
> |
From: Steve D. <k4...@ta...> - 2001-12-27 22:07:59
|
On 12/27/01 4:50 PM KC7ZRU - Tate (kc...@ar...) wrote:

>Can't get any data from third, FWIW...
>
I don't think third is running a watchdog, so when it crashes it stays crashed.

>This all started happening in the last couple of days. aprsd V2.2.1(CVS
>from about a week or so ago) had been running fine up to then. I snagged a
>new copy from the sourceforge CVS server today - same results.
>
This has been going on episodically for me from when I first used aprsd about 2 years ago. It is the reason I created the watchdog in the first place. It seems to be related to the specific machine: I had a 233 PII box that never had an aprsd crash. Both of the dual PIII boxes for findu aren't so lucky, but it seems to go in waves. Right now the backup box is working much better, stays up for 2-3 hours, while the primary is crashing constantly.

I had started to work on a hub type of program in Perl, no IGate connections, just connected to a list of servers, eliminated the dups and accepted connects to pass the data. No IGate function, messaging, etc. Just what I need for findU...guess it is time to restart work on that...

Steve K4HG |
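The "eliminated the dups" step of the hub idea described above could look something like the sketch below. This is not Steve's Perl program; the class name `DupFilter`, the 30-second window, and the choice to compare whole lines (rather than ignoring the digipeater path) are all assumptions made for illustration.

```python
import time
from typing import Dict, Optional


class DupFilter:
    """Drop lines that were already seen within `window` seconds."""

    def __init__(self, window: float = 30.0) -> None:
        self.window = window                 # assumed value, not from the thread
        self._seen: Dict[str, float] = {}    # line text -> time first seen

    def is_new(self, line: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget entries older than the window so memory stays bounded.
        cutoff = now - self.window
        for key in [k for k, seen_at in self._seen.items() if seen_at < cutoff]:
            del self._seen[key]
        # Simplification: the whole line, minus line endings, is the key.
        key = line.rstrip("\r\n")
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

A hub built this way would call `is_new()` on every line read from any upstream server and forward the line to its clients only when the call returns `True`.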
From: Steve D. <k4...@ta...> - 2001-12-28 02:54:40
|
On 12/27/01 9:49 PM Brian D Heaton (bdh...@c4...) wrote:

> What I'm seeing are interlaces of what should be a separate packet
>following positions from KC7O-9. My thought is that it is somehow
>crashing the packet parser. I need to go back through the source and
>see where we discard overly long packets.
>
> I'm guessing that the IGATE taking the packets off RF and inserting
>them on the Inet stream has MIC-E conversion turned on. You can see
>that it's a MIC-E station, but it's coming across as full posit packets.
>
The short second line is actually part of the first. The question is where the CR got inserted, more likely in the handling after the APRS IS, like in a mail program.

In the 10 crashes I watched yesterday, there was none of this. All the crashes happened in areas without any illegal packets (which you would expect, as I was using second.aprs.net and ahub, which have cleaner streams). What I saw makes it unlikely this is simply related to an illegal packet...

Steve K4HG |
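For readers wondering what a check for split or overly long lines might look like, here is a minimal sketch. It is not taken from the aprsd source. It assumes lines arrive in the usual `SOURCE>DEST,PATH:payload` text form, and the 512-character limit is a placeholder, not a value read out of aprsd. The function name and the sample packets are made up.

```python
MAX_LINE = 512  # assumed limit for illustration only


def looks_like_packet(line: str, max_len: int = MAX_LINE) -> bool:
    """Heuristic: does this line have a SOURCE>DEST header before the first ':'?"""
    body = line.rstrip("\r\n")
    if not body or len(body) > max_len:
        return False
    header, sep, _payload = body.partition(":")
    if not sep:
        # No header/payload separator at all, e.g. the tail of a split line.
        return False
    source, gt, rest = header.partition(">")
    return bool(source) and bool(gt) and bool(rest)


if __name__ == "__main__":
    print(looks_like_packet("N0CALL>APRS,TCPIP*:>test status"))   # True
    print(looks_like_packet("and the rest of a status text"))     # False
```

This only catches fragments that lose their header; a stray CR in the middle of a payload that happens to leave something header-shaped behind would still pass.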
From: KC7ZRU - T. <kc...@ar...> - 2001-12-28 18:13:27
|
Whatever the cause - just as 'suddenly' things are working 'like normal' again. Been up for over 17 hours now no prob. Yesterday she wouldn't run for more than 90 seconds at a go.

Running the CVS version from yesterday morning (MST) and connected to second.aprsd.net

Is it possible that some of the less respectable members of the I-net community have discovered aprsd and are exploring weaknesses? Or is that likely jumping at shadows....

73

Steve Dimse wrote:
>
> On 12/27/01 9:49 PM Brian D Heaton (bdh...@c4...) wrote:
>
> In the 10 crashes I watched yesterday, there was none of this. All the
> crashes happened in areas without any illegal packets (which you would
> expect, as I was using second.aprs.net and ahub, which have cleaner
> streams.
>
--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
| CARC Repeater 146.940 DN62 |
| http://groups.yahoo.com/group/RM-APRS |
| "The Dungeon" at http://go.to/KC7ZRU |
| AIM - kc7zru |
+++++++++++++++++++++++++++++++++++++++++++++++
"I fear all we have done is to awaken a sleeping giant and fill him with a terrible resolve." - Admiral Yamamoto |
From: Hamish M. <ha...@cl...> - 2001-12-31 00:42:45
|
On Fri, Dec 28, 2001 at 11:13:22AM -0700, KC7ZRU - Tate wrote:
> Whatever the cause - just as 'suddenly' things are working 'like normal' again. Been up for over 17 hours now no prob. Yesterday she wouldn't run for more than 90 seconds at a go.
>
> Running the CVS version from yesterday morning (MST) and connected to second.aprsd.net
>
> Is it possible that some of the less respectable members of the I-net community have discovered aprsd and are exploring weaknesses? Or is that likely jumping at shadows....

I think that's highly unlikely.

It is interesting (to me) that sometimes it dies regularly yet other times the same code runs for days. That suggests to me that there is some particular source of packets which is crashing it which happens to be active at the time. However it does look like bad packets shouldn't be the source any more.

I'm at a loss.

Hamish
--
Hamish Moffatt VK3SB <ha...@de...> <ha...@cl...> |
From: Steve D. <k4...@ta...> - 2001-12-28 18:27:10
|
On 12/28/01 1:13 PM KC7ZRU - Tate (kc...@ar...) wrote:

>Is it possible that some of the less respectable members of the I-net
>community have discovered aprsd and are exploring weaknesses? Or is that
>likely jumping at shadows....
>
Given the results of my experience and recent testing, this is not the cause. Here are the specific reasons:

It has been going on for years, better and worse at times, but aprsd has never been stable on my two multiproc systems.

In the testing on findu, no aprsd ports were open to the outside world. There was only a single client connection, by my findu parser, which sends no data.

The tests were on single connects to APRServe and AHub machines, which do a good job of scrubbing the stream.

The input stream for 10 packets before and 20 packets after each crash during the tests was carefully scrutinized. All packets were well formed, legal APRS packets. There was no obvious pattern to the input stream.

Steve K4HG |
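The "10 packets before and 20 packets after" inspection described above was done by eye against a terminal window. A small helper that automates the same bookkeeping could be sketched like this. The class name `CrashContext` and the idea of an independent tap on the feed that is told when aprsd dies (for example by the watchdog) are assumptions for illustration, not part of findU or aprsd.

```python
from collections import deque
from typing import Deque, List


class CrashContext:
    """Keep the last `before` lines and, once a crash is marked, the next `after`."""

    def __init__(self, before: int = 10, after: int = 20) -> None:
        self._recent: Deque[str] = deque(maxlen=before)  # rolling pre-crash window
        self._after = after
        self._remaining = 0            # lines still to collect after a crash
        self.snapshot: List[str] = []  # lines around the most recent crash

    def feed(self, line: str) -> None:
        """Call for every line seen on the independent tap of the feed."""
        if self._remaining > 0:
            self.snapshot.append(line)
            self._remaining -= 1
        self._recent.append(line)

    def mark_crash(self) -> None:
        """Call when the monitored process is found dead."""
        self.snapshot = list(self._recent)
        self._remaining = self._after
```

After `mark_crash()` and `after` further calls to `feed()`, `snapshot` holds the window of lines to scrutinize, in arrival order.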
From: Brian D H. <bdh...@c4...> - 2001-12-28 19:08:14
|
Steve,

Do you see better/worse stability on a single processor machine? Can you provide details such as kernel version (general kernel config), glibc version, gcc/g++ version, etc? I'm curious to know if there is something going on with threading and SMP support.

THX/BDH

On Fri, 2001-12-28 at 12:27, Steve Dimse wrote:
> On 12/28/01 1:13 PM KC7ZRU - Tate (kc...@ar...) wrote:
>
> >Is it possible that some of the less respectable members of the I-net
> >community have discovered aprsd and are exploring weaknesses? Or is that
> >likely jumping at shadows....
> >
> Given the results of my experience and recent testing, this is not the
> cause, here are the specific reasons:
>
> It has been going on for years, better and worse at times, but aprsd has
> never been stable on my two multiproc systems.
>
> In the testing on findu, no aprsd ports were open to the outside world,
> there was only a single client connection by my findu parser, which sends
> no data.
>
> The tests were on single connects to APRServe and AHub machines, which do
> a good job of scrubbing the stream.
>
> The input stream for 10 packets before and 20 packets after the crashes
> during the tests were carefully scrutinized, all packets were well
> formed, legal APRS packets. There was no obvious pattern to the input
> stream.
>
> Steve K4HG
>
> _______________________________________________
> Aprsd-users mailing list
> Apr...@li...
> https://lists.sourceforge.net/lists/listinfo/aprsd-users
> |
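The details requested above (kernel, C library, compiler versions, processor count) can be collected in one go. The following is only a sketch using the Python standard library; the function names are made up, and it reports "not found" or "unknown" rather than guessing when a tool is missing.

```python
import os
import platform
import shutil
import subprocess
from typing import Dict


def compiler_version(name: str) -> str:
    """First line of `<name> --version`, or a placeholder if unavailable."""
    path = shutil.which(name)
    if path is None:
        return "not found"
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return "unknown"
    lines = result.stdout.splitlines()
    return lines[0] if lines else "unknown"


def system_report() -> Dict[str, str]:
    """Gather the items one would paste into a bug report."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "cpus": str(os.cpu_count() or "unknown"),
        "gcc": compiler_version("gcc"),
        "g++": compiler_version("g++"),
    }


if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```

The kernel configuration itself is not covered here; where it lives varies by distribution.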
From: KC7ZRU - T. <kc...@ar...> - 2001-12-28 19:47:32
|
Good info - thanks Steve.

Steve Dimse wrote:
>
> On 12/28/01 1:13 PM KC7ZRU - Tate (kc...@ar...) wrote:
>
> Given the results of my experience and recent testing, this is not the
> cause, here are the specific reasons: |
From: Steve D. <k4...@ta...> - 2001-12-28 20:10:22
|
On 12/28/01 2:08 PM Brian D Heaton (bdh...@c4...) wrote:

>Steve,
>
> Do you see better/worse stability on a single processor machine? Can
>you provide details such as kernel version (general kernel config),
>glibc version, gcc/g++ version, etc? I'm curious to know if there is
>something going on with threading and SMP support.
>
I don't run it on any single processor machine. It seems to be worse under Redhat 7.2 (kernel 2.4.7-10) than it was under 6.2 (2.2 something-or-other), but even there it wasn't so good.

Steve K4HG |
From: David V. <da...@vr...> - 2001-12-28 20:44:04
|
For what it's worth, I was trying to run aprsd on a dual-cpu Dell server running Redhat 7.2.

I couldn't keep it running either. I tried for a few weeks and gave up.

dave
n9qnz

> -----Original Message-----
> From: apr...@li...
> [mailto:apr...@li...]On Behalf Of
> Steve Dimse
> Sent: Friday, December 28, 2001 3:10 PM
> To: Brian D Heaton
> Cc: KC7ZRU - Tate; aprsd Users
> Subject: Re: [Aprsd-users] Killer packets??
>
>
> On 12/28/01 2:08 PM Brian D Heaton (bdh...@c4...) wrote:
>
> >Steve,
> >
> > Do you see better/worse stability on a single processor
> machine? Can
> >you provide details such as kernel version (general kernel config),
> >glibc version, gcc/g++ version, etc? I'm curious to know if there is
> >something going on with threading and SMP support.
> >
> I don't run it on any single processor machine. It seems to be worse
> under Redhat 7.2 (kernel 2.4.7-10) than it was under 6.2 (2.2
> something-or-other), but even there it wasn't so good.
>
> Steve K4HG |
From: Brian D H. <bdh...@c4...> - 2001-12-28 21:12:36
|
Running solid as a rock here. I run mine off and on for development. I'm on a recent kernel, all patches applied, and using the GCC 3.x branch.

THX/BDH

On Fri, 2001-12-28 at 14:43, David Vrona wrote:
> For what it's worth, I was trying to run aprsd on a dual-cpu Dell server
> running Redhat 7.2.
>
> I couldn't keep it running either. I tried for a few weeks and gave up.
>
> dave
> n9qnz |
From: Brian D H. <bdh...@c4...> - 2001-12-28 21:11:58
|
Steve,

I'm assuming you're current on all the RH patches. I'd also download/compile the newest 2.4.17 kernel (2.4.16 would work too). There have been some major changes to the VM subsystem since 2.4.7 and it will remove some areas of concern. 2.4.16/17 also has the ext3 patches integrated into the kernel, so if you've got ext3 filesystems you won't have to add the patch. As a side benefit you can tweak the kernel for your setup. The RH stock kernels have so much stuff compiled in (or in modules) that it can cause some problems.

Give that a shot and let me know if I can be of any assistance.

THX/BDH

On Fri, 2001-12-28 at 14:10, Steve Dimse wrote:
> On 12/28/01 2:08 PM Brian D Heaton (bdh...@c4...) wrote:
>
> >Steve,
> >
> > Do you see better/worse stability on a single processor machine? Can
> >you provide details such as kernel version (general kernel config),
> >glibc version, gcc/g++ version, etc? I'm curious to know if there is
> >something going on with threading and SMP support.
> >
> I don't run it on any single processor machine. It seems to be worse
> under Redhat 7.2 (kernel 2.4.7-10) than it was under 6.2 (2.2
> something-or-other), but even there it wasn't so good.
>
> Steve K4HG |
From: Steve D. <k4...@ta...> - 2001-12-28 21:38:41
|
On 12/28/01 4:12 PM Brian D Heaton (bdh...@c4...) wrote:

> I'm assuming you're current on all the RH patches. I'd also
>download/compile the newest 2.4.17 kernel (2.4.16 would work too).
>There have been some major changes to the VM subsystem since 2.4.7 and
>it will remove some areas of concern. 2.4.16/17 also has the ext3
>patches integrated into the kernel so if you've got ext3 filesystems you
>won't have to add the patch. As a side benefit you can tweak the kernel
>for your setup. The RH stock kernels have so much stuff compiled in (or
>in modules) that it can cause some problems.
>
> Give that a shot and let me know if I can be of any assistance.
>
Actually I've given up, now that I've been pointed to aprsd.pl. So far it is much more stable.

Just to be totally clear: this is not a new problem. There is a concurrency issue, and it seems the higher the performance of the system (and especially going to multiple processors), the greater the likelihood of it occurring. There is some other trigger factor I don't understand, but as far as I can tell it isn't specific data on the stream. I don't believe this is a kernel issue; it has been present in every kernel I've used going back to 2.0.something, before I even started findU.

I've spent countless hours trying to isolate something that could consistently reproduce it or find some way to work around it. I can't bring down the production machine for findu every time a new kernel comes out in the vague hope that that will magically fix the problem. For production machines, staying a bit behind the bleeding edge is often the safest course of action, not to mention the most efficient. I want to spend my time working on findU, not administering and configuring Linux.

Steve K4HG |
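Nothing in this thread pins down the actual defect in aprsd, so the following is only a generic illustration of the kind of discipline a "concurrency issue" usually calls for: every touch of state shared between threads goes through one lock. All names here are hypothetical, and Python's interpreter hides many races that a threaded C++ program on a multiprocessor box would hit, so this shows the pattern rather than reproducing the crash.

```python
import threading
from typing import List


class SharedLog:
    """A list of lines shared between threads, guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._lines: List[str] = []

    def append(self, line: str) -> None:
        with self._lock:
            self._lines.append(line)

    def drain(self) -> List[str]:
        # Swap the list out under the lock so no append lands in between.
        with self._lock:
            lines, self._lines = self._lines, []
            return lines


def run_workers(workers: int = 4, per_worker: int = 1000) -> int:
    """Have several threads append concurrently; return how many lines arrived."""
    log = SharedLog()

    def work(worker_id: int) -> None:
        for i in range(per_worker):
            log.append(f"{worker_id}:{i}")

    threads = [threading.Thread(target=work, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(log.drain())
```

With the lock in place, `run_workers(4, 1000)` accounts for all 4000 lines regardless of how the threads interleave.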
From: David V. <da...@vr...> - 2001-12-29 14:49:14
|
Could I offer my Linux server as a test bed for findu or whatever else you guys want to try?

It's a dual 733 MHz CPU box and it's extremely lightly loaded. Lots of RAM and it's on a hefty UPS. Uptime is currently 75 days or so. I'm more than willing to set up accounts on it for test work.

Brian will need to make sure the latest kernel and gcc are in place. I can do that but I would be more comfortable if he did it or at least checked my work.

dave
n9qnz |