Thread: [Svxlink-devel] Error in voter
Brought to you by:
sm0svx
From: PE1RJV <pe...@ho...> - 2014-01-13 07:21:35
|
Last week we added a second receiver to our repeater system. Since then, svxlink quits with an error more or less once a day:

svxlink: Voter.cpp:1118: virtual void Voter::Receiving::timerExpired(): Assertion `bestSrx() != 0' failed.

It seems to be a problem with a buffer overflow on the remotetrx site, but I'm not sure. Any clues on how to fix this?

Best 73s, Paul

-----
PI3UTR
--
View this message in context: http://svxlink.996268.n3.nabble.com/Error-in-voter-tp3254.html
Sent from the svxlink-devel mailing list archive at Nabble.com.
|
From: SM0SVX <sm...@us...> - 2014-01-28 21:55:22
|
On Sunday 12 January 2014 23:21:28 PE1RJV wrote:
> Since the last week we added a second receiver to our repeater system.
> More or less once a day svxlink quits with an error:
>
> svxlink: Voter.cpp:1118: virtual void Voter::Receiving::timerExpired():
> Assertion `bestSrx() != 0' failed.

Did you change any of the default config for the Voter? Can you post the config for the Voter and the receivers.

73's de SM0SVX / Tobias

> It seems to be a problem with a buffer overflow on the remotetrx site, but
> I'm not sure.
> Any clues on how to fix this ?
>
> Best 73s, Paul
>
> -----
> PI3UTR
|
From: PE1RJV <pe...@ho...> - 2014-01-29 08:02:32
|
SvxLink:

[Voter]
TYPE=Voter
RECEIVERS=Rx1,NetRx1,NetRx2,NetRx3
VOTING_DELAY=100
BUFFER_LENGTH=0
REVOTE_INTERVAL=1000
HYSTERESIS=50
RX_SWITCH_DELAY=500
SQL_CLOSE_REVOTE_DELAY=500

[Rx1]
TYPE=Local
AUDIO_DEV=alsa:plughw:0,0
AUDIO_CHANNEL=0
SQL_DET=SERIAL
#SQL_DET=CTCSS
#SQL_DET=SIGLEV
SQL_START_DELAY=0
SQL_DELAY=40
SQL_HANGTIME=50
#SQL_EXTENDED_HANGTIME=500
#SQL_EXTENDED_HANGTIME_THRESH=15
SQL_TIMEOUT=900
#VOX_FILTER_DEPTH=20
#VOX_THRESH=1000
#CTCSS_MODE=2
CTCSS_FQ=77.0
#CTCSS_SNR_OFFSET=0
#CTCSS_OPEN_THRESH=15
#CTCSS_CLOSE_THRESH=9
#CTCSS_BPF_LOW=60
#CTCSS_BPF_HIGH=230
SERIAL_PORT=/dev/ttyS0
SERIAL_PIN=DCD:SET
#SERIAL_SET_PINS=DTR
#EVDEV_DEVNAME=/dev/input/by-id/usb-SYNIC_SYNIC_Wireless_Audio-event-if03
#EVDEV_OPEN=1,163,1
#EVDEV_CLOSE=1,163,0
SIGLEV_DET=NOISE
SIGLEV_SLOPE=110.99
SIGLEV_OFFSET=-720.05
#TONE_SIGLEV_MAP=100,84,60,50,37,32,28,23,19,8
SIGLEV_OPEN_THRESH=30
SIGLEV_CLOSE_THRESH=10
DEEMPHASIS=0
SQL_TAIL_ELIM=100
#PREAMP=6
PEAK_METER=0
#DTMF_DEC_TYPE=S54S
#DTMF_SERIAL=/dev/ttyS0
#DTMF_DEC_TYPE=INTERNAL
DTMF_DEC_TYPE=NONE
DTMF_MUTING=1
DTMF_HANGTIME=200
DTMF_MAX_FWD_TWIST=8
DTMF_MAX_REV_TWIST=4
1750_MUTING=1
#SEL5_DEC_TYPE=INTERNAL
#SEL5_TYPE=ZVEI1

RemoteTRX:

[Rx1]
TYPE=Local
AUDIO_DEV=alsa:plughw:0
AUDIO_CHANNEL=0
#SQL_DET=CTCSS
SQL_DET=SIGLEV
SQL_START_DELAY=50
SQL_DELAY=40
SQL_HANGTIME=200
SQL_EXTENDED_HANGTIME=500
SQL_EXTENDED_HANGTIME_THRESH=10
SQL_TIMEOUT=600
#VOX_FILTER_DEPTH=20
#VOX_THRESH=1000
#CTCSS_MODE=2
CTCSS_FQ=77.0
CTCSS_SNR_OFFSET=-23.50
CTCSS_OPEN_THRESH=15
CTCSS_CLOSE_THRESH=9
CTCSS_BPF_LOW=60
CTCSS_BPF_HIGH=270
#SERIAL_PORT=/dev/ttyS0
#SERIAL_PIN=CTS:SET
#SERIAL_SET_PINS=DTR
#EVDEV_DEVNAME=/dev/input/by-id/usb-SYNIC_SYNIC_Wireless_Audio-event-if03
#EVDEV_OPEN=1,163,1
#EVDEV_CLOSE=1,163,0
SIGLEV_DET=NOISE
SIGLEV_SLOPE=30.70
SIGLEV_OFFSET=-88.75
#TONE_SIGLEV_MAP=100,84,60,50,37,32,28,23,19,8
SIGLEV_OPEN_THRESH=30
SIGLEV_CLOSE_THRESH=10
DEEMPHASIS=0
SQL_TAIL_ELIM=200
#PREAMP=6
PEAK_METER=1
#DTMF_DEC_TYPE=S54S
#DTMF_SERIAL=/dev/ttyS0
#DTMF_DEC_TYPE=INTERNAL
DTMF_DEC_TYPE=NONE
DTMF_MUTING=1
DTMF_HANGTIME=200
#DTMF_MAX_FWD_TWIST=8
#DTMF_MAX_REV_TWIST=4
1750_MUTING=1
#SEL5_DEC_TYPE=INTERNAL
#SEL5_TYPE=ZVEI1

73s, Paul

-----
PI3UTR
|
From: SM0SVX <sm...@us...> - 2014-02-03 07:51:28
|
Cannot see anything strange there. I'll dig into the source code to see what's wrong.

73's de SM0SVX / Tobias
|
From: SM0SVX <sm...@us...> - 2014-02-16 11:15:00
|
Paul,

Try latest Subversion trunk now.

73's de SM0SVX / Tobias
|
From: PE1RJV <pe...@ho...> - 2014-02-24 11:35:41
|
I updated the system today; I will report the results in about a week. TIA!

-----
PI3UTR
|
From: SM0SVX <sm...@us...> - 2014-03-22 15:30:02
|
On Monday 24 February 2014 03:35:33 PE1RJV wrote:
> I updated the system today, in about a week I will report if the results.

Any feedback on this?

73's de SM0SVX / Tobias

> TIA !
>
> -----
> PI3UTR
|
From: PE1RJV <pe...@ho...> - 2014-03-26 15:46:22
|
Hi,

No solid feedback yet, unfortunately: it is still not working OK. It seems to be triggered by a buffer underflow or overflow in the remotetrx, but I still have to investigate it further. Finding the time to do so is my issue...

73s, Paul

-----
PI3UTR
|
From: Rob J. <pe...@am...> - 2014-04-12 09:52:52
Attachments:
gdb.txt
|
I am helping Paul with debugging this problem. I started svxlink 13.12 in interactive mode with core dumping enabled, and printed the call backtrace in gdb. It is included as an attachment. If you need the core file or want me to print other info from gdb, please mention it.

We have a setup, mainly for testing, where two receivers are connected, one remote and one local. But the remote receiver is close by, and many stations are received on both receivers.

What appears to be happening is this: when the station stops transmitting, the squelch of both the local and the remote receiver closes at "the same time", and a race condition appears to exist when svxlink is handling the "squelch off" event for one receiver, looks for the next receiver to use, and then finds that this one is closing its squelch as well.

The crashes always occur at the end of a transmission from a station. It can work for a few hours or it can crash within a couple of minutes.

I cannot imagine that race conditions like this would not have been foreseen in the design of svxlink, so I am showing you the stack dump first, before I start to investigate very deeply how this can happen and how svxlink handles serialization in general (which I have not yet done).

Rob PE1CHL
|
From: Rob J. <pe...@am...> - 2014-04-13 16:13:34
Attachments:
gdb.txt
|
Here are a couple more crashdumps from the same repeater. The first 3 are apparently a different situation that ends in an Abort (signal 6), the last one is again a Segmentation Violation (signal 11) and looks much like the first. Rob PE1CHL |
From: SM0SVX <sm...@us...> - 2014-04-17 15:57:40
|
On Sunday 13 April 2014 18:13:23 Rob Janssen wrote:
> Here are a couple more crashdumps from the same repeater.
> The first 3 are apparently a different situation that ends in an Abort
> (signal 6), the last one is again a Segmentation Violation (signal 11) and
> looks much like the first.

Thanks for helping to solve this problem and thanks for the dumps. I'll have a look at them.

> I try to debug the Voter problem discussed on the list, and my suspicion is
> that it is a simple race condition resulting from the use of asynchronous
> timers in code with no protection of shared variables.
>
> However, I cannot believe that is really the case, as the software appears
> to be written by knowledgeable people and is in wide use, and issues like
> this should have been found long ago or avoided altogether.
>
> So my basic question is: what is the general mechanism that protects the
> code against interruption by timer signals at places where this causes
> unwanted change of variables? Is there some protection offered by C++ or
> by constructs in the program that I have not yet found?
>
> For example, we see the code SEGV at line 361 in Macho.cpp, this code fragment:
>
>   if (myPendingEvent) {
>     _IEventBase * event = myPendingEvent;
>     myPendingEvent = 0;
>     event->dispatch(*myCurrentState);   <--- line 361
>     delete event;
>   }
>
> I think it means that "event" or "myCurrentState" is a NULL pointer.
> However, both of those have been checked for NULL shortly before.
> So I suspect that between the check and the use, a timer signal has been
> handled that sets them.
>
> Is that possible? Or am I completely on the wrong track here...

SvxLink does not use threads or UNIX signals so there really is only one program flow. Well, that's almost true. There are some small parts that use threads/signals but they should be isolated from the rest of the code using a safe interface.

Timers in SvxLink are implemented using the "pselect" system call. Timer events are generated from AsyncCppApplication.cpp as are all other events. There is no way for a timer to interrupt the code above.

73's de SM0SVX / Tobias

> Rob PE1CHL
|
From: SM0SVX <sm...@us...> - 2014-04-24 06:22:55
|
On Tuesday 22 April 2014 23:17:05 Rob Janssen wrote:
> SM0SVX wrote:
> > On Sunday 13 April 2014 18:13:23 Rob Janssen wrote:
> >> Here are a couple more crashdumps from the same repeater.
> >> The first 3 are apparently a different situation that ends in an Abort
> >> (signal 6), the last one is again a Segmentation Violation (signal 11)
> >> and looks much like the first.
> >
> > Thanks for helping to solve this problem and thanks for the dumps. I'll
> > have a look at them.
>
> I have been working on it and I noticed that several versions of svxlink had
> been installed on the computer on top of eachother. I decided to uninstall
> everything and start from scratch, especially after I saw a backtrace in
> gdb where part of the function list had no matching symbols.

Ah, that typically sounds like different libraries were used when linking and when running.

> Unfortunately I made the mistake to uninstall in reverse order: first a
> "make uninstall" to undo the "make install" I did last, and then a couple
> of "rpm -e" commands to remove the early version that had been installed
> using rpm, then some more rm commands to remove libraries still left in
> /usr/lib.
>
> I found out the hard way that "make uninstall" simply removes the
> /etc/svxlink/svxlink.conf file, not a file that it has installed but the
> carefully modified one :-(
>
> I had expected (in fact I had not really thought about it) that it would
> leave config files, certainly those that had been modified, or that it
> would rename them. rpm -e does that, but as I ran the "make uninstall"
> first, the svxlink.conf file already was lost.
> The latest backup was two months old. Of course I have arranged for a
> daily backup of those directories now.

I'm sorry you got your configuration deleted. At least you (re-)discovered that backups are good :-)

I guess the "make uninstall" could be improved, but the plan is to switch to cmake so I will not put in any time to fix the old makefiles. The reason I have not switched to cmake yet is that I have an old SvxLink system running that does not support cmake. It should be upgraded but that has not happened so far. Maybe soon...

> What I did not really expect did happen though: after the clean install, the
> crashes as I have described have not yet occurred again. So maybe there
> still was some installation conflict that was resolve this way, and that
> can result when only "make install" is used? To be really sure, we need to
> run it a few days more. But usually it crashed 6 times a day or more, and
> now it has not crashed for more than a day.

That's great! I hope it will continue to run. I thought it was kind of strange that you had so many crashes since I have systems that have been running without problems for a very long time.

Using "make install" should work but I rarely use it myself since I mostly run SvxLink directly from the source tree. That is more convenient for development.

> Is there a script available that removes all installed files from different
> svxlink versions? (so one can make a clean start)
> Of course I now know to save the modified config files first :-)

There is nothing other than "make uninstall" that I know of. If you want to see exactly which files are installed you could temporarily install svxlink into its own directory.

  make install DESTDIR=/tmp/svxlink NO_CHOWN=1 NO_CHGRP=1

Now you can have a look in /tmp/svxlink to find out what has been installed. The "tree" utility gives a nice overview:

  tree /tmp/svxlink

73's de SM0SVX / Tobias

> Rob
|
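The DESTDIR staging described above can also be turned into a plain file list, which is easier to compare against the live system than a tree view. The following is only a minimal sketch, assuming a POSIX shell with find, sed and sort; the function name staged_manifest is made up for this illustration and is not part of SvxLink.

```shell
#!/bin/sh
# Hypothetical helper (not part of SvxLink): print the path each staged file
# would have on the real system, one per line, sorted.
# $1 is the staging root that was passed to "make install DESTDIR=...".
staged_manifest() {
    # Run in a subshell so the caller's working directory is not changed.
    (
        cd "$1" || exit 1
        # find prints ./usr/bin/svxlink; stripping the leading dot
        # turns that into /usr/bin/svxlink.
        find . \( -type f -o -type l \) -print | sed 's|^\.||' | LC_ALL=C sort
    )
}
```

After the staged "make install DESTDIR=/tmp/svxlink NO_CHOWN=1 NO_CHGRP=1" shown in the message, running "staged_manifest /tmp/svxlink" would list every file the install step places on the system. Reviewing that list before deleting anything makes it visible that configuration files such as svxlink.conf are among them.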
From: Rob J. <pe...@am...> - 2014-04-20 17:34:30
|
SM0SVX wrote:
> On Sunday 13 April 2014 18:13:23 Rob Janssen wrote:
>> Here are a couple more crashdumps from the same repeater.
>> The first 3 are apparently a different situation that ends in an Abort
>> (signal 6), the last one is again a Segmentation Violation (signal 11) and
>> looks much like the first.
> Thanks for helping to solve this problem and thanks for the dumps. I'll have a
> look at them.

I am again trying to debug it and I noticed that the "assertion failed" messages never appear anywhere. I found that this is caused by the re-routing of stderr to a pipe during startup when --logfile is present.

Normally this is nice because it causes stderr messages to be time-stamped in the logfile, but when those assert errors occur the C library prints the message to stderr, then closes everything and raises signal 6. That means the main loop that reads the message from the pipe and writes it to the log is not executed anymore.

I put an "if (daemonize)" around the redirection of stderr so I can see the assert errors while debugging (and while running the program on the console under a program that restarts it when it exits with signal 6 or 11, so the users of the repeater do not suffer). However, it appears that a lot of output is in fact written to stderr.

I think it needs a decision: either send error output to stdout, or open the logfile on stderr and do not use the redirect pipe trick on stderr. Of course the latter means that stderr output will not be timestamped in the log. Maybe it could also be fixed by catching the SIGABRT signal and flushing the error pipe before resetting the signal and raising it again?

I also noticed this fragment at the end of main (in svxlink.cpp):

  if (sigaction(SIGHUP, &sighup_oldact, NULL) == -1)
  {
    perror("sigaction");
  }

  if (sigaction(SIGHUP, &sigterm_oldact, NULL) == -1)
  {
    perror("sigaction");
  }

  if (sigaction(SIGHUP, &sigint_oldact, NULL) == -1)
  {
    perror("sigaction");
  }

I think the last two SIGHUP should be SIGTERM and SIGINT respectively. Of course this fixes no problems; in fact this part could be removed completely without any effect.

Rob
|
From: SM0SVX <sm...@us...> - 2014-04-24 05:10:35
|
On Sunday 20 April 2014 19:34:19 Rob Janssen wrote:
> SM0SVX wrote:
> > On Sunday 13 April 2014 18:13:23 Rob Janssen wrote:
> >> Here are a couple more crashdumps from the same repeater.
> >> The first 3 are apparently a different situation that ends in an Abort
> >> (signal 6), the last one is again a Segmentation Violation (signal 11)
> >> and looks much like the first.
> >
> > Thanks for helping to solve this problem and thanks for the dumps. I'll
> > have a look at them.
>
> I am again trying to debug it and I noticed that the "assertion failed"
> messages never appear anywhere. I found it is caused by the re-routing of
> stderr to a pipe during startup when --logfile is present.
> Normally this is nice because it will cause stderr messages to be
> time-stamped in the logfile, but when those assert errors occur the C
> library will print the message to stderr and then close everything and
> raise a signal 6, which means the main loop that gets the message from the
> pipe and writes it to the log is not executed anymore.

Hmmm. Yes, the logging could be improved somewhat obviously.

> I put a "if (daemonize)" around the redirection of stderr so I can have the
> assert errors while debugging (and running the program on the console under
> a program that re-starts it when it exits with signal 6 or 11, so the users
> of the repeater do not suffer). However, it appears that a lot of output is
> in fact written to stderr.

Why not just temporarily run it in foreground? If you want timestamped rows, just pipe the output to something that can add timestamps, like awk:

  svxlink | awk '{ print strftime("%c") ": " $0 }'

You'd have to add your restart loop as well of course. If you do not want to be logged in to the system all the time, use a utility like "tmux" (or "screen") to be able to detach and reattach to a terminal session.

> I think it needs a decision, whether to send error output to stdout, or to
> open the logfile on stderr and not use the redirect pipe trick on stderr.
> Of course that means that stderr output will not be timestamped in the log.
> Maybe it could also be fixed by catching the SIGABRT signal and flushing the
> error pipe before resetting the signal and raising it again?

I'll need to think about that one. If you don't want it to be forgotten, enter a bug report at:

  http://sourceforge.net/p/svxlink/bugs/

> I also noticed this fragment at the end of main (in svxlink.cpp):
>
> if (sigaction(SIGHUP, &sighup_oldact, NULL) == -1)
> {
> perror("sigaction");
> }
>
> if (sigaction(SIGHUP, &sigterm_oldact, NULL) == -1)
> {
> perror("sigaction");
> }
>
> if (sigaction(SIGHUP, &sigint_oldact, NULL) == -1)
> {
> perror("sigaction");
> }
>
> I think the last to SIGHUP should be SIGTERM and SIGINT respectively.
> Of course this fixes no problems, in fact this part could be removed
> completely without any effect.

Yes, that's wrong. Thanks for pointing that out. I have fixed it in the 13.12 branch. But as you say, they have no real effect. They are only there for completeness.

73's de SM0SVX / Tobias

> Rob
|
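The foreground setup discussed above (timestamped output plus a restart loop for crashes on signal 6 or 11) could look roughly like the sketch below. It is an illustration only, not something shipped with SvxLink: the function names are hypothetical, and it assumes the shell reports a child killed by signal N as exit status 128+N (134 for SIGABRT, 139 for SIGSEGV), as bash and dash do. Timestamps come from date rather than awk's strftime, since strftime is a gawk extension.

```shell
#!/bin/sh
# Hypothetical helpers for running a program in the foreground with
# timestamped output and automatic restart after a crash.

# Prefix every line read from stdin with the current date and time.
timestamp_lines() {
    while IFS= read -r ts_line; do
        printf '%s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$ts_line"
    done
}

# Succeed if exit status $1 means "killed by SIGABRT (6) or SIGSEGV (11)".
# Assumption: the shell encodes death by signal N as status 128+N.
crashed_by_signal() {
    [ "$1" -eq 134 ] || [ "$1" -eq 139 ]
}

# Run the given command; restart it for as long as it dies from signal 6
# or 11. Any other exit status ends the loop and is returned to the caller.
run_with_restart() {
    while true; do
        rwr_status=0
        "$@" || rwr_status=$?
        if ! crashed_by_signal "$rwr_status"; then
            return "$rwr_status"
        fi
        echo "exit status $rwr_status, restarting"
        sleep 1
    done
}
```

One way to use it inside a tmux or screen session would be "run_with_restart svxlink 2>&1 | timestamp_lines". Because stderr is merged into the pipe outside the program, an assertion message printed just before the abort still reaches the terminal with a timestamp.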
From: Rob J. <pe...@am...> - 2014-04-22 21:17:20
|
SM0SVX wrote:
> On Sunday 13 April 2014 18:13:23 Rob Janssen wrote:
>> Here are a couple more crashdumps from the same repeater.
>> The first 3 are apparently a different situation that ends in an Abort
>> (signal 6), the last one is again a Segmentation Violation (signal 11) and
>> looks much like the first.
> Thanks for helping to solve this problem and thanks for the dumps. I'll have a
> look at them.

I have been working on it and I noticed that several versions of svxlink had been installed on the computer on top of each other. I decided to uninstall everything and start from scratch, especially after I saw a backtrace in gdb where part of the function list had no matching symbols.

Unfortunately I made the mistake of uninstalling in reverse order: first a "make uninstall" to undo the "make install" I did last, then a couple of "rpm -e" commands to remove the early version that had been installed using rpm, then some more rm commands to remove libraries still left in /usr/lib.

I found out the hard way that "make uninstall" simply removes the /etc/svxlink/svxlink.conf file, not a file that it has installed but the carefully modified one :-(

I had expected (in fact I had not really thought about it) that it would leave config files, certainly those that had been modified, or that it would rename them. rpm -e does that, but as I ran the "make uninstall" first, the svxlink.conf file was already lost. The latest backup was two months old. Of course I have arranged for a daily backup of those directories now.

What I did not really expect did happen though: after the clean install, the crashes as I have described have not yet occurred again. So maybe there still was some installation conflict that was resolved this way, and that can result when only "make install" is used? To be really sure, we need to run it a few days more. But usually it crashed 6 times a day or more, and now it has not crashed for more than a day.

Is there a script available that removes all installed files from different svxlink versions? (so one can make a clean start) Of course I now know to save the modified config files first :-)

Rob
|
From: Rob J. <pe...@am...> - 2014-04-26 09:00:12
|
I noticed that the default /etc/sysconfig/svxlink and /etc/sysconfig/remotetrx files have the configuration path set as /etc/svxlink.conf and /etc/remotetrx.conf, while those config files are actually placed in /etc/svxlink/svxlink.conf and /etc/svxlink/remotetrx.conf.

Probably a good idea to change the paths in the default files. After my inadvertent uninstall earlier this week this file suddenly had the wrong content and the program would not start as a service.

73,
Rob
|
From: SM0SVX <sm...@us...> - 2014-04-26 13:27:20
|
On Saturday 26 April 2014 11:00:00 Rob Janssen wrote:
> I noticed that the default /etc/sysconfig/svxlink and
> /etc/sysconfig/remotetrx files have the configuration path set as
> /etc/svxlink.conf and /etc/remotetrx.conf, while those config files are
> actually placed in /etc/svxlink/svxlink.conf and
> /etc/svxlink/remotetrx.conf.
>
> Probably a good idea to change the paths in the default files. After my
> inadvertent uninstall earlier this week this file suddenly had the wrong
> content program would not start as a service.

Thanks! Fixed in the 13.12 release branch.

73's de SM0SVX / Tobias

> 73,
> Rob
|
From: Rob J. <pe...@am...> - 2014-04-28 18:08:08
|
SM0SVX wrote:
>> What I did not really expect did happen though: after the clean install, the
>> crashes as I have described have not yet occurred again. So maybe there
>> still was some installation conflict that was resolved this way, and that
>> can result when only "make install" is used? To be really sure, we need to
>> run it a few days more. But usually it crashed 6 times a day or more, and
>> now it has not crashed for more than a day.
> That's great! I hope it will continue to run. I thought it was kind of strange
> that you had so many crashes since I have systems that have been running
> without problems for a very long time.

It turns out that the improved stability has nothing to do with the uninstall/reinstall, but instead it probably was caused by the revert to the older svxlink.conf. In the backup we had a voter config with these parameters:

VOTING_DELAY=100
BUFFER_LENGTH=0
REVOTE_INTERVAL=1000
HYSTERESIS=50
RX_SWITCH_DELAY=500
SQL_CLOSE_REVOTE_DELAY=500

It has been running stable for several days with these params.
Now, BUFFER_LENGTH has been changed to 100 and the crashes are back, same stackdumps as I have presented before. That parameter was also present in the config I accidentally deleted. Maybe you can look again for the cause of this issue?
(from the backtrace I suspect that something happens during the playout of that buffer, and the program then crashes)

Anyway, here is what I made for uninstall. Probably not relevant anymore, but maybe others can use it (hopefully there are no unintended linewraps):

#!/bin/bash
# uninstall all svxlink software to get a clean slate before installing
# a new version - Rob PE1CHL

echo -n "Uninstall svxlink - are you sure? (y/n): "
read answer
if [ "$answer" != "y" ]
then
    exit 1
fi

now=`date +%Y%m%d%H%M%S`
backup="/tmp/backup-$now.tar.gz"
echo
echo "Making config backup to $backup..."
tar czf $backup /etc/svxlink /usr/share/svxlink/*.* \
    /etc/init.d/svxlink /etc/init.d/remotetrx \
    /etc/logrotate.d/svxlink /etc/logrotate.d/remotetrx \
    /etc/sysconfig/svxlink /etc/sysconfig/remotetrx

/etc/init.d/svxlink stop
/etc/init.d/remotetrx stop

if [ -x /bin/rpm ]
then
    if rpm -qa | grep -q svxlink
    then
        echo "Removing rpm-installed version"
        rpm -e echolib-devel libasync-devel
        rpm -e svxlink-server qtel echolib libasync
    fi
fi

rm -fv /etc/security/console.perms.d/90-svxlink.perms
rm -fv /etc/udev/rules.d/10-svxlink.rules
rm -fv /usr/bin/qtel /usr/bin/remotetrx /usr/bin/siglevdetcal /usr/bin/svxlink
rm -fvr /usr/include/svxlink
rm -fv /usr/lib/libasyncaudio* /usr/lib/libasynccore* /usr/lib/libasynccpp* /usr/lib/libasyncqt*
rm -fv /usr/lib/libecholib* /usr/lib/liblocationinfo.a /usr/lib/libtrx.a
rm -fvr /usr/lib/svxlink
rm -fv /usr/share/applications/qtel.desktop /usr/share/icons/link.xpm
rm -fv /usr/share/man/man1/qtel.1.gz /usr/share/man/man1/remotetrx.1.gz
rm -fv /usr/share/man/man1/siglevdetcal.1.gz /usr/share/man/man1/svxlink.1.gz
rm -fv /usr/share/man/man5/ModuleDtmfRepeater.conf.5.gz
rm -fv /usr/share/man/man5/ModuleEchoLink.conf.5.gz
rm -fv /usr/share/man/man5/ModuleHelp.conf.5.gz
rm -fv /usr/share/man/man5/ModuleParrot.conf.5.gz
rm -fv /usr/share/man/man5/ModulePropagationMonitor.conf.5.gz
rm -fv /usr/share/man/man5/ModuleSelCallEnc.conf.5.gz
rm -fv /usr/share/man/man5/ModuleTclVoiceMail.conf.5.gz
rm -fv /usr/share/man/man5/remotetrx.conf.5.gz
rm -fv /usr/share/man/man5/svxlink.conf.5.gz
rm -fvr /usr/share/qtel
rm -fv /usr/share/svxlink/events.d/*.tcl* /usr/share/svxlink/modules.d/*.tcl*
rmdir -v /usr/share/svxlink/* /usr/share/svxlink
rmdir -v /var/spool/svxlink/* /var/spool/svxlink

echo
echo -n "Remove configuration files and other customization? (y/n): "
read answer
if [ "$answer" != "y" ]
then
    exit 0
fi

rm -fv /etc/init.d/svxlink /etc/init.d/remotetrx
rm -fv /etc/logrotate.d/svxlink /etc/logrotate.d/remotetrx
rm -fv /etc/sysconfig/svxlink /etc/sysconfig/remotetrx
rm -fvr /etc/svxlink /usr/share/svxlink/*.*

echo
echo -n "Remove stored data (qso recorder, voice mail etc)? (y/n): "
read answer
if [ "$answer" != "y" ]
then
    exit 0
fi

rm -fvr /usr/share/svxlink
rm -fvr /var/spool/svxlink
exit 0
|
From: SM0SVX <sm...@us...> - 2014-04-28 20:51:42
|
On Monday 28 April 2014 20:07:57 Rob Janssen wrote:
> SM0SVX wrote:
> >> What I did not really expect did happen though: after the clean install,
> >> the crashes as I have described have not yet occurred again. So maybe
> >> there still was some installation conflict that was resolved this way,
> >> and that can result when only "make install" is used? To be really sure,
> >> we need to run it a few days more. But usually it crashed 6 times a day
> >> or more, and now it has not crashed for more than a day.
> >
> > That's great! I hope it will continue to run. I thought it was kind of
> > strange that you had so many crashes since I have systems that have been
> > running without problems for a very long time.
>
> It turns out that the improved stability has nothing to do with the
> uninstall/reinstall, but instead it probably was caused by the revert to
> the older svxlink.conf. In the backup we had a voter config with these
> parameters:
>
> VOTING_DELAY=100
> BUFFER_LENGTH=0
> REVOTE_INTERVAL=1000
> HYSTERESIS=50
> RX_SWITCH_DELAY=500
> SQL_CLOSE_REVOTE_DELAY=500
>
> It has been running stable for several days with these params.
> Now, BUFFER_LENGTH has been changed to 100 and the crashes are back, same
> stackdumps as I have presented before. That parameter was also present in
> the config I accidentally deleted. Maybe you can look again for the cause
> of this issue?
> (from the backtrace I suspect that something happens during the playout of
> that buffer, and the program then crashes)

Very interesting! That could explain why I have not seen the crashes on any of my systems since I have been using zero buffer length. I'll have a look at it later this week.

73's de SM0SVX / Tobias
|
From: SM0SVX <sm...@us...> - 2014-04-30 10:26:02
|
On Monday 28 April 2014 22:51:34 SM0SVX wrote:
> On Monday 28 April 2014 20:07:57 Rob Janssen wrote:
> > SM0SVX wrote:
> > >> What I did not really expect did happen though: after the clean
> > >> install, the crashes as I have described have not yet occurred again.
> > >> So maybe there still was some installation conflict that was resolved
> > >> this way, and that can result when only "make install" is used? To be
> > >> really sure, we need to run it a few days more. But usually it crashed
> > >> 6 times a day or more, and now it has not crashed for more than a day.
> > >
> > > That's great! I hope it will continue to run. I thought it was kind of
> > > strange that you had so many crashes since I have systems that have been
> > > running without problems for a very long time.
> >
> > It turns out that the improved stability has nothing to do with the
> > uninstall/reinstall, but instead it probably was caused by the revert to
> > the older svxlink.conf. In the backup we had a voter config with these
> > parameters:
> >
> > VOTING_DELAY=100
> > BUFFER_LENGTH=0
> > REVOTE_INTERVAL=1000
> > HYSTERESIS=50
> > RX_SWITCH_DELAY=500
> > SQL_CLOSE_REVOTE_DELAY=500
> >
> > It has been running stable for several days with these params.
> > Now, BUFFER_LENGTH has been changed to 100 and the crashes are back, same
> > stackdumps as I have presented before. That parameter was also present
> > in the config I accidentally deleted. Maybe you can look again for the
> > cause of this issue?
> > (from the backtrace I suspect that something happens during the playout of
> > that buffer, and the program then crashes)
>
> Very interesting! That could explain why I have not seen the crashes on any
> of my systems since I have been using zero buffer length. I'll have a look
> at it later this week.

Now you have a fix in both Subversion trunk and the 13.12 branch which I hope will resolve the problem. I was able to quite easily provoke the system to crash when setting BUFFER_LENGTH to 100. The problem I fixed occurred when a receiver reported a signal strength lower than -100 in certain situations. It was some stupid code that assumed that a signal level below -100 could never be reported. That code is now fixed to be safe.

Have you properly calibrated the signal level measurements for all your receivers? If not, that could have been a reason for the system producing very low siglev values. Normally they should be within 0 to 100.

73's de SM0SVX / Tobias
|
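Editor's sketch: the thread does not show the code that was fixed, so the following is only a hypothetical illustration of the class of bug described above, not the actual Voter.cpp logic. The `Receiver` struct and both function names are invented for this example. It shows how a "best receiver" search seeded with a fixed floor of -100 can return no receiver at all when every open receiver reports a lower level, which is the kind of situation that would trip an assertion like `bestSrx() != 0`, and how seeding the search with the first open receiver avoids that assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a satellite receiver; not an SvxLink type.
struct Receiver
{
  std::string name;
  bool        squelch_open;
  double      siglev;
};

// Fragile variant: assumes no receiver ever reports a level below -100.
// If every open receiver is below that floor, nullptr is returned even
// though at least one receiver has an open squelch.
const Receiver *bestReceiverFragile(const std::vector<Receiver> &rxs)
{
  const Receiver *best = nullptr;
  double best_siglev = -100.0;
  for (const Receiver &rx : rxs)
  {
    if (rx.squelch_open && rx.siglev > best_siglev)
    {
      best = &rx;
      best_siglev = rx.siglev;
    }
  }
  return best;
}

// Safer variant: the first open receiver becomes the initial candidate,
// so no assumption about the lowest possible signal level is needed.
const Receiver *bestReceiverSafe(const std::vector<Receiver> &rxs)
{
  const Receiver *best = nullptr;
  for (const Receiver &rx : rxs)
  {
    if (rx.squelch_open && (best == nullptr || rx.siglev > best->siglev))
    {
      best = &rx;
    }
  }
  return best;
}
```

With two open receivers reporting -150 and -120, the fragile variant finds nothing while the safe variant picks the one at -120. The safe variant still returns nullptr when no squelch is open, so a caller has to handle that case explicitly.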
From: Rob J. <pe...@am...> - 2014-05-01 07:31:44
|
Yesterday I wrote:
> Note that we had 2 kinds of crashes, a segmentation violation and an assertion failure, and it may be that
> one is now fixed and the other not yet.

It has crashed 5 times after installing the new version and now it is always the same error I posted yesterday, while before it was about 3/4 the segmentation violation and 1/4 this error.
So it looks like one cause has been eliminated, and (at least :) one remains.

Rob
|
From: Rob J. <pe...@am...> - 2014-04-30 17:12:20
|
SM0SVX wrote:
> Now you have a fix in both Subversion trunk and the 13.12 branch which I hope
> will resolve the problem. I was able to quite easily provoke the system to
> crash when setting BUFFER_LENGTH to 100. The problem I fixed occurred when a
> receiver reported a signal strength lower than -100 in certain situations. It
> was some stupid code that assumed that a signal level below -100 could never
> be reported. That code is now fixed to be safe.

Thank you for your work!
I took the Voter.cpp and Voter.h from svn and compiled it in the 13.12 version, set the buffer to 100 again but shortly afterwards it again crashed. After that it has been up for another hour so it has not become worse.

This time it was the assertion error:
svxlink: Macho.hpp:891: void Macho::_SubstateInstance<S>::deleteBox() [with S = Voter::SwitchActiveRx]: Assertion `myBox' failed.

Last log messages are:
Wed Apr 30 18:01:13 2014: Voter: Switching from "NetRx_HVS" (0) to "Rx1" (22.2904)
Wed Apr 30 18:01:16 2014: Voter: Switching from "Rx1" (23.6051) to "NetRx_HVS" (114.441)
Wed Apr 30 18:01:22 2014: Voter: Switching from "NetRx_HVS" (0) to "Rx1" (23.6532)
Wed Apr 30 18:01:22 2014: Voter: The squelch is CLOSED (Rx1=41.7713)
Wed Apr 30 18:01:22 2014: Voter: The squelch is OPEN (Rx1=16.1715)

The crash was 18:01:24

This is the backtrace in gdb:
(gdb) bt
#0  0x00b92424 in __kernel_vsyscall ()
#1  0x00400b11 in raise () from /lib/libc.so.6
#2  0x004023ea in abort () from /lib/libc.so.6
#3  0x003f9e2b in __assert_fail_base () from /lib/libc.so.6
#4  0x003f9ee6 in __assert_fail () from /lib/libc.so.6
#5  0x0816f276 in Macho::_SubstateInstance<Voter::SwitchActiveRx>::deleteBox (this=0xa5075f8) at Macho.hpp:891
#6  0x081705d8 in Macho::Link<Voter::SwitchActiveRx, Voter::SquelchOpen>::_deleteBox (this=0xa5076a8, instance=...) at Macho.hpp:1959
#7  0x0818f6d4 in Macho::_StateInstance::exit (this=0xa5075f8, next=...) at Macho.cpp:144
#8  0x0818fbaa in Macho::_MachineBase::rattleOn (this=0xa1a4d1c) at Macho.cpp:317
#9  0x081674a0 in Macho::Machine<Voter::Top>::dispatch (this=0xa1a4d1c, event=0xa55b760, destroy=true) at Macho.hpp:1806
#10 0x081628ff in Voter::Top::eventTimerExpired (this=0xa1a7500, t=0xa1a754c) at Voter.cpp:794
#11 0x0816e3ec in sigc::bound_mem_functor1<void, Voter::Top, Async::Timer*>::operator() (this=0xa1a75ec, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/functors/mem_fun.h:1851
#12 0x0816d7db in sigc::adaptor_functor<sigc::bound_mem_functor1<void, Voter::Top, Async::Timer*> >::operator()<Async::Timer* const&> (this=0xa1a75e8, _A_arg1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/adaptors/adaptor_trait.h:84
#13 0x0816c6c5 in sigc::internal::slot_call1<sigc::bound_mem_functor1<void, Voter::Top, Async::Timer*>, void, Async::Timer*>::call_it (rep=0xa1a75d0, a_1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/functors/slot.h:137
#14 0x002abd08 in sigc::internal::signal_emit1<void, Async::Timer*, sigc::nil>::emit (impl=0xa1a4c88, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/signal.h:1006
#15 0x002ab3af in sigc::signal1<void, Async::Timer*, sigc::nil>::emit (this=0xa1a7550, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/signal.h:2773
#16 0x002aaafa in sigc::signal1<void, Async::Timer*, sigc::nil>::operator() (this=0xa1a7550, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/signal.h:2781
#17 0x002a9935 in Async::CppApplication::exec (this=0xbf901230) at AsyncCppApplication.cpp:228
#18 0x0810dd58 in main (argc=3, argv=0xbf901724) at svxlink.cpp:514

Note that we had 2 kinds of crashes, a segmentation violation and an assertion failure, and it may be that one is now fixed and the other not yet.

>
> Have you properly calibrated the signal level measurements for all your
> receivers? If not, that could have been a reason for the system producing very
> low siglev values. Normally they should be within 0 to 100.
>

Ok it has been a theory in our group as well that the calibration has something to do with it. But it was not completely pinpointed what the exact requirements are. We have recalibrated several times, and at the moment the values sometimes are slightly below zero but not -100 anymore. So while this may have been a factor, it probably is not the cause of the current problem. I sometimes see values slightly above 100 (e.g. 109), is that a problem as well? Maybe the value can be clamped somewhere?

Our problem is that our sites are broadcast transmitter sites. The antenna of the secondary receiver we are now testing is only about a meter from an FM broadcast transmitter antenna radiating several kilowatts. In such an environment it is sometimes difficult to calibrate to a fixed noise floor.

Rob
|
From: SM0SVX <sm...@us...> - 2014-05-03 06:10:49
|
On Wednesday 30 April 2014 19:12:07 Rob Janssen wrote:
> SM0SVX wrote:
> > Now you have a fix in both Subversion trunk and the 13.12 branch which I
> > hope will resolve the problem. I was able to quite easily provoke the
> > system to crash when setting BUFFER_LENGTH to 100. The problem I fixed
> > occurred when a receiver reported a signal strength lower than -100 in
> > certain situations. It was some stupid code that assumed that a signal
> > level below -100 could never be reported. That code is now fixed to be
> > safe.
>
> Thank you for your work!
> I took the Voter.cpp and Voter.h from svn and compiled it in the 13.12
> version, set the buffer to 100 again but shortly afterwards it again
> crashed. After that it has been up for another hour so it has not become
> worse.

It would be better if you could use the 13.12 branch directly instead of copying files. Then I know we are testing exactly the same code.

svn co svn://svn.code.sf.net/p/svxlink/svn/branches/releases/13.12

> This time it was the assertion error:
> svxlink: Macho.hpp:891: void Macho::_SubstateInstance<S>::deleteBox() [with
> S = Voter::SwitchActiveRx]: Assertion `myBox' failed.
>
> Last log messages are:
> Wed Apr 30 18:01:13 2014: Voter: Switching from "NetRx_HVS" (0) to "Rx1" (22.2904)
> Wed Apr 30 18:01:16 2014: Voter: Switching from "Rx1" (23.6051) to "NetRx_HVS" (114.441)
> Wed Apr 30 18:01:22 2014: Voter: Switching from "NetRx_HVS" (0) to "Rx1" (23.6532)
> Wed Apr 30 18:01:22 2014: Voter: The squelch is CLOSED (Rx1=41.7713)
> Wed Apr 30 18:01:22 2014: Voter: The squelch is OPEN (Rx1=16.1715)
>
> The crash was 18:01:24
>
> This is the backtrace in gdb:
> (gdb) bt
> #0  0x00b92424 in __kernel_vsyscall ()
> #1  0x00400b11 in raise () from /lib/libc.so.6
> #2  0x004023ea in abort () from /lib/libc.so.6
> #3  0x003f9e2b in __assert_fail_base () from /lib/libc.so.6
> #4  0x003f9ee6 in __assert_fail () from /lib/libc.so.6
> #5  0x0816f276 in Macho::_SubstateInstance<Voter::SwitchActiveRx>::deleteBox (this=0xa5075f8) at Macho.hpp:891
> #6  0x081705d8 in Macho::Link<Voter::SwitchActiveRx, Voter::SquelchOpen>::_deleteBox (this=0xa5076a8, instance=...) at Macho.hpp:1959
> #7  0x0818f6d4 in Macho::_StateInstance::exit (this=0xa5075f8, next=...) at Macho.cpp:144
> #8  0x0818fbaa in Macho::_MachineBase::rattleOn (this=0xa1a4d1c) at Macho.cpp:317
> #9  0x081674a0 in Macho::Machine<Voter::Top>::dispatch (this=0xa1a4d1c, event=0xa55b760, destroy=true) at Macho.hpp:1806
> #10 0x081628ff in Voter::Top::eventTimerExpired (this=0xa1a7500, t=0xa1a754c) at Voter.cpp:794
> #11 0x0816e3ec in sigc::bound_mem_functor1<void, Voter::Top, Async::Timer*>::operator() (this=0xa1a75ec, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/functors/mem_fun.h:1851
> #12 0x0816d7db in sigc::adaptor_functor<sigc::bound_mem_functor1<void, Voter::Top, Async::Timer*> >::operator()<Async::Timer* const&> (this=0xa1a75e8, _A_arg1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/adaptors/adaptor_trait.h:84
> #13 0x0816c6c5 in sigc::internal::slot_call1<sigc::bound_mem_functor1<void, Voter::Top, Async::Timer*>, void, Async::Timer*>::call_it (rep=0xa1a75d0, a_1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/functors/slot.h:137
> #14 0x002abd08 in sigc::internal::signal_emit1<void, Async::Timer*, sigc::nil>::emit (impl=0xa1a4c88, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/signal.h:1006
> #15 0x002ab3af in sigc::signal1<void, Async::Timer*, sigc::nil>::emit (this=0xa1a7550, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/signal.h:2773
> #16 0x002aaafa in sigc::signal1<void, Async::Timer*, sigc::nil>::operator() (this=0xa1a7550, _A_a1=@0xa560608) at /usr/include/sigc++-2.0/sigc++/signal.h:2781
> #17 0x002a9935 in Async::CppApplication::exec (this=0xbf901230) at AsyncCppApplication.cpp:228
> #18 0x0810dd58 in main (argc=3, argv=0xbf901724) at svxlink.cpp:514
>
> Note that we had 2 kinds of crashes, a segmentation violation and an
> assertion failure, and it may be that one is now fixed and the other not
> yet.

Yes, it's probably so. There were two bugs. I have now fixed another bug which caused the voter to crash. I do not get exactly the same stack trace as you but I hope it's the same bug causing both the crashes.

Please try the new code out. I hope it will fix your crash problems.

> > Have you properly calibrated the signal level measurements for all your
> > receivers? If not, that could have been a reason for the system producing
> > very low siglev values. Normally they should be within 0 to 100.
>
> Ok it has been a theory in our group as well that the calibration has
> something to do with it. But it was not completely pinpointed what the
> exact requirements are. We have recalibrated several times, and at the
> moment the values sometimes are slightly below zero but not -100 anymore.
> So while this may have been a factor, it probably is not the cause of the
> current problem. I sometimes see values slightly above 100 (e.g. 109), is
> that a problem as well?

109 is not a problem but too high values may be a problem. There is an upper limit of 120 above which SvxLink will interpret it as a bogus value and instead report a zero signal level. I'm not happy with this limitation but it was introduced as a workaround for receivers where unsquelched audio is not available. When the squelch closes, SvxLink "hears" maximum silence and interprets this as a very strong signal. The voter will then switch to this receiver which of course is wrong. Since that is a special case, this behavior should probably be turned on by configuration. It's presently hard coded.

> Maybe the value can be clamped somewhere? Our
> problem is that our sites are broadcast transmitter sites. The antenna of
> the secondary receiver we are now testing is only about a meter from an FM
> broadcast transmitter antenna radiating several kilowatts. In such an
> environment it is sometimes difficult to calibrate to a fixed noise floor.

Yes, I've considered clamping it. That should be fine as long as we handle the case described above.

73's de SM0SVX / Tobias

> Rob
|
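Editor's sketch: the two ideas discussed above (treating levels above a limit as bogus, and clamping the remaining values) could be combined roughly as follows. This is a minimal illustration and not SvxLink code. `SiglevLimits` and `sanitizeSiglev` are hypothetical names, the clamp range of 0 to 100 is an assumption based on the "normally within 0 to 100" remark, and only the value 120 comes from the thread. Making the bogus limit a field that can be switched off stands in for the configuration option that is suggested but does not exist at the time of this thread.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical settings; only the value 120 is taken from the discussion.
struct SiglevLimits
{
  double min_level    = 0.0;    // assumed lower clamp bound
  double max_level    = 100.0;  // assumed upper clamp bound
  double bogus_thresh = 120.0;  // levels above this are reported as 0;
                                // a value <= 0 disables the check
};

// Returns a signal level that is safe to hand to a voter:
//  - above bogus_thresh: treated as "squelched receiver heard as silence"
//    and reported as 0
//  - otherwise: clamped into [min_level, max_level]
double sanitizeSiglev(double raw, const SiglevLimits &limits)
{
  assert(limits.min_level <= limits.max_level);
  if (limits.bogus_thresh > 0.0 && raw > limits.bogus_thresh)
  {
    return 0.0;
  }
  return std::min(std::max(raw, limits.min_level), limits.max_level);
}
```

Under these assumptions the 114.441 seen in the log above would be clamped to 100, a reading of 130 would be reported as 0, and setting `bogus_thresh` to 0 on a site with unsquelched audio would turn 130 into 100 instead.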
From: Rob J. <pe...@am...> - 2014-05-03 07:48:52
|
SM0SVX wrote:
> It would be better if you could use the 13.12 branch directly instead of
> copying files. Then I know we are testing exactly the same code.
>
> svn co svn://svn.code.sf.net/p/svxlink/svn/branches/releases/13.12

Ok I now checked out and compiled that version.
Things I changed locally (and which are now gone again) are only:
- I added a couple of assert statements in places where I had seen it crash before (e.g. rattleOn)
- I made a small modification in the logging so that stderr was not redirected, and I could see the actual assert failure. However, it could also be read in the coredump using gdb.
- when daemonized, I added a chdir /tmp so that it could actually dump core when running as svxlink user
- I had temporarily changed the upper limit of siglev from 120 to 200 late this week (after previous mail)

>
> Yes, it's probably so. There were two bugs. I have now fixed another bug which
> caused the voter to crash. I do not get exactly the same stack trace as you
> but I hope it's the same bug causing both the crashes.
>
> Please try the new code out. I hope it will fix your crash problems.

It is now running the unmodified 13.12 checkout.

>
>>> Have you properly calibrated the signal level measurements for all your
>>> receivers? If not, that could have been a reason for the system producing
>>> very low siglev values. Normally they should be within 0 to 100.
>> Ok it has been a theory in our group as well that the calibration has
>> something to do with it. But it was not completely pinpointed what the
>> exact requirements are. We have recalibrated several times, and at the
>> moment the values sometimes are slightly below zero but not -100 anymore.
>> So while this may have been a factor, it probably is not the cause of the
>> current problem. I sometimes see values slightly above 100 (e.g. 109), is
>> that a problem as well?
> 109 is not a problem but too high values may be a problem. There is an upper
> limit of 120 above which SvxLink will interpret it as a bogus value and
> instead report a zero signal level.

I had seen this and I changed this limit to 200 last Thursday evening because I guess that our secondary receiver sometimes peaks above 120 during normal operation. This may explain the random switchovers that we sometimes hear (signal is good on secondary receiver but suddenly it switches to main receiver, maybe because the siglev is above 120 for a short time and changed to 0 by that code, and then shortly afterwards it is below 120 again).
It may even be that this situation causes the quick changeovers that ultimately make the software crash on the main site. I noticed that this change would only be effective once it was made on our secondary site, but I could not install it at that time.

> I'm not happy with this limitation but it
> was introduced as a workaround for receivers where unsquelched audio is not
> available. When the squelch closes, SvxLink "hears" maximum silence and
> interprets this as a very strong signal. The voter will then switch to this
> receiver which of course is wrong. Since that is a special case, this behavior
> should probably be turned on by configuration. It's presently hard coded.

Ok now I understand the reason for that check. I already wondered what would be the condition where a value above 120 would have to be considered as zero. Maybe it is better to use a squelch signal via the RS232 port in that case, to fix the quality to zero when that signal indicates the receiver is squelched. Or indeed put the limit in a config variable so it has to be set only in systems like that.

The observation has been that the signal level mechanism works well in a clean environment, but when there are other strong signals present locally there may be effects that make the detector return bogus values or make the previously made calibration again invalid. (on a site where several broadcast programmes are transmitted around 100 MHz there unavoidably is a higher noise floor on 145 MHz and it of course is not completely constant)

The hardware we use at the central site (and will use on the secondary sites in the future) has a calibrated RSSI (signal strength) value available on the management interface, although it is doubtful if we can poll it at a sufficient rate without making the firmware crash (there appears to be such an issue).
When RSSI is available it could be used, possibly in combination with the existing mechanism. (use signal/noise ratio for low SNR signals, returning a quality e.g. up to 50, and for stronger signals use the RSSI with a transfer function that results in values 50..100)
Hopefully then we don't have to work with the measurements from signals with good SNR anymore (where the small interferences have the largest effect on the detector now).

Rob
|
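Editor's sketch: the combination proposed above (noise-based quality for weak signals, RSSI mapped onto 50..100 for stronger ones) could look roughly like this. Nothing here exists in SvxLink. `BlendParams`, `mapRange` and `combinedQuality` are hypothetical names, and the RSSI end points of -110 dBm and -60 dBm are placeholder assumptions that would have to come from the actual receiver hardware. The handover rule used here, switching to RSSI once the noise-based estimate reaches 50, is one possible reading of the proposal.

```cpp
#include <algorithm>
#include <cassert>

// Map x linearly from [in_lo, in_hi] onto [out_lo, out_hi], clamping x
// to the input range first.
double mapRange(double x, double in_lo, double in_hi,
                double out_lo, double out_hi)
{
  assert(in_lo < in_hi);
  double t = (x - in_lo) / (in_hi - in_lo);
  t = std::min(std::max(t, 0.0), 1.0);
  return out_lo + t * (out_hi - out_lo);
}

// All values are placeholders for illustration.
struct BlendParams
{
  double handover_quality = 50.0;    // noise-based quality where RSSI takes over
  double rssi_lo_dbm      = -110.0;  // assumed RSSI mapped to handover_quality
  double rssi_hi_dbm      = -60.0;   // assumed RSSI mapped to quality 100
};

// noise_quality: the existing audio noise based estimate (nominally 0..100).
// rssi_dbm:      calibrated RSSI read from the receiver hardware.
double combinedQuality(double noise_quality, double rssi_dbm,
                       const BlendParams &p)
{
  if (noise_quality < p.handover_quality)
  {
    // Weak signal: trust the noise-based estimate, but never go below 0.
    return std::max(noise_quality, 0.0);
  }
  // Strong signal: ignore the noise-based estimate and rank by RSSI.
  return mapRange(rssi_dbm, p.rssi_lo_dbm, p.rssi_hi_dbm,
                  p.handover_quality, 100.0);
}
```

One consequence of this design is that the result never exceeds 100 and never falls below 50 once the RSSI branch is active, so small interference on an already strong signal no longer moves the noise-based figure around. The step that can occur at the handover point would probably need hysteresis in a real detector.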
From: SM0SVX <sm...@us...> - 2014-05-03 10:03:05
|
On Saturday 03 May 2014 09:48:40 Rob Janssen wrote:
> SM0SVX wrote:
> > It would be better if you could use the 13.12 branch directly instead of
> > copying files. Then I know we are testing exactly the same code.
> >
> > svn co svn://svn.code.sf.net/p/svxlink/svn/branches/releases/13.12
>
> Ok I now checked out and compiled that version.
> Things I changed locally (and which are now gone again) are only:
> - I added a couple of assert statements in places where I had seen it crash
>   before (e.g. rattleOn)
> - I made a small modification in the logging so that stderr was not
>   redirected, and I could see the actual assert failure. However, it could
>   also be read in the coredump using gdb.
> - when daemonized, I added a chdir /tmp so that it could actually dump core
>   when running as svxlink user
> - I had temporarily changed the upper limit of siglev from 120 to 200 late
>   this week (after previous mail)
>
> > Yes, it's probably so. There were two bugs. I have now fixed another bug
> > which caused the voter to crash. I do not get exactly the same stack
> > trace as you but I hope it's the same bug causing both the crashes.
> >
> > Please try the new code out. I hope it will fix your crash problems.
>
> It is now running the unmodified 13.12 checkout.

ok, good.

> >>> Have you properly calibrated the signal level measurements for all your
> >>> receivers? If not, that could have been a reason for the system
> >>> producing very low siglev values. Normally they should be within 0 to
> >>> 100.
> >>
> >> Ok it has been a theory in our group as well that the calibration has
> >> something to do with it. But it was not completely pinpointed what the
> >> exact requirements are. We have recalibrated several times, and at the
> >> moment the values sometimes are slightly below zero but not -100 anymore.
> >> So while this may have been a factor, it probably is not the cause of the
> >> current problem. I sometimes see values slightly above 100 (e.g. 109),
> >> is that a problem as well?
> >
> > 109 is not a problem but too high values may be a problem. There is an
> > upper limit of 120 above which SvxLink will interpret it as a bogus value
> > and instead report a zero signal level.
>
> I had seen this and I changed this limit to 200 last Thursday evening
> because I guess that our secondary receiver sometimes peaks above 120
> during normal operation. This may explain the random switchovers that we
> sometimes hear (signal is good on secondary receiver but suddenly it
> switches to main receiver, maybe because the siglev is above 120 for a
> short time and changed to 0 by that code, and then shortly afterwards it is
> below 120 again).
> It may even be that this situation causes the quick changeovers that
> ultimately make the software crash on the main site. I noticed that this
> change would only be effective once it was made on our secondary site, but
> I could not install it at that time.
>
> > I'm not happy with this limitation but it was introduced as a workaround
> > for receivers where unsquelched audio is not available. When the squelch
> > closes, SvxLink "hears" maximum silence and interprets this as a very
> > strong signal. The voter will then switch to this receiver which of
> > course is wrong. Since that is a special case, this behavior should
> > probably be turned on by configuration. It's presently hard coded.
>
> Ok now I understand the reason for that check. I already wondered what
> would be the condition where a value above 120 would have to be considered
> as zero. Maybe it is better to use a squelch signal via the RS232 port in
> that case, to fix the quality to zero when that signal indicates the
> receiver is squelched. Or indeed put the limit in a config variable so it
> has to be set only in systems like that.

I'd like to keep the squelch indicator and the signal level detector separate from each other rather than adding new dependencies. To be able to absolutely guarantee that the squelch is open during the time that the signal level is estimated we would have to sample the squelch pin at the same rate as the audio. This will make the system less flexible since we are adding close to real-time requirements. The correct solution really is to use unsquelched audio and, as a workaround, being able to set the level above which the signal level is considered bogus using configuration.

> The observation has been that the signal level mechanism works well in a
> clean environment, but when there are other strong signals present locally
> there may be effects that make the detector return bogus values or make the
> previously made calibration again invalid. (on a site where several
> broadcast programmes are transmitted around 100 MHz there unavoidably is a
> higher noise floor on 145 MHz and it of course is not completely constant)

Yes, it's hard to do a good signal level detector using audio only and you can say that a requirement is that the radio environment is the same at all receiver sites so that the only thing affecting the estimated signal strength is the signal you intend to receive and not some local interference. Then the absolute value is not interesting as long as the relative value indicates which receiver is the best.

> The hardware we use at the central site (and will use on the secondary sites
> in the future) has a calibrated RSSI (signal strength) value available on
> the management interface, although it is doubtful if we can poll it at a
> sufficient rate without making the firmware crash (there appears to be such
> an issue).
> When RSSI is available it could be used, possibly in combination with the
> existing mechanism. (use signal/noise ratio for low SNR signals, returning
> a quality e.g. up to 50, and for stronger signals use the RSSI with a
> transfer function that results in values 50..100) Hopefully then we don't
> have to work with the measurements from signals with good SNR anymore
> (where the small interferences have the largest effect on the detector
> now).

As you may have seen, Adi (DL1HRC) is working on an extension where you can interface to hardware using external scripts that talk to SvxLink through PTYs. Right now it's for squelch, ptt and dtmf but the concept should be possible to extend to signal levels as well. Then an external script could be used to read the signal strength from your hardware and this is then written to the slave end of a PTY which SvxLink opens specifically for communicating signal level measurements.

As for combining different types of signal level detectors, this will have to be something for the future but I guess it would be fully possible to implement using the same structure as for the voter: a signal level "detector" type that aggregates the estimates from multiple sensors.

73's de SM0SVX / Tobias

> Rob
|