From: Gregg L. <gr...@li...> - 2009-02-20 18:57:23
I've recently committed a change to the Insteon_PLM.pm lib that addresses NACK handling. For those of you who don't know, the PLM returns a NACK when it becomes overly busy. In general, receipt of a NACK means that the message sent to the PLM needs to be resent. But, due to the async nature of sending to the PLM and receiving messages back, it was possible to requeue the missed command more than once, thereby creating duplicates. If this situation cascades, the problem compounds and the command queue fills up with re-attempts. The change adds checks throughout the PLM command logic to ensure that duplicate messages are never requeued.

I'd like to thank Marc M (the proud owner of an insteon plasma signal sucker ;) ) for bringing the situation to light. Thanks to his "defective" insteon environment, the problem worsened enough to detect what was occurring--this problem is not usually so obvious.

I've since been doing some stress testing, as this was symptomatic of the situations that originally caused me to add extra transmit delays and bulk message throttling. I'd be very interested in the results from those of you with large (> 20 insteon devices) environments who understand your setup well enough to feel comfortable experimenting. Here goes:

  Insteon_PLM_disable_throttling=1

The above ini param disables logic that automatically delays the message sending queue by one second once three commands have been sent within a 1 second window. The default inter-message delay is 0.25 seconds, so the implication is that the greatest default sending rate is 3 messages per 1.5 seconds. Technically, the first three go out in the first 0.5 seconds; but, as you can see, the throttle pause lowers aggregate bandwidth.

  Insteon_PLM_xmit_delay=0.15

As noted above, the default is 0.25; so, this cuts things down a bit. But notice that the combination of the two changes means the message rate becomes 10 messages (not 3) per 1.5 seconds.
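To make the arithmetic concrete, here's a small Python sketch (not MisterHouse code; the timing rules are my paraphrase of the behavior described above) that computes transmit timestamps under both policies:

```python
def send_times(n, xmit_delay=0.25, throttle=True):
    """Model the PLM send scheduler described above.

    Messages are spaced xmit_delay seconds apart; when throttling is
    enabled, the queue pauses an extra second once 3 messages have been
    sent within the trailing 1-second window.
    """
    times = []
    t = 0.0
    for _ in range(n):
        times.append(t)
        # count sends (including this one) inside the trailing 1-second window
        recent = sum(1 for s in times if t - s < 1.0)
        if throttle and recent >= 3:
            t += 1.0          # throttle kicks in: delay the queue one second
        else:
            t += xmit_delay   # normal inter-message spacing
    return times

# Default settings: bursts of 3, then a 1-second pause
# send_times(6) -> [0.0, 0.25, 0.5, 1.5, 1.75, 2.0]

# Throttling off, xmit_delay=0.15: a steady 0.15-second cadence
# send_times(10, xmit_delay=0.15, throttle=False) ends at t = 1.35,
# i.e. 10 messages inside the same 1.5-second span that the defaults
# need to send just 3.
```

Under this model the defaults settle into a 3-messages-per-1.5-seconds rhythm, while the tweaked settings fit 10 messages in the same window--matching the figures quoted above.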
This is a better-than-3x improvement in overall speed. It is possible that installations with large numbers of devices will see retransmits on mh startup due to insteon polled status messages colliding. It may be possible to work out a "sliding scale" for throttling large numbers of async commands if this becomes a problem.

So, will the above really matter? Well, I like my lighting to be as responsive as possible. And, sometimes I need multiple cascading lighting controls that can't practically be accommodated via scenes. So, I would argue that yes, it can *if* the above can remain stable. It is unlikely that I'll change the defaults anytime soon, as they serve as the "safe" settings. But I am interested in others' results. You can think of this exercise as similar to mobo over-clocking.

I've been running with the above settings while able to monitor the logs and restart mh. In the past, over-zealous message sending rates created what I attributed to "PLM lock-up". However, I'm now wondering if the "corner case" which the recent commit fixes is what I was actually triggering.

A word of warning... It is possible that the tweaks above could cause a real PLM lockup. So, please leave them in place only while you are able to monitor and correct things if a real PLM lockup were to happen. I don't want to be responsible for any spousal abuse if the lights stop functioning and you are hurt with a swinging frying pan.

Gregg