From: Gregg L. <gr...@li...> - 2009-02-20 18:57:23
I've recently committed a change to the Insteon_PLM.pm lib that addresses NACK handling. For those of you who don't know, the PLM returns a NACK when it becomes overly busy. In general, receipt of a NACK means that the message sent to the PLM needs to be resent. But, due to the async nature of sending to the PLM and receiving messages back, it was possible to requeue the missed command more than once, thereby creating duplicates. If this situation cascades, the problem compounds and the command queue fills up with re-attempts. The change adds checks throughout the PLM command logic to ensure that duplicate messages are never requeued.

I'd like to thank Marc M (the proud owner of an insteon plasma signal sucker ;) ) for bringing the situation to light. Thanks to his "defective" insteon environment, the problem worsened enough to detect what was occurring--this problem is not usually so obvious.

I've since been doing some stress testing, as this was symptomatic of the situations that originally caused me to add extra transmit delays and bulk message throttling. I'd be very interested in the results from those of you with large (> 20 insteon devices) environments who understand your setup well enough to feel comfortable experimenting. Here goes:

  Insteon_PLM_disable_throttling=1

The above ini param disables logic that automatically delays the message sending queue by one second once three commands have been sent within a 1 second window. The default inter-message delay is 0.25 seconds, so the implication is that the greatest default sending rate is 3 messages per 1.5 seconds. Technically, the first three go out in the first 0.5 seconds; but, as you can see, the throttle pause lowers aggregate bandwidth.

  Insteon_PLM_xmit_delay=0.15

As noted above, the default is 0.25; so, this cuts things down a bit. But notice that the combination of the two changes means the message rate becomes 10 messages (not 3) per 1.5 seconds.
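To make the arithmetic concrete, here's a small Python sketch (not MisterHouse code; the timing rules are my paraphrase of the behavior described above) that computes transmit timestamps under both policies:

```python
def send_times(n, xmit_delay=0.25, throttle=True):
    """Model the PLM send scheduler described above.

    Messages are spaced xmit_delay seconds apart; when throttling is
    enabled, the queue pauses an extra second once 3 messages have been
    sent within the trailing 1-second window.
    """
    times = []
    t = 0.0
    for _ in range(n):
        times.append(t)
        # count sends (including this one) inside the trailing 1-second window
        recent = sum(1 for s in times if t - s < 1.0)
        if throttle and recent >= 3:
            t += 1.0          # throttle kicks in: delay the queue one second
        else:
            t += xmit_delay   # normal inter-message spacing
    return times

# Default settings: bursts of 3, then a 1-second pause
# send_times(6) -> [0.0, 0.25, 0.5, 1.5, 1.75, 2.0]

# Throttling off, xmit_delay=0.15: a steady 0.15-second cadence
# send_times(10, xmit_delay=0.15, throttle=False) ends at t = 1.35,
# i.e. 10 messages inside the same 1.5-second span that the defaults
# need to send just 3.
```

Under this model the defaults settle into a 3-messages-per-1.5-seconds rhythm, while the tweaked settings fit 10 messages in the same window--matching the figures quoted above.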
This is a better-than-3x improvement in overall speed. It is possible that installations with large numbers of devices will see retransmits on mh startup due to insteon polled status messages colliding. It may be possible to work out a "sliding scale" for throttling large numbers of async commands if this becomes a problem.

So, will the above really matter? Well, I like my lighting to be as responsive as possible. And, sometimes I need multiple cascading lighting controls that can't practically be accommodated via scenes. So, I would argue that yes, it can *if* the above can remain stable. It is unlikely that I'll change the defaults anytime soon, as they serve as the "safe" settings. But I am interested in others' results. You can think of this exercise as similar to mobo over-clocking.

I've been running with the above settings while able to monitor the logs and restart mh. In the past, over-zealous message sending rates created what I attributed to "PLM lock-up". However, I'm now wondering if the "corner case" which the recent commit fixes is what I was actually triggering.

A word of warning... It is possible that the tweaks above could cause a real PLM lockup. So, please leave them in place only while you are able to monitor and correct things if a real PLM lockup were to happen. I don't want to be responsible for any spousal abuse if the lights stop functioning and you are hurt with a swinging frying pan.

Gregg