OK, so I looked at the first failure.  Michael, I think I understand what you mean by out of sequence messages.  I see that the link data packet arrives before the ack.  You have the code set to ignore link data packets that arrive before the ack message.  Then the ack arrives and nothing happens.  I am surmising that something is clearing the current_message including the timeout(command).

I see this line in _process_command that allows an ACK message to clear the current message:
if (substr($msg{extra},4,2) eq '01') {

It looks like all of the ACKs have 00 for cmd2.  As a result, it looks like ACKs are clearing messages.


On Tue, Feb 26, 2013 at 5:17 PM, Marc MERLIN <marc_mh@merlins.org> wrote:
On Tue, Feb 26, 2013 at 08:59:59AM -0600, Michael Stovenour wrote:
> Would you mind pulling my latest i2_aldb_support branch updates then trying the PLM scan
> all device link tables?  I think there is still a command timeout issue if messages arrive
> out of order (due to retransmissions) but I can't reproduce it here.  If you don't get a
> final message like the one below then the scan didn't complete normally.
>
>       [Scan all link tables] All tables have completed scanning
>
> If you can reproduce the failure I'll push some more code with a bunch of debug logging.
> Otherwise we might just move on for now and revisit it later.

Sure thing, there you go.

I'm wondering if you merge your code with Kevin's hop count code whether that will work
around the problem, but indeed there is still a problem.

First scan failed in the middle with:
26/02/2013 12:51:33  [Scan all link tables] Now scanning: $br2_lamp (7 of 27)
26/02/2013 12:51:34  [Insteon::ALDB_i2] DEBUG4: Start of scan; initializing aldb structure
26/02/2013 12:51:36  [Insteon::ALDB_i2] DEBUG4: scan done; adding last address [0fe7] to empty array
26/02/2013 12:51:36  [Insteon::ALDB_i2] $br2_lamp completed aldb scan: status: good
26/02/2013 12:51:36  [Scan all link tables] Now scanning: $fmr_kpl (8 of 27)
26/02/2013 12:51:37  [Insteon::ALDB_i2] DEBUG3: $br2_lamp [0x0fe7] received: 0001010fe701000000000000000000 for _mem_activity= _mem_action=
26/02/2013 12:51:37  [Insteon::ALDB_i2] $br2_lamp: unhandled _mem_action=
26/02/2013 12:51:58  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:16  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:23  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:34  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:51  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:52  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:53  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:52:57  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:53:15  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:53:16  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:53:22  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:53:53  [Scan all link tables] Now scanning: $fmr_slav (9 of 27)
26/02/2013 12:53:54  [Insteon::ALDB_i2] DEBUG4: Start of scan; initializing aldb structure
26/02/2013 12:53:55  [Insteon::ALDB_i2] DEBUG3: $fmr_slav [0x0fef] ack not received. ignoring message
26/02/2013 12:54:31  Running: PLM scan all device link tables
26/02/2013 12:54:31  [Scan all linktables] WARN: link already underway. Ignoring request for new scan ...

full log here: http://marc.merlins.org/tmp/print.log.scanfailed


Second scan, I'm not sure how it failed:
26/02/2013 12:57:05  [Scan all link tables] Now scanning: $PLM (1 of 27)
26/02/2013 12:57:48  [Scan all link tables] Now scanning: $mbr_kpl (2 of 27)
26/02/2013 12:58:36  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $mbr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 12:58:40  [Scan all link tables] Now scanning: $mbr_lamp2 (3 of 27)
26/02/2013 12:58:41  [Insteon::ALDB_i2] DEBUG4: Start of scan; initializing aldb structure
26/02/2013 12:58:44  [Insteon::ALDB_i2] DEBUG3: $mbr_lamp2 [0x0fd7] ack not received. ignoring message
26/02/2013 13:00:00  Running: AllDevices Scan
26/02/2013 13:00:26  Running: compcloset fan on
26/02/2013 13:02:00  Running: AllDevices Status

full log here: http://marc.merlins.org/tmp/print.log.scanfailed2

And I then went back to git master, and did a full scan that completed.
Kevin, on the plus side your code changes in master have been working for me so far. Actually
I had issues getting a scan to completion and your new hop count seems to help:

26/02/2013 16:31:44  Running: PLM scan all device link tables
26/02/2013 16:31:44  [Scan all link tables] Now scanning: $PLM (1 of 27)
26/02/2013 16:32:00  Running: AllDevices Status
26/02/2013 16:32:27  [Scan all link tables] Now scanning: $mbr_kpl (2 of 27)
26/02/2013 16:33:19  [Scan all link tables] Now scanning: $mbr_lamp2 (3 of 27)
26/02/2013 16:33:33  [Scan all link tables] Now scanning: $mbr_lamp3 (4 of 27)
26/02/2013 16:33:41  [Scan all link tables] Now scanning: $mbr_outside (5 of 27)
26/02/2013 16:33:52  [Scan all link tables] Now scanning: $br2_sw (6 of 27)
26/02/2013 16:34:08  [Scan all link tables] Now scanning: $br2_lamp (7 of 27)
26/02/2013 16:34:14  [Scan all link tables] Now scanning: $fmr_kpl (8 of 27)
26/02/2013 16:34:41  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:34:58  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:35:22  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $fmr_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:36:34  [Scan all link tables] Now scanning: $fmr_slav (9 of 27)
26/02/2013 16:36:34  [Scan all link tables] Now scanning: $fmr_lamp (10 of 27)
26/02/2013 16:37:04  [Scan all link tables] Now scanning: $fmr_outside (11 of 27)
26/02/2013 16:37:27  [Scan all link tables] Now scanning: $entryway (12 of 27)
26/02/2013 16:37:55  [Scan all link tables] Now scanning: $lvr_kpl (13 of 27)
26/02/2013 16:39:01  [Insteon_PLM] DEBUG2: PLM command timer expired but no transmission in place.  Moving on...
26/02/2013 16:40:32  [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second
26/02/2013 16:40:33  [Insteon::BaseMessage] WARN: now resending obj=$lvr_kpl; command=peek; extra=4F after 1 attempts.
26/02/2013 16:41:32  [Scan all link tables] Now scanning: $lvr_lamp (14 of 27)
26/02/2013 16:41:40  [Insteon::BaseInterface] WARN: received a message from $lvr_lamp in response to a peek command, but the command code d4 is incorrect. Ignorring received message.
26/02/2013 16:41:44  [Scan all link tables] Now scanning: $yard_lights (15 of 27)
26/02/2013 16:41:44  [Insteon::BaseInterface] Warn! Unable to locate object for source: 0a99e6 and group:
26/02/2013 16:41:59  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $yard_lights but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:42:02  [Scan all link tables] Now scanning: $kitchen_kpl (16 of 27)
26/02/2013 16:43:16  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $kitchen_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:43:16  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $kitchen_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:43:33  [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second
26/02/2013 16:43:34  [Insteon::BaseMessage] WARN: now resending obj=$kitchen_kpl; command=peek; extra=F6 after 1 attempts.
26/02/2013 16:44:14  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $kitchen_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:45:00  Running: AllDevices Scan
26/02/2013 16:45:38  [Insteon::BaseInterface] WARN: received insteon ACK/NACK message from $kitchen_kpl but cannot correlate to sent message! IGNORING received message!!
26/02/2013 16:46:25  [Scan all link tables] Now scanning: $dining_kpl (17 of 27)
26/02/2013 16:46:28  [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second
26/02/2013 16:46:29  [Insteon::BaseMessage] WARN: now resending obj=$dining_kpl; command=peek; extra=F0 after 1 attempts.
26/02/2013 16:47:00  Running: AllDevices Status
26/02/2013 16:49:31  [Insteon::BaseMessage] WARN: now resending obj=$dining_kpl; command=peek; extra=96 after 1 attempts.
26/02/2013 16:49:35  [Insteon::BaseMessage] WARN: now resending obj=$dining_kpl; command=peek; extra=96 after 2 attempts.
26/02/2013 16:49:58  [Scan all link tables] Now scanning: $garage1_neon_kpl (19 of 27)
26/02/2013 16:51:09  [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second
26/02/2013 16:51:10  [Insteon::BaseMessage] WARN: now resending obj=$garage1_neon_kpl; command=peek; extra=43 after 1 attempts.
26/02/2013 16:52:03  [Scan all link tables] Now scanning: $garage2_neon_kpl (20 of 27)
26/02/2013 16:52:15  [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second
26/02/2013 16:52:16  [Insteon::BaseMessage] WARN: now resending obj=$garage2_neon_kpl; command=peek; extra=ED after 1 attempts.
26/02/2013 16:52:34  [Insteon::BaseInterface] WARN: received a message from $garage2_neon_kpl in response to a set_address_msb command, but the command code 2b is incorrect. Ignorring received message.
26/02/2013 16:52:53  [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second
26/02/2013 16:52:54  [Insteon::BaseMessage] WARN: now resending obj=$garage2_neon_kpl; command=peek; extra=91 after 1 attempts.
26/02/2013 16:53:10  [Scan all link tables] Now scanning: $gar_outlights_kpl (21 of 27)
26/02/2013 16:54:02  [Scan all link tables] Now scanning: $iolinc_gardoor2 (22 of 27)
26/02/2013 16:54:07  [Scan all link tables] Now scanning: $iolinc_garside (23 of 27)
26/02/2013 16:54:11  [Scan all link tables] Now scanning: $fan_fmr (24 of 27)
26/02/2013 16:54:12  [Scan all link tables] Now scanning: $fan_mbr (25 of 27)
26/02/2013 16:54:12  [Scan all link tables] Now scanning: $fan_gar_roof (26 of 27)
26/02/2013 16:54:13  [Scan all link tables] Now scanning: $fan_gar_ceiling (27 of 27)
26/02/2013 16:54:14  [Scan all link tables] All tables have completed scanning

full log here: http://marc.merlins.org/tmp/print.log.scanok

Hope this helps.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/