#932 Undefined behavior in CheckSum algorithm for 32-bit-integer Perl builds


I recently migrated some sites, along with their log files, to a new server and was curious why AWStats failed to seek to the last processed record when run on the new server against the old log files. Running with debugging enabled revealed that the calculated checksums didn't match the ones stored in the data files (which had been calculated on the old server).

In the CheckSum subroutine, I noticed that the character value is shifted left by 8 * $j:

    $checksum += ( ord($c) << ( 8 * $j ) );

However, the next line resets $j to 0 only when its value before the increment is greater than 3:

    if ( $j++ > 3 ) { $j = 0; }

This means that $j reaches 4 before it is reset to 0 (I suspect the intent was to reset after 3), so for every fifth character the previous line shifts the character value by 8 * 4 = 32 bits, which is undefined behavior on Perl builds that use 32-bit integers internally.
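To make the off-by-one concrete, here is a small Perl sketch that replays only the $j bookkeeping from the two quoted lines and lists the shift amount applied to each character. The helper name shift_amounts is hypothetical, and the ">= 3" variant is only my guess at the intended behavior, not a fix taken from AWStats:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the shift amounts (8 * $j) that the quoted
# CheckSum loop would apply to the first $n characters of a string.
sub shift_amounts {
    my ( $n, $intended ) = @_;
    my @shifts;
    my $j = 0;
    for ( 1 .. $n ) {
        push @shifts, 8 * $j;
        if ($intended) {
            # Assumed intent (a guess): wrap after 3, so $j cycles 0..3.
            if ( $j++ >= 3 ) { $j = 0; }
        }
        else {
            # Condition as quoted from CheckSum: $j cycles 0..4.
            if ( $j++ > 3 ) { $j = 0; }
        }
    }
    return @shifts;
}

print join( ' ', shift_amounts( 10, 0 ) ), "\n";   # 0 8 16 24 32 0 8 16 24 32
print join( ' ', shift_amounts( 10, 1 ) ), "\n";   # 0 8 16 24 0 8 16 24 0 8
```

Note that changing the condition would also change every checksum already stored in existing data files, so it would not be a drop-in fix.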

The reason I got different checksums on the new server is that it runs a 64-bit Perl build, whereas the old server ran a 32-bit build. On the old server the shift exceeded the width of Perl's internal integer type, so the result was undefined and depended on the platform (my ActiveState builds apparently wrapped the shift count around, so that a 32-bit shift returned the original value). On the new server the shift did not overflow, and the entire resulting 40-bit value was added to the checksum.
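The mismatch between the two servers can be reproduced on a single machine by emulating both shift behaviors. This sketch assumes it runs on a 64-bit Perl, reduces CheckSum to the loop built around the two quoted lines (the real subroutine may differ), and approximates the old 32-bit build with an x86-style model in which only the low 5 bits of the shift count are used; that model is my guess at what the ActiveState builds did. The names checksum_with, $shl64 and $shl32_masked are hypothetical:

```perl
use strict;
use warnings;

# Sketch only: the quoted CheckSum loop with the left shift replaced by a
# pluggable "shift model", so different Perl builds can be emulated.
# Assumes a 64-bit Perl, where a shift by 32 bits is well defined.
sub checksum_with {
    my ( $string, $shl ) = @_;
    my $checksum = 0;
    my $j        = 0;
    for my $c ( split //, $string ) {
        $checksum += $shl->( ord($c), 8 * $j );
        if ( $j++ > 3 ) { $j = 0; }
    }
    return $checksum;
}

# 64-bit build: a 32-bit shift is an ordinary shift (up to a 40-bit result).
my $shl64 = sub { my ( $v, $n ) = @_; return $v << $n; };

# Emulated 32-bit build (assumption): only the low 5 bits of the shift count
# are used, so shifting by 32 behaves like shifting by 0.
my $shl32_masked = sub {
    my ( $v, $n ) = @_;
    return ( $v << ( $n % 32 ) ) & 0xFFFFFFFF;
};

my $line = 'abcde';
printf "64-bit:        %d\n", checksum_with( $line, $shl64 );          # 435475931745
printf "32-bit masked: %d\n", checksum_with( $line, $shl32_masked );   # 1684234950
```

The first four characters contribute the same amount under both models; only the fifth character ('e', shifted by 32) differs, which is enough to make every stored checksum fail to match.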

It also means that on Perl builds that return 0 for such a shift instead of wrapping, CheckSum effectively ignores every fifth character.
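Under that zeroing behavior, two strings that differ only in every fifth character get the same checksum. Here is a hedged sketch of that check, using a hypothetical function name and assuming a 64-bit Perl that emulates the 32-bit result by returning 0 for shifts of 32 bits or more:

```perl
use strict;
use warnings;

# Sketch: the quoted CheckSum loop under the assumption that a shift by
# 32 or more bits yields 0, as on builds that zero out oversized shifts.
sub checksum_zero_model {
    my ($string) = @_;
    my $checksum = 0;
    my $j        = 0;
    for my $c ( split //, $string ) {
        my $n = 8 * $j;
        # Emulated 32-bit result: oversized shifts give 0,
        # all other shifts are truncated to 32 bits.
        $checksum += ( $n >= 32 ) ? 0 : ( ( ord($c) << $n ) & 0xFFFFFFFF );
        if ( $j++ > 3 ) { $j = 0; }
    }
    return $checksum;
}

# These strings differ only in their 5th and 10th characters.
my $same = checksum_zero_model('abcdXfghiY') == checksum_zero_model('abcdZfghiQ');
printf "%s\n", $same ? 'same checksum' : 'different checksum';   # same checksum
```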

Not a big deal in practice; I just thought I'd point it out.