#1 Enable correct checksums for UTF-8 encoded data

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2010-01-07
Created: 2010-01-04
Creator: Dan Scott
Private: No

The SIP2 specification states: "The checksum is four ASCII character digits representing the binary sum of the characters including the first character of the transmission and up to and including the checksum field identifier characters."

The specification does not explicitly define what it means by a character, although elsewhere it defines the contents of a packet as "In general the packet contains only ASCII characters." In terms of encoding, it states "The default character set will be English 850 (as defined in the Microsoft MS DOS manual). If another character set is required, the SC and the ACS must mutually define the character set."
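To illustrate why the agreed character set matters for the checksum, here is a minimal Python sketch (not the Perl code in Sip::Checksum.pm). It assumes the checksum is computed as the SIP2 specification describes elsewhere: add each byte as an unsigned number, take the two's complement, and write the low 16 bits as four hex digits. The function name `sip2_checksum` and the short sample field are made up for illustration.

```python
def sip2_checksum(data: bytes) -> str:
    """Sum the bytes, negate (two's complement), keep 16 bits, format as 4 hex digits."""
    total = sum(data) & 0xFFFF
    return format((-total) & 0xFFFF, "04X")


# Hypothetical message fragment ending in the checksum field identifier "AZ".
text = "AULe Siècle|AY1AZ"

# The same characters become different bytes under different character sets:
# code page 850 stores "è" as one byte, UTF-8 stores it as two.
as_cp850 = text.encode("cp850")
as_utf8 = text.encode("utf-8")

checksum_cp850 = sip2_checksum(as_cp850)
checksum_utf8 = sip2_checksum(as_utf8)

print(len(as_cp850), checksum_cp850)
print(len(as_utf8), checksum_utf8)
```

The two checksums differ even though the text is identical, so both ends have to sum the bytes of the same encoding for the check to pass.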

We have a SIP client - a current-generation 3M self-check unit - that offers UTF-8 as one of the character sets. As the rest of Evergreen already speaks UTF-8, and our collection contains a great deal of material outside of the plain ASCII range (including characters that do not fit into English 850), this seems like the best choice for us.

In practice, however, the current checksum implementation in Sip::Checksum.pm produces checksums that our 3M self-check rejects as incorrect. The openncip code, using the OpenILS::SIP implementation for Evergreen, was generating the following packet:

64 Y 02420091223 110643000000000004000000000000AA00007001049233|AEDaniel Brent Scott|BHUSD|BV0.00|BD564 Kaireen Street Sudbury Ontario UNKNOWN P3E 5R6|BEdscott@laurentian.ca|BFExt 3315|AQOSUL|AULe Siècle de Louis XIV|AUEmerging technologies for academic libraries in the digital age |AUKnights of the black and white |AUBroken wings|BLY|PCFaculty|PIFiltered|AFOK|AOconifer|AY5AZ8415

This resulted in an incorrect checksum error, several retries with the same result, and finally our self-check client giving up.

I believe the checksum error occurs because the current algorithm in Sip::Checksum.pm sums the values of Unicode characters (%U) instead of bytes (%C). With the attached patch applied, the code generates the following packet and checksum, which the 3M self-check unit accepts:

64 Y 02420091223 110643000000000004000000000000AA00007001049233|AEDaniel Brent Scott|BHUSD|BV0.00|BD564 Kaireen Street Sudbury Ontario UNKNOWN P3E 5R6|BEdscott@laurentian.ca|BFExt 3315|AQOSUL|AULe Siècle de Louis XIV|AUEmerging technologies for academic libraries in the digital age |AUKnights of the black and white |AUBroken wings|BLY|PCFaculty|PIFiltered|AFOK|AOconifer|AY5AZ825E
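The difference between summing characters and summing bytes can be sketched in Python as follows. This is an illustration only, not the attached Perl patch; the function names are hypothetical, the fragment is a shortened piece of the packet above, and it is not expected to reproduce the exact checksums shown in this ticket. It assumes the usual SIP2 rule of negating the 16-bit sum and printing four hex digits.

```python
def checksum_from_total(total: int) -> str:
    """Two's complement of the 16-bit sum, as four uppercase hex digits."""
    return format((-total) & 0xFFFF, "04X")


def checksum_by_code_points(text: str) -> str:
    """Sum Unicode code point values, one per character (the suspected old behaviour)."""
    return checksum_from_total(sum(ord(ch) for ch in text))


def checksum_by_bytes(text: str, encoding: str = "utf-8") -> str:
    """Sum the encoded bytes, which is what goes over the wire."""
    return checksum_from_total(sum(text.encode(encoding)))


fragment = "AULe Siècle de Louis XIV|AY5AZ"

by_points = checksum_by_code_points(fragment)
by_bytes = checksum_by_bytes(fragment)

# "è" is code point 232, but its UTF-8 bytes 0xC3 0xA8 add up to 363,
# so the two sums (and therefore the checksums) differ by 131.
print(by_points, by_bytes)

# For pure ASCII text the two methods agree, which is why the problem
# only shows up once a record contains non-ASCII characters.
print(checksum_by_code_points("AY5AZ") == checksum_by_bytes("AY5AZ"))
```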

Discussion

  • David Fiander - 2010-01-07

    Applied.

  • David Fiander - 2010-01-07

    status: open --> closed