Saxon's implementation of the Unicode codepoint
collation actually compares UTF-16 code units rather
than Unicode codepoint values. This leads to incorrect
results when comparing a character in the range
56320-65535 (U+DC00 to U+FFFF) with a character outside
the BMP (that is, one whose codepoint is greater than
65535, U+FFFF). Because the non-BMP character is
represented in UTF-16 by a pair of code units (a
"surrogate pair"), of which the first is always less
than 56320, the non-BMP character wrongly collates as
"less than" the other
character. For example, the result of the expression
"" lt "𑅰"
is incorrectly returned as false.
The problem does not affect equality comparisons.
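
The difference between the two orderings can be reproduced
in plain Java. The following is only a minimal sketch, not
Saxon's own code: the class and method names are
hypothetical, and the two sample characters (U+FFFD and
U+11170) are chosen here purely for illustration. It
assumes that String.compareTo, which orders strings by
UTF-16 code unit, is representative of the faulty behaviour
described above.

```java
public class CodepointCompareSketch {

    // Ordering by UTF-16 code units: this is what String.compareTo does.
    static int compareByCodeUnits(String a, String b) {
        return a.compareTo(b);
    }

    // Ordering by Unicode codepoint values (hypothetical helper).
    static int compareByCodepoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Advance by 1 code unit for a BMP character, 2 for a surrogate pair.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";                                  // BMP, above the surrogate range
        String nonBmp = new String(Character.toChars(0x11170)); // non-BMP, stored as a surrogate pair

        // Code-unit ordering compares 0xFFFD with the high surrogate 0xD804.
        System.out.println(compareByCodeUnits(bmp, nonBmp) < 0);  // prints false
        // Codepoint ordering compares 0xFFFD with 0x11170.
        System.out.println(compareByCodepoints(bmp, nonBmp) < 0); // prints true
    }
}
```

Both methods return 0 for identical strings, which is
consistent with the observation that equality comparisons
are unaffected.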