#710 StringIndexOutOfBounds in normalize-unicode()

Milestone: v8.9
Status: closed
Priority: 5
Updated: 2012-10-08
Created: 2007-07-18
Creator: Michael Kay
Private: No

There is a problem in normalize-unicode(), which is also likely to occur if normalization-form="NFD" is selected in the serializer.

When normalizing to decomposed normal form (NFD), if the input contains a combining character (such as U+0304) immediately after a non-BMP character (such as U+1D4AE), a StringIndexOutOfBoundsException occurs.
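To make the triggering input concrete, here is a minimal sketch that builds the string described above and shows why indexing is delicate: the non-BMP character occupies two `char` positions (a surrogate pair), so a normalizer that walks the string by `char` index instead of by code point can step past the end. The class and method names are hypothetical, and the sketch uses the JDK's own `java.text.Normalizer`, not Saxon's `net.sf.saxon.codenorm.Normalizer`, so it does not reproduce the bug itself.

```java
import java.text.Normalizer;

final class NfdNonBmpDemo {

    /** Builds the input from the report: U+1D4AE immediately followed by U+0304. */
    static String buildInput() {
        return new StringBuilder()
                .appendCodePoint(0x1D4AE) // non-BMP: stored as a surrogate pair (2 chars)
                .appendCodePoint(0x0304)  // combining macron: stored as 1 char
                .toString();
    }

    /** Normalizes to NFD using the JDK normalizer (an assumption: not Saxon's class). */
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String input = buildInput();

        // The char count and the code point count differ for non-BMP input.
        System.out.println("chars:       " + input.length());
        System.out.println("code points: " + input.codePointCount(0, input.length()));

        // Iterate by code point, never by raw char index, when decomposing.
        toNfd(input).codePoints()
                .forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Neither character has a canonical decomposition, so a correct NFD implementation should return this particular input unchanged.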

Source patched in Subversion, module net.sf.saxon.codenorm.Normalizer

Discussion

  • Michael Kay

    Michael Kay - 2007-07-18

    Because this fix is a priority for an important customer, a patch has also been created for Saxon 8.8 in Subversion (module net.sf.saxon.codenorm.Normalizer).

     
  • Michael Kay

    Michael Kay - 2007-08-03

    There is a further problem that occurs under the same input conditions. This time no exception occurs, but the resulting string is not well-formed UTF-16: it contains incorrect code units in the surrogate pair range. A further patch will be placed in Subversion on both the 8.9 and 8.8 branches.

     
  • Michael Kay

    Michael Kay - 2007-11-04

    Fixed in 9.0.0.1
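
The second problem noted in the discussion (output that is not well-formed UTF-16) can be detected with a simple scan for unpaired surrogates. The following is a minimal sketch of such a check, not code from the Saxon patch; the class and method names are hypothetical. It could serve as a regression assertion on the output of any normalizer.

```java
final class Utf16Check {

    /**
     * Returns true if every high surrogate is immediately followed by a low
     * surrogate and no low surrogate appears on its own.
     */
    static boolean isWellFormedUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1D4AE is the surrogate pair D835 DCAE; U+0304 is a single char.
        String good = new String(new char[] {0xD835, 0xDCAE, 0x0304});
        // Combining mark wedged between the two surrogate halves: malformed.
        String bad = new String(new char[] {0xD835, 0x0304, 0xDCAE});

        System.out.println(isWellFormedUtf16(good));
        System.out.println(isWellFormedUtf16(bad));
    }
}
```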