Colin Paul Adams:
> It looks to me like the oasis tests need re-thinking:
> assert_invalid calls parse_from_string, passing it as an argument
> a call to new_unicode_string_from_utf8 with invalid utf8 as its
> argument, which is contrary to its precondition.
> This looks odd, as parse_from_string DOESN'T have a pre-condition of
> valid utf8.
> So I'm guessing that if I remove the call to
> new_unicode_string_from_utf8, then the tests might start working.

No, parse_from_string takes DECODED strings, that is, a sequence of
Unicode character codes. Let's assume a model parser is constructed thus:

    XML byte stream
        1: decoding, e.g. latin-n, UTF-xx
    stream of unicode characters
        2: parser, of markup etc

Conceptually, parse_from_string feeds its input directly into (2),
while parse_from_stream feeds into (1). So the STRING given to
parse_from_string is not UTF-8 encoded at all, and therefore no
UTF-8 precondition is required.
Of course, in practice (a) encoding and markup are intertwined, so
the layering is not that clean, and (b) the current parser is not
implemented that way. But this need not be very visible at the
XM_PARSER interface level.
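
To make the two-step model concrete, here is a minimal sketch in Python. It is only an illustration of the layering, not the XM_PARSER implementation: decode_step and markup_step are hypothetical names, and the "parser" is a toy that merely collects start-tag names.

```python
import re
from typing import List

# Illustrative model only: these functions are hypothetical stand-ins,
# not the Gobo XM_PARSER API.

def decode_step(data: bytes, encoding: str = "utf-8") -> str:
    """Step 1: turn an encoded byte stream into Unicode characters.

    Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
    """
    return data.decode(encoding)

def markup_step(text: str) -> List[str]:
    """Step 2: toy markup parser; it only ever sees decoded characters.

    It just collects the names of start tags.
    """
    return re.findall(r"<([A-Za-z_][\w.-]*)", text)

def parse_from_string(text: str) -> List[str]:
    # Feeds step 2 directly: the input is already decoded, so there is
    # nothing encoding-related to require of it.
    return markup_step(text)

def parse_from_stream(data: bytes, encoding: str = "utf-8") -> List[str]:
    # Feeds step 1, which validates the encoding before step 2 runs.
    return markup_step(decode_step(data, encoding))
```

In this model the same document reaches step 2 identically whether it arrives as a decoded string, as UTF-8 bytes, or as Latin-1 bytes; only the stream entry point ever looks at the encoding.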
Back to our test: the test driver itself should check UTF-8
validity before calling new_unicode ... OR use parse_from_stream
via a STRING-based stream. I think the latter is the better
solution, as it will exercise the UTF-8 validation within the
parser ("step 1" above).
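
As a rough sketch of that second option, again in Python and not in terms of the real test driver: accepts_stream and assert_invalid below are hypothetical helpers, and the standard library's xml.etree.ElementTree stands in for the markup parser. The point is only that the raw bytes go in through the stream path, so the decoding step is what rejects the invalid UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical test-driver sketch; not the oasis test code or XM_PARSER.

def accepts_stream(data: bytes) -> bool:
    """Model of parsing from a byte stream: decode first, then parse."""
    try:
        text = data.decode("utf-8")   # "step 1": strict UTF-8 validation
    except UnicodeDecodeError:
        return False                  # rejected by the decoding step
    try:
        ET.fromstring(text)           # "step 2": markup parsing
    except ET.ParseError:
        return False                  # rejected by the markup step
    return True

def assert_invalid(data: bytes) -> None:
    """The test passes raw bytes; it never pre-decodes them itself."""
    assert not accepts_stream(data), "parser accepted an invalid document"

# 0xC3 must be followed by a continuation byte; 0x28 is not one.
assert_invalid(b"<doc>\xc3\x28</doc>")
```

With this shape the test never has to violate a decoding precondition, because it never calls the decoder on bytes it knows to be bad.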
The comments on parse_from_string and parse_from_stream are
seriously insufficient. I thought I had documented this, since
I think we have been through it before. I'll do so.