From: Colin P. A. <co...@co...> - 2005-11-18 06:05:20
>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    Eric> I don't understand why you want to export to
    Eric> {UC_UTF8_ROUTINES}.

I thought it would be necessary for routines used as pre-conditions of
routines used within UC_UTF8_ROUTINES.

    Eric> I'm sorry, but when I see that, with two irrelevant
    Eric> arguments, I prefer to have:

    Eric> is_encoded_next_byte: is_encoded_next_byte (a_byte)

    Eric> If the only purpose of having these last two arguments is
    Eric> for code reuse, I think that I already mentioned before that
    Eric> we should probably have (find a better name):

    Eric> foobar (a_byte, a_first_byte: CHARACTER;
    Eric>         ignore_first_byte: BOOLEAN): BOOLEAN is ...

    Eric> and then:

    Eric> is_encoded_next_byte (a_byte: CHARACTER): BOOLEAN is
    Eric>     do
    Eric>         foobar (a_byte, 0, False)
    Eric>     end

I was thinking about that too, but I couldn't think of a good name.

    Eric> I'm sorry, but I have a hard time understanding why you
    Eric> changed the signature, and why it was needed. Is
    Eric> `encoded_next_value' correct?

Yes.

    Eric> Would its precondition as
    Eric> stated above (with `is_encoded_next_byte' having only one
    Eric> argument) be correct?

Yes (for the implementation that you give above, using foobar).

    Eric> Can `foobar' (only used in `valid_utf8' as far as I can see)
    Eric> be given a meaningful, unambiguous, non-confusing name?

This is where I stumbled yesterday. The required meaning is:

Is `a_byte' valid as the non-first byte of a UTF-8 encoding for a
character, taking the value of the first byte into consideration if we
are considering the second byte of the sequence, but ignoring the first
byte if we are considering the third or fourth byte of a sequence.

Trying to find a meaningful, unambiguous, non-confusing name for all
that is a bit beyond me.

So I now think it is much better to have two routines thus:

    is_encoded_next_byte (a_byte: CHARACTER): BOOLEAN is
            -- Is `a_byte' one of the next bytes in UTF-8 encoding?
        do
                -- 10xxxxxx
            Result := (byte_127 < a_byte and a_byte <= byte_191)
        end

and

    is_encoded_second_byte (a_byte, a_first_byte: CHARACTER): BOOLEAN is
            -- Is `a_byte' a valid second byte in UTF-8 encoding?
        require
            valid_first_byte: is_encoded_first_byte (a_first_byte)
        do
                -- 10xxxxxx
            if a_first_byte = byte_224 then
                Result := (byte_159 < a_byte and a_byte <= byte_191)
            elseif a_first_byte = byte_237 then
                Result := (byte_127 < a_byte and a_byte <= byte_159)
            elseif a_first_byte = byte_240 then
                Result := (byte_143 < a_byte and a_byte <= byte_191)
            elseif a_first_byte = byte_244 then
                Result := (byte_127 < a_byte and a_byte <= byte_143)
            else
                Result := (byte_127 < a_byte and a_byte <= byte_191)
            end
        end

--
Colin Adams
Preston Lancashire
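
For illustration only, here is a minimal sketch of how the two routines proposed in this message might be exercised at the boundary values of each special case. The class name UTF8_SECOND_BYTE_SKETCH, the `report' helper and the body of `is_encoded_first_byte' are assumptions made for this sketch and are not taken from Gobo. The `byte_NNN' constants are assumed to hold the character with code NNN, as their names suggest. The sketch uses the same pre-ECMA `is' syntax as the message, so a current compiler may require the newer syntax instead.

```eiffel
class UTF8_SECOND_BYTE_SKETCH
		-- Hypothetical stand-alone class, not part of Gobo.

create

	make

feature {NONE} -- Initialization

	make is
			-- Print a few boundary checks, with the expected value in each label.
		do
			report ("E0 9F (overlong), expect False", is_encoded_second_byte (byte_159, byte_224))
			report ("E0 A0, expect True", is_encoded_second_byte ('%/160/', byte_224))
			report ("ED 9F, expect True", is_encoded_second_byte (byte_159, byte_237))
			report ("ED A0 (surrogate), expect False", is_encoded_second_byte ('%/160/', byte_237))
			report ("F0 8F (overlong), expect False", is_encoded_second_byte (byte_143, byte_240))
			report ("F4 90 (above U+10FFFF), expect False", is_encoded_second_byte ('%/144/', byte_244))
			report ("third byte 80, expect True", is_encoded_next_byte ('%/128/'))
		end

	report (a_label: STRING; a_result: BOOLEAN) is
			-- Print `a_label' followed by `a_result'.
		do
			io.put_string (a_label)
			io.put_string (": ")
			io.put_string (a_result.out)
			io.put_new_line
		end

feature -- Status report

	is_encoded_first_byte (a_byte: CHARACTER): BOOLEAN is
			-- Assumed definition for this sketch: ASCII, or a lead byte in C2..F4.
		do
			Result := a_byte <= byte_127 or (byte_193 < a_byte and a_byte <= byte_244)
		end

	is_encoded_next_byte (a_byte: CHARACTER): BOOLEAN is
			-- Is `a_byte' one of the next bytes in UTF-8 encoding?
		do
				-- 10xxxxxx
			Result := (byte_127 < a_byte and a_byte <= byte_191)
		end

	is_encoded_second_byte (a_byte, a_first_byte: CHARACTER): BOOLEAN is
			-- Is `a_byte' a valid second byte in UTF-8 encoding?
		require
			valid_first_byte: is_encoded_first_byte (a_first_byte)
		do
			if a_first_byte = byte_224 then
					-- E0: second byte must be A0..BF (excludes overlong forms)
				Result := (byte_159 < a_byte and a_byte <= byte_191)
			elseif a_first_byte = byte_237 then
					-- ED: second byte must be 80..9F (excludes surrogates)
				Result := (byte_127 < a_byte and a_byte <= byte_159)
			elseif a_first_byte = byte_240 then
					-- F0: second byte must be 90..BF (excludes overlong forms)
				Result := (byte_143 < a_byte and a_byte <= byte_191)
			elseif a_first_byte = byte_244 then
					-- F4: second byte must be 80..8F (nothing above U+10FFFF)
				Result := (byte_127 < a_byte and a_byte <= byte_143)
			else
				Result := (byte_127 < a_byte and a_byte <= byte_191)
			end
		end

feature -- Constants (values assumed from the names used in the message)

	byte_127: CHARACTER is '%/127/'
	byte_143: CHARACTER is '%/143/'
	byte_159: CHARACTER is '%/159/'
	byte_191: CHARACTER is '%/191/'
	byte_193: CHARACTER is '%/193/'
	byte_224: CHARACTER is '%/224/'
	byte_237: CHARACTER is '%/237/'
	byte_240: CHARACTER is '%/240/'
	byte_244: CHARACTER is '%/244/'

end
```

Each of the four special first bytes narrows the ordinary 80..BF continuation range at one end, which is why the second byte needs its own routine while the third and fourth bytes can share `is_encoded_next_byte'.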