Re: [gobo-eiffel-develop] Partial code review

>>>>> "Eric" == Eric Bezault <er...@go...> writes:

    Eric> I don't understand why you want to export to
    Eric> {UC_UTF8_ROUTINES}.

I thought the export would be necessary for routines that appear in
the preconditions of routines called from within UC_UTF8_ROUTINES.

    Eric> I'm sorry, but when I see that, with two irrelevant
    Eric> arguments, I prefer to have:

    Eric>     is_encoded_next_byte: is_encoded_next_byte (a_byte)

    Eric> If the only purpose of having these last two arguments is
    Eric> for code reuse, I think that I already mentioned before that
    Eric> we should probably have (find a better name):

    Eric>     foobar (a_byte, a_first_byte: CHARACTER; ignore_first_byte: BOOLEAN): BOOLEAN is ...

    Eric> and then:

    Eric>     is_encoded_next_byte (a_byte: CHARACTER): BOOLEAN is
    Eric>         do foobar (a_byte, 0, False) end

I was thinking about that too, but I couldn't think of a good name.

    Eric> I'm sorry, but I have a hard time understanding why you
    Eric> changed the signature, and why it was needed. Is
    Eric> `encoded_next_value' correct?

Yes.

    Eric> Would its precondition as stated above (with
    Eric> `is_encoded_next_byte' having only one argument) be correct?

Yes (for the implementation that you give above, using foobar).

    Eric> Can `foobar' (only used in `valid_utf8' as far as I can see)
    Eric> be given a meaningful, unambiguous, non-confusing name?

This is where I stumbled yesterday.
The required meaning is:
Is `a_byte' valid as a non-first byte of the UTF-8 encoding of a
character, taking the value of the first byte into account when
`a_byte' is the second byte of the sequence, but ignoring the first
byte when `a_byte' is the third or fourth byte of the sequence?

Trying to find a meaningful, unambiguous, non-confusing name for all
that is a bit beyond me.

So I now think it is much better to have two routines thus:

is_encoded_next_byte (a_byte: CHARACTER): BOOLEAN is
		-- Is `a_byte' one of the next bytes in UTF-8 encoding?
	do
			-- 10xxxxxx
		Result := (byte_127 < a_byte and a_byte <= byte_191)
	end

and 

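is_encoded_second_byte (a_byte, a_first_byte: CHARACTER): BOOLEAN is
		-- Is `a_byte' a valid second byte in UTF-8 encoding,
		-- given that the first byte is `a_first_byte'?
	require
		valid_first_byte: is_encoded_first_byte (a_first_byte)
	do
			-- 10xxxxxx, with a narrower range after some first bytes
		if a_first_byte = byte_224 then
				-- 160..191: lower values would be an overlong 3-byte form.
			Result := (byte_159 < a_byte and a_byte <= byte_191)
		elseif a_first_byte = byte_237 then
				-- 128..159: higher values would encode a surrogate.
			Result := (byte_127 < a_byte and a_byte <= byte_159)
		elseif a_first_byte = byte_240 then
				-- 144..191: lower values would be an overlong 4-byte form.
			Result := (byte_143 < a_byte and a_byte <= byte_191)
		elseif a_first_byte = byte_244 then
				-- 128..143: higher values would exceed U+10FFFF.
			Result := (byte_127 < a_byte and a_byte <= byte_143)
		else
			Result := (byte_127 < a_byte and a_byte <= byte_191)
		end
	end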
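
To show how the two routines above might divide the work in a caller
such as `valid_utf8', here is a rough sketch. It is not the actual
Gobo code: the class name, `byte_count' and the body of
`is_encoded_first_byte' are my own assumptions for illustration, and I
use integer codes instead of the byte_NNN constants so that the class
stands alone.

```eiffel
class UTF8_VALIDITY_SKETCH
		-- Hypothetical class, for illustration only; not part of Gobo.

feature -- Status report

	valid_utf8 (a_string: STRING): BOOLEAN is
			-- Is `a_string' a well-formed sequence of UTF-8 bytes?
		require
			a_string_not_void: a_string /= Void
		local
			i, k, nb, a_count: INTEGER
			a_first: CHARACTER
		do
			Result := True
			nb := a_string.count
			from i := 1 until not Result or i > nb loop
				a_first := a_string.item (i)
				a_count := byte_count (a_first)
				if a_count = 0 or i + a_count - 1 > nb then
						-- Invalid first byte, or truncated sequence.
					Result := False
				else
					from k := 2 until not Result or k > a_count loop
						if k = 2 then
								-- Only the second byte depends on the first.
							Result := is_encoded_second_byte (a_string.item (i + 1), a_first)
						else
							Result := is_encoded_next_byte (a_string.item (i + k - 1))
						end
						k := k + 1
					end
					i := i + a_count
				end
			end
		end

	byte_count (a_first_byte: CHARACTER): INTEGER is
			-- Length of the sequence started by `a_first_byte';
			-- 0 if it cannot start a sequence (assumed helper)
		local
			c: INTEGER
		do
			c := a_first_byte.code
			if c <= 127 then
				Result := 1
			elseif c >= 194 and c <= 223 then
				Result := 2
			elseif c >= 224 and c <= 239 then
				Result := 3
			elseif c >= 240 and c <= 244 then
				Result := 4
			end
		end

	is_encoded_first_byte (a_byte: CHARACTER): BOOLEAN is
			-- Can `a_byte' start a sequence? (assumed definition)
		do
			Result := byte_count (a_byte) > 0
		end

	is_encoded_next_byte (a_byte: CHARACTER): BOOLEAN is
			-- Is `a_byte' of the form 10xxxxxx?
		do
			Result := a_byte.code >= 128 and a_byte.code <= 191
		end

	is_encoded_second_byte (a_byte, a_first_byte: CHARACTER): BOOLEAN is
			-- Is `a_byte' a valid second byte after `a_first_byte'?
		require
			valid_first_byte: is_encoded_first_byte (a_first_byte)
		local
			low, high, c: INTEGER
		do
			low := 128
			high := 191
			c := a_first_byte.code
			if c = 224 then
				low := 160
			elseif c = 237 then
				high := 159
			elseif c = 240 then
				low := 144
			elseif c = 244 then
				high := 143
			end
			Result := a_byte.code >= low and a_byte.code <= high
		end

end
```

With this split, the caller never needs an `ignore_first_byte' flag:
it already knows whether it is looking at the second byte. For
instance, the bytes 224, 128, 128 are rejected at the second byte,
because 128 is below the 160 required after 224.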

-- 
Colin Adams
Preston Lancashire