Re: FFI and non 1:1 encoding -- errors?

Hi,

is there some reason, or a mnemonic, that explains why Linux mblen() is documented as returning a number of bytes, while Encoding_mblen() yields a count of characters?
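
To make the two conventions concrete, here is a minimal sketch in plain C. It is not CLISP code: utf8_seq_bytes() and utf8_char_count() are names made up for this illustration, UTF-8 is only an example encoding, and continuation bytes are not validated. The first helper answers in bytes for one character, the second answers in characters for a byte range.

```c
#include <stddef.h>

/* Hypothetical helper (not from CLISP or libc): number of bytes occupied by
   the UTF-8 sequence whose lead byte is s[0], or 0 if s[0] cannot start a
   sequence. This is the "answer in bytes" convention. */
static size_t utf8_seq_bytes(const unsigned char *s)
{
    if (s[0] < 0x80) return 1;            /* ASCII */
    if ((s[0] & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((s[0] & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((s[0] & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* stray continuation or invalid byte */
}

/* Hypothetical helper: number of characters encoded in the byte range
   [s, end). Returns (size_t)-1 if a sequence is invalid or truncated.
   This is the "answer in characters" convention. Continuation bytes are
   not checked, to keep the sketch short. */
static size_t utf8_char_count(const unsigned char *s, const unsigned char *end)
{
    size_t count = 0;
    while (s < end) {
        size_t n = utf8_seq_bytes(s);
        if (n == 0 || (size_t)(end - s) < n)
            return (size_t)-1;
        s += n;
        count++;
    }
    return count;
}
```

For the 6 bytes of "h\xC3\xA9llo", utf8_seq_bytes() applied at the second byte says 2 (bytes), while utf8_char_count() over the whole range says 5 (characters). Mixing up the two units is exactly the kind of mistake I am afraid of.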

Distinctions such as these give me headaches when reviewing foreign.d w.r.t. encoding issues, and I'm quite fed up with it.

So far, all I have found is that:

0. lispbibl.d and encoding.d aren't explicit enough about the precise interface to the encoding functions. As a result I waste time (and I remember having spent a lot of time when I implemented with-foreign-string, just to ensure that I got all those mbs/wbs unpack_string_alooca_and_or_ro() right).

1. Conversion from c-string with an encoding that is not 1:1 is bogus because it defers to asciz_to_string(), which assumes a single 0 byte as terminator.

2. There seems to be no wmbsh*t_len() that works on unbounded buffers, like strlen() does. They all expect a buffer limit. Of course, one could throw in an artificial max_array_or_string_index_limit * sizeof(character/byte), or what would you suggest?

3. I'm convinced that what I reported a few days ago under this subject are bugs in CLISP, not on my side.

4. It looks like ENCODING-ZEROES raises its head again.

5. Besides C-STRING, C-ARRAY-MAX also depends on correctly finding the end of a string. I believe convert_from_foreign:c_array_max to be broken for strings because of this (or at least, it does not do what I would expect).
I expect:
(c-array-max #([ff fe] 65 0 66 0 67 0 68 0 0 0):utf-16) -> "ABCD"
Currently, it gives an error.

6. Conversion from c-array-ptr is broken for the same reasons.

7. Once the FFI correctly supports arbitrary encodings, the phrase "must be an ASCII-compatible encoding" should be removed from impnotes:with-foreign-string.

8. A work-around (= status quo) may be to declare that the FFI does not support arbitrary encodings, but only ASCII-compatible ones...
UTF-8 is in, UTF-16 is out.
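
To illustrate points 1, 2 and 5, here is a minimal sketch in plain C of a terminator search that knows how many zero bytes end a string and that respects a buffer limit. It is not CLISP code: zero_terminated_bytes() and its parameters are invented for this example, and I simply assume that the terminator is unit_size zero bytes at a unit-aligned offset (1 for ASCII-compatible encodings, 2 for UTF-16).

```c
#include <stddef.h>

/* Hypothetical helper (not from CLISP): offset in bytes of the first
   terminator in buf, where a terminator is unit_size consecutive zero bytes
   starting at a multiple of unit_size. At most max_bytes bytes are examined.
   Returns (size_t)-1 if unit_size is 0 or no terminator lies within the
   limit. */
static size_t zero_terminated_bytes(const unsigned char *buf,
                                    size_t unit_size, size_t max_bytes)
{
    size_t i, k;
    if (unit_size == 0)
        return (size_t)-1;
    for (i = 0; i + unit_size <= max_bytes; i += unit_size) {
        /* Check whether the whole unit at offset i is zero. */
        for (k = 0; k < unit_size; k++)
            if (buf[i + k] != 0)
                break;
        if (k == unit_size)
            return i;   /* bytes before the terminator */
    }
    return (size_t)-1;  /* ran into the limit first */
}
```

Taking the 12 bytes from point 5 (ff fe 65 0 66 0 67 0 68 0 0 0, the last nine read as decimal), a search with unit_size 1, which is what strlen() does, stops after 3 bytes, in the middle of the first character after the byte-order mark. With unit_size 2 it stops after 10 bytes, which is the whole string. With unit_size 2 and a limit of 8 bytes it reports that no terminator was found.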

Regards,
	Jorg Hohle.