From: Hoehle, Joerg-C. <Joe...@t-...> - 2005-06-24 14:49:38
Hi,

is there some reason, or a mnemonic, to understand why Linux mblen() says it returns a number of bytes, while Encoding_mblen() yields a count in character entities?!? Distinctions such as these cause me headaches when reviewing foreign.d w.r.t. encoding issues, and I'm quite fed up with it.

So far, all I have found is that:

0. lispbibl.d and encoding.d aren't explicit enough about the precise interface to the encoding functions. As a result I waste time (and I remember having spent a lot of time when I implemented with-foreign-string just to ensure that I got all those mbs/wbs unpack_string_alloca_and_or_ro() right).

1. Conversion from c-string outside of 1:1 encodings is bogus because it defers to asciz_to_string(), which assumes a single 0 byte terminator.

2. There seems to be no wmbsh*t_len() that works on unbounded buffers, like strlen() does. They all expect a buffer limit. Of course, one could throw in an artificial max_array_or_string_index_limit * sizeof(character/byte), or what would you suggest?

3. I'm convinced that what I reported a few days ago under this subject are bugs in CLISP, not on my side.

4. It looks like ENCODING-ZEROES raises its head again.

5. Besides C-STRING, C-ARRAY-MAX also depends on correctly finding the end of a string. I believe convert_from_foreign:c_array_max to be broken for strings because of this (or at least, it does not do what I would expect). I expect:
   (c-array-max #([ff fe] 65 0 66 0 67 0 68 0 0 0):utf-16) -> "abcd"
   Currently, it gives an error.

6. Conversion from c-array-ptr is broken for the same reasons.

7. Once the FFI correctly supports arbitrary encodings, the string "must be an ASCII-compatible encoding" should be removed from impnotes:with-foreign-string.

8. A work-around (= status quo) may be to declare that the FFI does not support arbitrary encodings, but only ASCII-compatible ones... UTF-8 is in, UTF-16 is out.

Regards,
Jorg Hohle.
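
To illustrate points 1, 5 and 6, here is a minimal C sketch of what a terminator search might look like when the terminator is wider than one byte. The helper name terminated_len_bytes and its parameters are hypothetical and not taken from the CLISP sources. It assumes the terminator is an aligned run of `unit` zero bytes (2 for UTF-16, 4 for UTF-32) and, as in point 2, it still takes an explicit byte limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CLISP): returns the number of bytes
   that precede the first aligned run of `unit` zero bytes in buf.
   At most max_bytes bytes are examined. If no terminator is found,
   the result is max_bytes rounded down to a multiple of unit. */
static size_t terminated_len_bytes(const unsigned char *buf,
                                   size_t unit, size_t max_bytes)
{
    size_t i, k;

    assert(unit > 0);  /* a zero-width terminator makes no sense */

    /* Step through the buffer one code unit at a time. */
    for (i = 0; i + unit <= max_bytes; i += unit) {
        int all_zero = 1;
        for (k = 0; k < unit; k++) {
            if (buf[i + k] != 0) {
                all_zero = 0;
                break;
            }
        }
        if (all_zero)
            return i;  /* terminator starts at byte offset i */
    }
    return i;  /* no terminator within the limit */
}
```

On the bytes from point 5 (ff fe 65 0 66 0 67 0 68 0 0 0), a unit of 2 finds the terminator after 10 bytes. A unit of 1, which is effectively what a search for a single 0 byte does, stops after 3 bytes, in the middle of the first character after the byte-order mark.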