From: Hoehle, Joerg-C. <Joe...@t-...> - 2002-04-24 12:08:08
Hi,

I wrote:
> CLISP's FFI doesn't correctly support UNICODE anyway. That's
> another one on my growing FFI-limits list.

I've been thinking about WITH-FOREIGN-STRING. LispWorks has it, and I can provide a very similar macro. However, I'm unsettled about the exact foreign-string representation and type in Lisp and would like to poll opinions.

  (with-foreign-string (fos ... :encoding custom:*default-file-encoding*)
    (describe fos) 0)
  -> outputs ??

The :encoding keyword works around the FFI restriction on C-STRING being limited to 1:1 mappings. Even UTF-16 can be converted. The drawback is that the foreign function's parameter can no longer be declared as having type C-STRING.

Note that, strictly speaking, you can already use the FFI with arbitrary string encodings today, using EXT:CONVERT-STRING-TO-BYTES and working with (ffi:c-array-ptr uint8) instead of c-string parameters (with care if the foreign side does uint16-wide access to an array terminated by a single uint8 zero...). So am I wasting my time on a speed & garbage optimization issue again?

o What would be an appropriate parameter type declaration?
  a) C-POINTER
  b) C-ARRAY-PTR uint8
  c) something else

o What would be the Lisp object representing the foreign string (fos above)?
  x) FOREIGN-ADDRESS
  y) FOREIGN-VARIABLE

  If y), what would be the type of the FOREIGN-VARIABLE object?
  y1) (C-ARRAY uint8 <bytesize>)
  y2) uint8 mostly, except (C-ARRAY uint16 <elementsize>) if the encoding happens to be an exact 16-bit encoding (which I don't know yet how to detect).

Future code:

  ;; This is better than ffi:c-string because there's no 1:1 limit like on
  ;; custom:*foreign-encoding*, so Unicode strings can be converted.
  (defmacro with-foreign-string ((foreign-variable element-count byte-count
                                  &key (encoding 'custom:*foreign-encoding*)
                                       (null-terminated-p 'T))
                                 string &body body)
    `(ffi::exec-with-string
      (lambda (,foreign-variable ,element-count ,byte-count) ,@body)
      ,string ,encoding ,null-terminated-p))
  ;; TODO what element-type? automatically derive uint16 for real unicode?
  ;;      (c-array or c-array-max? uint8-or-16? N)
  ;; TODO with_string_0 may not be enough, resp. buggy:
  ;;      need bytelen+1+1 for 0-terminated uint16 array!

My thoughts: I currently tend to favour FOREIGN-ADDRESS (=> C-POINTER type declaration):
  i) easy to implement
  ii) constant, well-understood behaviour

However:
  j) FOREIGN-VARIABLE would be consistent with the WITH-FOREIGN-OBJECT macro, but the unresolved uint8/uint16 problem makes me avoid that.

The problem with automatic uint8/uint16 recognition is that the programmer writing the FFI code may not have control over the actual encoding used by the user, and thus may not know the actual type of the variables s/he manipulates.

OTOH, using foreign-variables would allow access to CLISP's functions like
  (SETF (element foreign-string 25) #xabcd)
when the programmer is sure it's UTF-16 (or any other fixed 16-bit encoding, if such exists).

The programmer may still create a foreign-variable him/herself, using
  (FOREIGN-ADDRESS-VARIABLE fos-as-foreign-address
                            (parse-c-type `(c-array uint8 ,bytesize)))
-- which I have not yet written (a trivial exercise).

> I *highly* welcome proposals for what an API which works with
> unicode would look like.
> I've been thinking about things like
>   (with-c-string (varname "foobar" &optional charset-or?-encoding)
>     &body ...)
>   -> FOREIGN-VARIABLE of type (C-ARRAY
>        FFI:uint8-or-character-or-what? <len>)
>   (FFI::FOREIGN-SIZE *) -> length in 8bit bytes
> or something closer to make-array with &key :length
> :initial-contents ...
> It doesn't tell what DEF-x-CALL-OUT for such a function would
> look like.
>
> Is using vectors of (unsigned-byte 8) resp. (c-array uint8 n) all
> that's needed by people?
[more snipped]

To be less abstract, here is hypothetical code (yet another version of ZLIB compression for CL-PDF):

  (ffi:def-lib-call-out zlib-compress-string *zlib*
    (:name "compress")
    (:arguments (dest ffi:c-pointer :in)
                (destlen (ffi:c-ptr ffi:ulong) :in-out)
                (source ffi:c-pointer) ; cannot use c-string with arbitrary Unicode
                (sourcelen ffi:ulong))
    (:return-type ffi:int)
    (:language :stdc))

  (defun compress-string (astring)
    "Compress the string ASTRING.
  Returns an array of bytes representing the compressed data."
    ;; works with strings of arbitrary encodings
    (with-foreign-string (source elemsize sourcelen
                          :encoding pdf::+external-format+
                          :null-terminated-p NIL)
        astring
      (declare (ignore elemsize))
      (let* ((destlen (+ 12 (ceiling (* sourcelen 1.05)))))
        ;; Using CLISP's symbol-macro based interface
        (ffi:with-c-var (dest `(ffi:c-array ffi:uint8 ,destlen)) ; no init
          (multiple-value-bind (status actual)
              (zlib-compress-string (ffi:c-var-address dest) destlen
                                    source sourcelen)
            (if (zerop status)
                ;; (subseq dest 0 actual)
                ;; ffi:cast not usable because of different size...
                (ffi:offset dest 0 `(ffi:c-array ffi:uint8 ,actual))
                (error "zlib error, code ~d" status)))))))

WITH-FOREIGN-STRING is nice, but there's also a need for the converse, foreign-address -> Lisp string...

Thanks for your comments,
Jorg Hohle.
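
Addendum: a minimal sketch of the byte-vector workaround mentioned earlier in this message (EXT:CONVERT-STRING-TO-BYTES plus a uint8 array), with the terminator width made explicit. APPEND-ZERO-TERMINATOR and STRING-TO-FOREIGN-BYTES are hypothetical helper names, not part of CLISP, and the caller is assumed to know whether the chosen encoding uses 8-bit or 16-bit units; nothing here detects that automatically.

```lisp
;; Sketch only: both function names below are hypothetical helpers.

(defun append-zero-terminator (bytes unit-size)
  "Return a fresh (unsigned-byte 8) vector holding BYTES followed by
UNIT-SIZE zero bytes (1 for 8-bit encodings, 2 for 16-bit ones)."
  (let ((result (make-array (+ (length bytes) unit-size)
                            :element-type '(unsigned-byte 8)
                            :initial-element 0)))
    ;; Copy the encoded bytes; the trailing UNIT-SIZE elements stay zero.
    (replace result bytes)
    result))

;; CLISP-only part: EXT:CONVERT-STRING-TO-BYTES does the encoding.
#+clisp
(defun string-to-foreign-bytes (string encoding &key (unit-size 1))
  "Encode STRING with ENCODING and append UNIT-SIZE zero bytes.
UNIT-SIZE is supplied by the caller, not derived from ENCODING."
  (append-zero-terminator (ext:convert-string-to-bytes string encoding)
                          unit-size))
```

For a 16-bit encoding one would call, for example, (string-to-foreign-bytes "foobar" charset:utf-16 :unit-size 2), so that a foreign routine reading uint16 units finds a full 16-bit zero instead of running past a single zero byte. This conses a fresh vector on every call, which is exactly the speed and garbage cost a WITH-FOREIGN-STRING macro would try to avoid.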