On Fri, Feb 3, 2012 at 1:40 AM, Raymond Toy <toy.raymond@gmail.com> wrote:
1. You have to check after reading the string to see what it contains.  (I guess a very small compile-time cost.)

Indeed the cost is very small and can be included in the routine that reads the string into the buffer.
 
2. Because I didn't think any lisp did that, but it's not illegal to do so.
3. It's a burden on the user if the type of a constant string depends on what's in it.  Being illiterate, I only know ASCII, so, perhaps this isn't a problem in practice.

I implemented this because, after I introduced Unicode, all programs began using 4 times more memory than their non-Unicode versions. This is natural: symbols, strings, code, all data can be either base-strings or extended strings, and if the core does not try to save space, everything defaults to the most expensive version.
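
As a rough illustration of the check discussed above (a sketch only, not ECL's actual reader code; NARROW-STRING is a hypothetical name), the routine that fills the buffer could finish with something like this:

```lisp
;; Hypothetical helper, for illustration only.
(defun narrow-string (buffer)
  "Return BUFFER's contents as a base-string when every character fits,
otherwise as a string with element type CHARACTER."
  (if (every (lambda (ch) (typep ch 'base-char)) buffer)
      ;; All characters are base-chars: use the compact representation.
      (coerce buffer 'simple-base-string)
      ;; At least one extended character: keep the wide representation.
      (coerce buffer '(simple-array character (*)))))

(narrow-string "fooo")   ; the result satisfies (typep * 'base-string)
```

The scan is linear in the length of the string, which is why the cost stays small compared with reading the string in the first place.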

In practice this should never be a problem.

* Constants do not need declarations in any of the lisps I know. I mean, in your fortran2cl code, a form such as (LET ((A "fooo")) ...), where the variable A is not modified, immediately tells the compiler that it can replace the variable with a constant.

* Constants are not meant to be overwritten, ever. This is very clear in the spec. From that point of view, the user should not care whether the constant is a simple array or not, or whether it contains one type of elements or another. If you need modifiable arrays in the Fortran code, then those LET statements should not bind the variables to constants, but rather to fresh copies such as (copy-seq "whateverconstantyouused").

* The user should not expect one or another type of array from a constant that is read in a non-readable form. More precisely, "aaaa" does not specify anything about the array type. The array forms #A do. I understand this is a problem with the ANSI specification, which states explicitly that *print-readably* cannot affect how strings and symbols are printed :-/ That is unfortunate and probably arises from a balance between readability and utility of the printed output.
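
To make the points above concrete, here is a minimal sketch of how generated code could obtain a modifiable string without depending on the type of the literal. The function names are made up for illustration and are not part of f2cl:

```lisp
;; Hypothetical examples; neither function exists in f2cl.
(defun fresh-buffer ()
  ;; COPY-SEQ returns a fresh string on every call, so writing to it is
  ;; safe.  Modifying the literal "fooo" itself would be undefined.
  (let ((a (copy-seq "fooo")))
    (setf (char a 0) #\g)
    a))

(defun fresh-wide-buffer ()
  ;; When the element type matters, request it explicitly instead of
  ;; relying on whatever the reader produced for the literal.
  (let ((a (make-string 4 :element-type 'character)))
    (replace a "fooo")
    a))
```

The first form keeps whatever element type the literal had; the second always yields an (array character (*)), whether or not the reader narrows literals.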

I had a look at f2cl's code and the following patch would more or less fix it. There might be simpler ways, such as looking only at PARAMETER statements, but my Fortran is a bit rusty and I do not know f2cl so well. Note also that one possible optimization could be to use LOAD-TIME-VALUE around COERCE, for those lisps that would not precompute the COERCE form.

diff --git a/src/f2cl1.l b/src/f2cl1.l
index 87b7ceb..907750d 100644
--- a/src/f2cl1.l
+++ b/src/f2cl1.l
@@ -1075,7 +1075,20 @@ correctly"
     (if *common-block-file*
  (do-file)
  (do-output outport))))
-    
+
+(defun fixup-unicode-strings (s)
+  (labels ((fixup-inner (item)
+             (cond ((stringp item)
+                    (if (typep item '(array character (*)))
+                        item
+                        `(coerce ,item '(array character (*)))))
+                   ((atom item)
+                    item)
+                   ((consp item)
+                    (mapcar #'fixup-inner item)))))
+    (if (subtypep 'character 'base-char)
+        s
+        (fixup-inner s))))
 
 (defun translate-and-write-subprog (prog-list outport output-path
     declaim package options)
@@ -1151,7 +1164,7 @@ correctly"
        ;; functions, in case the Fortran code has declared them as
        ;; external.
        (setf fun (fixup-f2cl-lib fun (cons (cadr fort-fun) *external-function-names*)))
-       
+       (setf fun (fixup-unicode-strings fun))
        (special-print fun outport)
        (format outport "~2&(in-package #-gcl #:cl-user #+gcl \"CL-USER\")~%#+#.(cl:if (cl:find-package '#:f2cl) '(and) '(or))~%")
        (let* ((*package* (find-package '#:cl-user))
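
For reference, this is a standalone sketch of what the walker does to a translated form. The function is repeated from the patch so the snippet runs on its own. WRAP-STRING-LITERAL is a hypothetical name showing the LOAD-TIME-VALUE variant mentioned above; it is not part of the patch.

```lisp
;; Same function as in the patch, repeated so this runs standalone.
(defun fixup-unicode-strings (s)
  (labels ((fixup-inner (item)
             (cond ((stringp item)
                    (if (typep item '(array character (*)))
                        item
                        `(coerce ,item '(array character (*)))))
                   ((atom item)
                    item)
                   ((consp item)
                    (mapcar #'fixup-inner item)))))
    (if (subtypep 'character 'base-char)
        s
        (fixup-inner s))))

;; Hypothetical variant of the replacement form: the COERCE is evaluated
;; once, at load time, instead of each time the form is executed.
(defun wrap-string-literal (item)
  `(load-time-value (coerce ,item '(array character (*)))))

;; Where string literals are read as base-strings, this returns
;;   (LET ((A (COERCE "fooo" '(ARRAY CHARACTER (*))))) (LENGTH A))
;; and where they are already character strings, it is unchanged.
(fixup-unicode-strings '(let ((a "fooo")) (length a)))
```

Note that the walker rewrites every string it finds in the form, including strings inside quoted data or in documentation-string position, which is one reason this is only a "more or less" fix.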


--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com