Todd Sabin <tsabin@...> writes:
> Christophe Rhodes <csr21@...> writes:
>> The branch on which I've been doing the work to get (vector nil) be
>> recognized as a string type appears to be nearing its end. By my
>> measurements, I can't tell any difference in performance in
>> self-compilation; of course, this isn't necessarily other people's
>> typical workload. To mitigate the potential performance pain that
>> might otherwise appear because STRING declarations no longer mean
>> (VECTOR BASE-CHAR) but rather (OR (VECTOR BASE-CHAR) (VECTOR NIL)),
>> transforms for SIMPLE-STRING element accesses have been implemented,
>> though there will still be a (small) performance benefit for
>> declaring an object that is known to be a SIMPLE-BASE-STRING as
>> such, as the compiler can then elide the test for (VECTOR NIL).
> Can you quantify the (small) performance benefit? Also, does that
> mean that most/all code that uses strings is going to be slower now,
> or that if you were previously getting a performance boost through
> declaring things as strings you'll now get less of a boost, or
> something else?
The small performance benefit, relative to simply declaring something
as a SIMPLE-STRING, is roughly six instructions and a (cached) memory
access -- essentially, a type test. Concretely: accesses through a
STRING declaration will now be six instructions and a cached memory
access slower than they were previously, while accesses through a
BASE-STRING declaration have unchanged performance characteristics.
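To make that concrete, here is a sketch of the two declarations in
question (the function names are mine, and the comments describe the
expected rather than measured code):

  ;; Illustrative only: SIMPLE-STRING now includes (SIMPLE-ARRAY NIL (*)),
  ;; so the access must be preceded by a type test.
  (defun first-char/string (s)
    (declare (type simple-string s) (optimize speed))
    (char s 0))

  ;; SIMPLE-BASE-STRING pins the representation, so the (VECTOR NIL)
  ;; test can be elided and the access open-coded.
  (defun first-char/base-string (s)
    (declare (type simple-base-string s) (optimize speed))
    (char s 0))

Comparing (disassemble 'first-char/string) with (disassemble
'first-char/base-string) should show the extra handful of instructions.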
> Assuming there is some non-negligible slowdown, this seems like a
> questionable change to me. Would it really be so bad to just
> implement (or rather, leave things) as if the string type were defined
> as the glossary has it:
> This denotes the union of all types (array c (size)) for all
> <<non-empty>> subtypes c of character; that is, the set of strings
> of size size.
I don't know.
Part of the point of having development versions, and CVS access, and
users, is to get feedback on this kind of thing. I can't measure the
slowdown on my typical use; if someone _can_ measure a slowdown on
theirs, then obviously I want to know about it, to see if it's a
fundamental slowdown in an application-vital inner loop or if it can
be mitigated, either with a small amount of uglification in the user
code (more specific declarations) or else with compiler enhancements.
Note that ordinary compiler enhancements such as loop-invariant
lifting would have a great effect, I suspect, on such usage patterns
(because the type of an array is a loop invariant :-); similarly,
implementing SB-KERNEL:HAIRY-DATA-VECTOR-REF using a computed goto
would win massively over the TYPECASE we have at the moment.
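Pending such compiler enhancements, the hoisting can be done by hand in
user code; a sketch (my own example) of the "more specific declarations"
mitigation mentioned above:

  ;; Dispatch on the concrete string type once, outside the loop, so
  ;; the per-element type test is effectively loop-invariant.
  (defun count-spaces (s)
    (declare (type string s) (optimize speed))
    (etypecase s
      (simple-base-string
       (loop for c across (the simple-base-string s)
             count (char= c #\Space)))  ; open-coded accesses
      (string
       (loop for c across s
             count (char= c #\Space))))) ; generic fallback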
The other point is that this slowdown, if slowdown there is, is
fundamental to having more than one string type. Since I believe
fairly firmly that Unicode is the future, but also that ascii-like
strings aren't going away either, I believe that STRING is almost
certain to turn into (OR (VECTOR CHARACTER) (VECTOR BASE-CHAR)) in any
case, irrespective of the (VECTOR NIL) issue, so we have to deal with
the cost of multiple string types in any event.
> OOC, if you are going to treat (vector nil) as a subtype of string,
> then what happens in the following code:
> (declaim (optimize (safety 3)))
> (intern (make-array 0 :element-type nil))
> or passing something of type (vector nil) to any of the standard
> functions that accept string parameters?
It works as expected, and
(eq (intern (make-array 0 :element-type nil))
(intern (make-array 0 :element-type 'character)))
returns true, because "" and (make-array 0 :element-type nil) are
STRING=, as they are strings with every element EQL. If the string
operation requires accessing the contents, then an error (of type
NIL-ARRAY-ACCESSED) will be signalled.
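A sketch of that behaviour (the exact home package of the condition
class is an internal detail, so take the name with a grain of salt):

  ;; A (vector nil) behaves as a string until its (nonexistent)
  ;; elements are touched.
  (let ((s (make-array 0 :element-type nil)))
    (stringp s)      ; => T
    (string= s ""))  ; => T, vacuously: no elements to compare

  ;; Touching an element of a non-empty (vector nil) signals the error:
  (handler-case (char (make-array 1 :element-type nil) 0)
    (error (c) (type-of c)))  ; an error of type NIL-ARRAY-ACCESSED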
http://www-jcsu.jesus.cam.ac.uk/~csr21/ +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%") (pprint #36rJesusCollegeCambridge)