From: Christophe R. <cs...@ca...> - 2003-06-25 17:56:01
Hi,

The branch on which I've been doing the work to get (vector nil)
recognized as a string type appears to be nearing its end. By my
measurements, I can't tell any difference in performance in
self-compilation; of course, this isn't necessarily other people's
typical workload. To mitigate the potential performance pain that
might otherwise appear because STRING declarations no longer mean
(VECTOR BASE-CHAR) but rather (OR (VECTOR BASE-CHAR) (VECTOR NIL)),
transforms for SIMPLE-STRING element accesses have been implemented,
though there will still be a (small) performance benefit for
declaring an object that is known to be a SIMPLE-BASE-STRING as
such, as the compiler can then elide the test for (VECTOR NIL).

In the process of working on this, I have consolidated much of the
knowledge about array types scattered around the system into one
place, an extended *SPECIALIZED-ARRAY-ELEMENT-TYPE-PROPERTIES* vector,
which is then used to generate the code in various places either at
macroexpansion-time or at read-time. No fewer than 13 separate places
in the code have undergone this treatment, so maybe the next time
array types need to be examined (for instance, for Unicode, or perhaps
for the ANSI-required (UNSIGNED-BYTE {7,15,31}) specialized arrays)
it'll be less work.

Off the top of my head, to add a new specialization one will now need
to

 * edit the master list in src/compiler/generic/vm-array.lisp;

 * export some predicate and error symbols in
   package-data-list.lisp-expr;

 * implement the required array behaviour in
   src/compiler/target/array.lisp (times $NARCH);

 * defknown and define an interpreter stub for the predicate, probably
   in src/compiler/generic/vm-{type,fndb}.lisp;

 * add a class in src/code/class.lisp;

 * inform the gcs and purify about it
   (src/runtime/{cheney-,gen}gc.c, src/runtime/purify.c).

This obviously isn't ideal, but it's less non-ideal than it was a
couple of weeks ago.
Also, some (but by no means all) bugs that would have been exposed by
a Unicodish (VECTOR CHARACTER) distinct from (VECTOR BASE-CHAR) have
been fixed.

I have one more change to check in (once it builds :-), to the
internal error defining logic; what is in CVS (tag
vector_nil_string_branch) is representative, so if there are any
comments, now is the time to make them. I'll probably merge the
branch tomorrow unless I hear feedback to the contrary.

Cheers,

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/ +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")
(pprint #36rJesusCollegeCambridge)
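As a rough illustration of the declaration trade-off described in the message above, the sketch below writes the same loop twice, once declared SIMPLE-STRING and once declared SIMPLE-BASE-STRING. The function names are hypothetical and appear nowhere in the SBCL sources, and whether the first version really compiles to an extra type test depends on the compiler version and optimization policy.

```lisp
;; Illustrative sketch only: the function names are hypothetical.

(defun count-spaces/simple-string (s)
  ;; SIMPLE-STRING may cover more than one specialized vector type, so
  ;; the element access may need to check the actual representation.
  (declare (type simple-string s))
  (let ((n 0))
    (dotimes (i (length s) n)
      (when (char= (schar s i) #\Space)
        (incf n)))))

(defun count-spaces/simple-base-string (s)
  ;; SIMPLE-BASE-STRING names one representation, so the compiler is
  ;; free to skip that check.
  (declare (type simple-base-string s))
  (let ((n 0))
    (dotimes (i (length s) n)
      (when (char= (schar s i) #\Space)
        (incf n)))))
```

A call such as (count-spaces/simple-base-string (coerce "a b c" 'simple-base-string)) exercises the narrower declaration. In safe code, passing a string of any other representation to that function should be expected to signal a type error, which is the price of the more specific declaration.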
From: Todd S. <ts...@op...> - 2003-06-25 21:03:31
Christophe Rhodes <cs...@ca...> writes:

> Hi,
>
> The branch on which I've been doing the work to get (vector nil)
> recognized as a string type appears to be nearing its end. By my
> measurements, I can't tell any difference in performance in
> self-compilation; of course, this isn't necessarily other people's
> typical workload. To mitigate the potential performance pain that
> might otherwise appear because STRING declarations no longer mean
> (VECTOR BASE-CHAR) but rather (OR (VECTOR BASE-CHAR) (VECTOR NIL)),
> transforms for SIMPLE-STRING element accesses have been implemented,
> though there will still be a (small) performance benefit for
> declaring an object that is known to be a SIMPLE-BASE-STRING as
> such, as the compiler can then elide the test for (VECTOR NIL).

Can you quantify the (small) performance benefit? Also, does that
mean that most/all code that uses strings is going to be slower now,
or that if you were previously getting a performance boost through
declaring things as strings you'll now get less of a boost, or
something else?

Assuming there is some non-negligible slowdown, this seems like a
questionable change to me. Would it really be so bad to just
implement (or rather, leave things) as if the string type were defined
as:

  This denotes the union of all types (array c (size)) for all
  <<non-empty>> subtypes c of character; that is, the set of strings
  of size size.

OOC, if you are going to treat (vector nil) as a subtype of string,
then what happens in the following code:

  (declaim (optimize (safety 3)))

  (intern (make-array 0 :element-type nil))

or passing something of type (vector nil) to any of the standard
functions that accept string parameters?

-- 
Todd Sabin <ts...@op...>
From: Christophe R. <cs...@ca...> - 2003-06-25 21:21:26
Todd Sabin <ts...@op...> writes:

> Christophe Rhodes <cs...@ca...> writes:
>
>> Hi,
>>
>> The branch on which I've been doing the work to get (vector nil)
>> recognized as a string type appears to be nearing its end. By my
>> measurements, I can't tell any difference in performance in
>> self-compilation; of course, this isn't necessarily other people's
>> typical workload. To mitigate the potential performance pain that
>> might otherwise appear because STRING declarations no longer mean
>> (VECTOR BASE-CHAR) but rather (OR (VECTOR BASE-CHAR) (VECTOR NIL)),
>> transforms for SIMPLE-STRING element accesses have been implemented,
>> though there will still be a (small) performance benefit for
>> declaring an object that is known to be a SIMPLE-BASE-STRING as
>> such, as the compiler can then elide the test for (VECTOR NIL).
>
> Can you quantify the (small) performance benefit? Also, does that
> mean that most/all code that uses strings is going to be slower now,
> or that if you were previously getting a performance boost through
> declaring things as strings you'll now get less of a boost, or
> something else?

The small performance benefit, relative to simply declaring something
as a SIMPLE-STRING, is roughly six instructions and a (cached) memory
access: essentially, a type test. And what I mean is that things that
are declared as STRINGs will now be six instructions and a cached
memory access slower than they were previously, and things declared as
BASE-STRINGs will have unchanged performance characteristics.

> Assuming there is some non-negligible slowdown, this seems like a
> questionable change to me. Would it really be so bad to just
> implement (or rather, leave things) as if the string type were defined
> as:
>
>   This denotes the union of all types (array c (size)) for all
>   <<non-empty>> subtypes c of character; that is, the set of strings
>   of size size.

I don't know. Part of the point of having development versions, and
CVS access, and users, is to get feedback on this kind of thing. I
can't measure the slowdown on my typical use; if someone _can_ measure
a slowdown on theirs, then obviously I want to know about it, to see
if it's a fundamental slowdown in an application-vital inner loop or
if it can be mitigated, either with a small amount of uglification in
the user code (more specific declarations) or else with compiler
enhancements. Note that ordinary compiler enhancements such as
loop-invariant lifting would have a great effect, I suspect, on such
usage patterns (because the type of an array is a loop invariant :-);
similarly, implementing SB-KERNEL:HAIRY-DATA-VECTOR-REF using a
computed goto would win massively over the TYPECASE we have at the
moment.

The other point is that this slowdown, if slowdown there is, is
fundamental to having more than one string type. Since I believe
fairly firmly that Unicode is the future, but also that ASCII-like
strings aren't going away either, I believe that STRING is almost
certain to turn into (OR (VECTOR CHARACTER) (VECTOR BASE-CHAR)) in any
case, irrespective of the (VECTOR NIL) issue, so we have to deal with
this anyway.

> OOC, if you are going to treat (vector nil) as a subtype of string,
> then what happens in the following code:
>
> (declaim (optimize (safety 3)))
>
> (intern (make-array 0 :element-type nil))
>
> or passing something of type (vector nil) to any of the standard
> functions that accept string parameters?

It works as expected, and

  (eq (intern (make-array 0 :element-type nil))
      (intern (make-array 0 :element-type 'character)))

returns true, because "" and (make-array 0 :element-type nil) are
STRING=, as they are strings with every element EQL. If the string
operation requires accessing the contents, then an error (of type
NIL-ARRAY-ACCESSED) will be signalled.

Cheers,

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/ +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")
(pprint #36rJesusCollegeCambridge)
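
The behaviour described in the reply above can be probed with a small sketch like the following. The function name NIL-VECTOR-REPORT is hypothetical, and the string operations are guarded by STRINGP on the assumption that not every implementation or version treats a vector with element type NIL as a string. No particular output is claimed here; it depends on the Lisp being used.

```lisp
;; Illustrative sketch only: NIL-VECTOR-REPORT is a hypothetical helper.

(defun nil-vector-report ()
  ;; Build an empty vector whose requested element type is NIL and
  ;; report how the running Lisp treats it.
  (let ((v (make-array 0 :element-type nil)))
    (list :element-type (array-element-type v)
          :stringp (stringp v)
          ;; Only apply string operations if V really is a string here.
          :string=-empty (and (stringp v) (string= v ""))
          :same-symbol (and (stringp v)
                            (eq (intern v) (intern ""))))))
```

On a Lisp that behaves as the reply describes, the :STRINGP, :STRING=-EMPTY and :SAME-SYMBOL entries would all be expected to be true; elsewhere the guarded entries simply come back as NIL rather than signalling an error.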