Re: [Sbcl-help] string type uncertainty?


Pascal Bourguignon <pj...@in...> writes:

> Christophe Rhodes writes:
>> Pascal Bourguignon <pj...@in...> writes:
>> > In particular, what type uncertainty is there in being a
>> >  (VECTOR (ARRAY BASE-CHAR)) ?
>> 
>> There are two kinds of type uncertainty in that type, both of them
>> fairly fundamental (i.e. common to all CL implementations).
>> 
>> The first is that it is uncertain whether an object of type 
>> (VECTOR (ARRAY BASE-CHAR)) is a SIMPLE-ARRAY or not.  Knowing that an
>> object is definitely a simple-array can lead to significant
>> performance improvements; unfortunately, there is no way of
>> expressing, currently, "I am aware that I may pay a price for the
>> genericity of having this code work on all vectors, not just simple
>> arrays of rank 1".  So some implementations shut up about it, others
>> don't.  (Maybe we should at default compilation settings, I don't
>> know).
>
> To see why this behavior is obnoxious, try to compile this:
>
>    (defun test-a (stuff)
>       (declare (type t stuff))
>       (cond
>         ((simple-vector-p stuff) (aref (the simple-vector stuff) 0))
>         ((vectorp         stuff) (aref (the vector        stuff) 0))
>         (t nil)))
>
> It looks like sbcl is incapable of (or at least, can't bear) working
> with anything other than simple arrays.

This is an unfair test of sbcl's type engine, because SIMPLE-VECTOR
means (SIMPLE-ARRAY T (*)), while VECTOR means (ARRAY * (*)).  Thus,
there are simple arrays (a SIMPLE-STRING, for instance) that are not
caught by the first clause and so reach the second.
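
A quick check at the REPL illustrates the gap.  This is only a sketch,
and the variable name *S* is made up for the example: a string built
by MAKE-STRING is a simple array and a VECTOR, but it is not a
SIMPLE-VECTOR.

```lisp
;; *S* is an illustrative name; MAKE-STRING returns a simple string.
(defparameter *s* (make-string 3 :initial-element #\a))

(typep *s* 'simple-vector)          ; => NIL, the element type is not T
(typep *s* 'vector)                 ; => T
(typep *s* '(simple-array * (*)))   ; => T, it is a simple array all the same
```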

However, it turns out that even a properly adjusted test
  (defun test-a (stuff)
    (cond
      ((typep stuff 'simple-vector) (aref stuff 0))
      ((typep stuff '(array t (*))) (aref stuff 0))
      (t nil)))
raises efficiency notes when really it shouldn't: in the second
branch STUFF should be known to be
(AND (ARRAY T (*)) (NOT SIMPLE-VECTOR)), but it isn't.

> Well, why stop here?  Since it's much more efficient to do additions
> with fixnums, why not issue notices when we use floating point
> numbers?
>
> And why not, when you use a string (even a simple-string), issue a
> notice reminding you that it would be much more efficient to avoid
> those long arrays and just use fixnums.  After all, strings can't be
> stored in registers while fixnums can.

Good point.  Thank you for this suggestion; I'll be implementing it as
soon as possible.

In all seriousness, the diagnostics that sbcl emits come from a
hodgepodge of heuristics that have accumulated over time.  Some of
them are no longer appropriate at all; others are less important than
they once were.  I think what is really needed is some way of
selectively ignoring the diagnostics; this would solve not only the
specific problem that you are having, but also similar problems
elsewhere.
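
Where the SBCL in use provides the SB-EXT:MUFFLE-CONDITIONS
declaration described in its manual, that is one such mechanism.  A
minimal sketch, assuming that declaration is available; FIRST-ELEMENT
is a hypothetical function, not something from this thread:

```lisp
(defun first-element (v)
  (declare (type vector v)
           (optimize speed)
           ;; SBCL-specific clause; the #+sbcl guard makes other
           ;; implementations skip it at read time.
           #+sbcl (sb-ext:muffle-conditions sb-ext:compiler-note))
  ;; V may or may not be a simple array; the notes about that are muffled.
  (aref v 0))
```

The declaration only silences the notes within its scope; it does not
change the code the compiler generates.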

> It seems to me that the solution is simple: just believe what the
> programmer writes!

This might be one of those solutions that is simple in conception but
tricky in execution.  I don't know; I haven't looked at it.

>> The second is significantly worse: since the
>> UPGRADED-ARRAY-ELEMENT-TYPE of (ARRAY BASE-CHAR) is T, the declaration
>> (VECTOR (ARRAY BASE-CHAR)) means _precisely_ the same as the
>> declaration (VECTOR T).  In other words, as mandated by ANSI, you can
>> store absolutely anything in that vector, not just arrays of
>> BASE-CHAR.  Given this, the compiler is unable to infer how to do the
>> second dereference [in (AREF (AREF ...) ...)], because the result of
>> the first AREF could be anything.
>
> I understand that UPGRADED-ARRAY-ELEMENT-TYPE is implementation
> dependent, but why does sbcl upgrade (ARRAY BASE-CHAR) to T and not
> to (ARRAY BASE-CHAR)?

Because the implementation does not provide a representation of arrays
specialized to holding arrays of BASE-CHAR.

As an example: consider the type (SINGLE-FLOAT 0.0 1.0).  If you ask
sbcl to create an array :ELEMENT-TYPE '(SINGLE-FLOAT 0.0 1.0), it will
give you back an array specialized to hold general SINGLE-FLOATs, the
layout of which looks something like this in memory:

  [ header | length | element0 | element1 | element2 ]

where the element<n>s are unboxed single floats (i.e. they don't have
any header; they are just the bare IEEE single float in bits).  This
is what is meant by a specialized array on some type.
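
This can be observed from the REPL.  A minimal sketch; *UNIT-FLOATS*
is a name invented for the example, and the result noted in the
comments is what I would expect from sbcl rather than something the
standard guarantees:

```lisp
;; *UNIT-FLOATS* is an illustrative name.
(defparameter *unit-floats*
  (make-array 3 :element-type '(single-float 0.0 1.0)
                :initial-element 0.5f0))

;; The array reports the upgraded element type, not the range type
;; that was requested (expected to be SINGLE-FLOAT in sbcl).
(array-element-type *unit-floats*)
(upgraded-array-element-type '(single-float 0.0 1.0))
```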

You might wish to consider what you would need to do to provide a
specialization of array to hold objects of type (ARRAY BASE-CHAR).
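
The absence of such a specialization can be seen in the same way.
Again only a sketch, with *ROWS* as a made-up name; the T result is
what to expect wherever no such specialized representation exists:

```lisp
;; *ROWS* is an illustrative name.  The initial element is a genuine
;; array of BASE-CHAR, so it matches the requested element type.
(defparameter *rows*
  (make-array 2 :element-type '(array base-char)
                :initial-element (make-string 3 :element-type 'base-char
                                                :initial-element #\a)))

(upgraded-array-element-type '(array base-char)) ; expected: T
(array-element-type *rows*)                      ; expected: T

;; Because the actual element type is T, any object can be stored.
(setf (aref *rows* 1) 42)
```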

> Is there a declaration I could add to specify that all the elements of
> my array are of type (ARRAY BASE-CHAR) ?

No.

> Would adding THE (ARRAY CHARACTER *) in the right places do?

Yes.
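
A minimal sketch of what "in the right places" might look like.
CHAR-AT and its parameter names are hypothetical, and I have narrowed
the type to rank 1, (ARRAY CHARACTER (*)), which is an assumption
beyond what was asked:

```lisp
(defun char-at (rows i j)
  (declare (type (vector t) rows)
           (type fixnum i j))
  ;; The outer vector can hold anything, so THE tells the compiler
  ;; what the inner AREF is really being applied to.
  (aref (the (array character (*)) (aref rows i)) j))
```

Note that THE is a promise, not a check: if an element is not in fact
an array of CHARACTER, the consequences are undefined.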

Cheers,

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
