Paul Khuong <pvk@...> writes:
> It's been a while, so take this with a grain of salt.
You know a lot more about Common Lisp and SBCL than I do, so take
anything I say with a shovel of salt.
> We really want to avoid ping-ponging between the FP and Integer SIMD
> pipes if we can, so we must track whether a given value is FP or Int, at
> least during compilation. Now, we also like constant folding -- more
> generally, I dislike stuff that only exists as static information,
> without reflecting any runtime reality -- so if we want this (necessary
> for high-performance code) optimisation to work, we now have to track
> FP/Int-ness at runtime.
>  if only for the reg-reg moves we insert to shuffle values around.
>  and otherwise, why would anyone bother with intrinsics?
> Tracking FP/Int-ness at runtime isn't too much of an issue, but it does
> make it really clear that approximating FP/Int-ness with some sort of
> kludgey propagation is a Bad Idea. So, we'll have to involve the regular
> type system/type propagation logic. And this is where it gets hairy.
I wasn't previously aware of the bypass delays between the FP and Int
domains - that is, e.g. feeding the result of movdqa into addps incurs a
latency penalty - so I was assuming that it could be left to the
programmer to call a typed intrinsic on a generic SSE type. But I wasn't
accounting for your point about reg-reg moves, where the compiler has to
know whether to generate e.g. movaps vs movdqa, or movups vs movdqu.
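To check that I understand the codegen side, here is a toy sketch of
picking the move from whatever domain is known statically. The function
name and the :fp/:int keywords are my own invention, not anything from
SBCL's backend, and the fallback to the FP moves for an unknown domain
is only the default I ask about below.

```lisp
;; Hypothetical helper, not SBCL code: choose an SSE move mnemonic from
;; the statically known domain of a packet.
(defun move-mnemonic (domain &key (aligned t))
  "DOMAIN is :FP, :INT, or NIL when nothing is known about the value."
  (ecase domain
    (:int (if aligned :movdqa :movdqu))
    ;; An unknown domain falls back to the FP moves: the move is still
    ;; correct, it may just pay the bypass delay on integer data.
    ((:fp nil) (if aligned :movaps :movups))))

;; Example calls:
;;   (move-mnemonic :int)              ; integer domain, aligned
;;   (move-mnemonic nil :aligned nil)  ; unknown domain, unaligned
```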
I'm not clear, though, on why we need to track FP/Int-ness at runtime.
Can we not just use static type declarations to generate the appropriate
move instructions for the domain, default to movaps if no more
information is available, and accept the cost of a couple of clock
cycles? The
move instructions will still work, and since the intrinsics are typed
(e.g. sse:add-ps vs sse:add-pi32 from cl-simd) this is up to the user to
> What happens with type union/intersection/negation of typed SIMD
> packets? I'm thinking of mostly replicating the specialised array logic,
> so, hopefully, I can leverage Christophe's brain when things go wrong.
> What kind of interface can we expose to users, so that naïve code works
> and isn't too horrible, while efficient code remains
> convenient? Half-assing it with static types that correspond to nothing
> at runtime made this step easier to fake, but that doesn't work anymore.
> I believe I settled on an interface such that intrinsics accept
> any-typed SIMD packets, but return typed (specialised) ones. This way,
> naïve users can declare their variables as default (any-typed) packets,
> while benefitting from type propagation, and without having to insert
> explicit FP<->Int casts: conversion from specialised to any-typed
> packets is always OK type-wise, and can be compiled into nothingness.
> Sophisticated users can still declare variables with explicit types:
> it'll help codegen, and lead to compile-time type errors instead of
> invisible pipeline-ping-ponging code. If necessary, they can insert
> *-to-fp and *-to-int casts (that compile into nothing as well). The
> casts could be inserted automatically and compiled away just the same,
> but I'm far from convinced this is a good idea: optional static type
> checking is something I really like about Python.
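If I read the proposed interface right, it could be modelled in portable
CL roughly as follows. This is only a toy: structs stand in for real
128-bit packets, every name is made up for illustration, and the "casts"
here allocate a new object, whereas the real ones would compile to
nothing.

```lisp
;; Toy model of the proposed packet types. All names are hypothetical.
(defstruct toy-packet
  (bits 0 :type (unsigned-byte 128)))
(defstruct (toy-fp-packet (:include toy-packet)))   ; FP-domain packet
(defstruct (toy-int-packet (:include toy-packet)))  ; Int-domain packet

;; "Intrinsics" accept any packet but return a specialised one.
(declaim (ftype (function (toy-packet toy-packet) toy-fp-packet) toy-and-ps)
         (ftype (function (toy-packet toy-packet) toy-int-packet) toy-and-pi))

(defun toy-and-ps (a b)
  "Bitwise AND whose result lives in the FP domain (think andps)."
  (make-toy-fp-packet
   :bits (logand (toy-packet-bits a) (toy-packet-bits b))))

(defun toy-and-pi (a b)
  "Bitwise AND whose result lives in the Int domain (think pand)."
  (make-toy-int-packet
   :bits (logand (toy-packet-bits a) (toy-packet-bits b))))

;; Explicit casts: same bits, different static domain.
(defun toy-to-fp (p)
  (make-toy-fp-packet :bits (toy-packet-bits p)))
(defun toy-to-int (p)
  (make-toy-int-packet :bits (toy-packet-bits p)))

;; Naive use: the variable is declared any-typed, yet the value it
;; holds is a specialised FP packet.
(let ((x (toy-and-ps (make-toy-int-packet :bits #xFF)
                     (make-toy-packet :bits #x0F))))
  (declare (type toy-packet x))
  (typep x 'toy-fp-packet))
```

A user who instead declared x as toy-int-packet would be asking for a
type mismatch, and would need (toy-to-int ...) around the call to make
the domain crossing explicit.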
> Now, I'm not a BDUF fan, but the FP/Int dichotomy is very much an
> artefact of contemporary SSE implementations. Other platforms (ARM I
> believe, and I wouldn't be surprised if PPC were similar but saner) or
> microarchitectures (e.g. I'd expect single and double precision
> operations not to mix some time soon) may well be different. So, I'd
> like to find a simple way to extend the approach I sketched above to a
> more generic set of SIMD types -- and, already, some operations
> distinguish between single/double floats, while, for others, we should
> always pretend the values are single floats, according to my
> optimisation guides. I'm pretty sure we can just add more specialised
> types, as for array types (but that means we can't have packets of
> integers… instead, we'd have a union type, like CL:STRING, and that's
> proven to be somewhat hairy). An incidental upside of the finer SIMD
> type system is that printing could exploit this information; I wouldn't
> wish float-as-hexdump reading skills on anyone.
I'm not qualified to judge how to extend SBCL's type system, but having
a generic SIMD type hierarchy would surely be useful, as it could
provide the infrastructure for e.g. auto-vectorizing loops and reduces.
> There's my roadmap/braindump for the (hard) work remaining. The rest
> mostly involves forward-porting angavrilov's instruction definition
> fixes, and putting a nice lispy interface in front (and this is where
> many more people can get easily involved). The problem is that other
> developers prefer to work on more interesting/useful stuff, and I have
> more pressing responsibilities, mostly related to my wishing to graduate ;)
What is interesting/useful depends upon your viewpoint ;)
I'd be happy to throw in some work on this, but my experience with SBCL
is limited so I'd need some pointers.