David Lichteblau <david@...> writes:
>> 1. A committer takes active hand and starts working on getting this
>> stuff in. I've been hoping to do just that, but it hasn't happened,
>> and looking at my schedule doesn't seem all that likely to happen
> I will probably commit things split up as follows:
> 1. A new optional Lisp feature (most likely called SB-SAFEPOINT).
> If enabled together with SB-THREAD, it replaces SIG_STOP_FOR_GC with
> a safepoint-based mechanism using Dmitry's two-phase stop-the-world
> algorithm. The patch will not include Windows-specific changes and
> should work on x86 (if I get around to it also amd64) on threading
> enabled platforms, in particular Linux.
First, I'd be very glad to see any motion toward integration of Win32
threads into official SBCL. Of course, I have some ideas on how to do
it; if anything written below contradicts the intentions of SBCL
maintainers, _please_ don't take it as an objection against the
integration effort: I'd rather have my thoughts ignored than the
integration delayed.
* TLS-63 versus TIB Arbitrary Data Slot:
- Dmitry and I came to agreement quickly in our preference for the
former (he has pulled the patch, so his fork uses TLS-63 too). I
haven't heard any arguments for rolling back to TIB Arbitrary Data
(BTW, the only difference between the two is a small initialization
procedure required for TLS-63 allocation; for the SBCL compiler and
for all other code in the runtime, nothing changes except the constant).
It's interesting that ClozureCL developers recently switched to the
same model of TLS allocation for Windows -- to make their 32-bit
builds runnable on x64 (they allocate the last 32 slots in the directly
accessible range [0-63]).
* Per-thread vs. Global safepoints:
- Of course, as long as our GC requires stopping the world, there is
no reason to have thread-local safepoints. That's why no such thing
exists in my code, as well as in Dmitry's code. However, I have
added a per-thread page with adjusted memory protection; this fact
may indeed be perceived as a movement toward per-thread safepoints,
so I decided to explain the idea behind it.
When FFI comes into play, we have to deal not only with safepoints,
but with an important notion of _safe regions_. Consider a thread in
a long foreign call. When another thread initiates garbage collection,
there is no point in waiting for the foreign call to complete:
external C code has nothing to do with GENCGC heap, so GC may work
concurrently. For this idea to work, however, it's required to have
a per-thread flag reflecting the thread's in-foreign-call status.
We also have to ensure that its updates are globally visible on
multicore (SMP) systems.
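
A minimal sketch of such a flag, using C11 sequentially consistent
atomics as the straightforward baseline. All names here
(thread_state, gc_in_progress, try_leave_foreign_call and so on) are
hypothetical and are not taken from the SBCL runtime; the parking of
a thread that fails to leave its foreign call is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; the real SBCL struct thread differs. */
struct thread_state {
    atomic_bool in_foreign_call;
};

/* Hypothetical global flag raised by the thread that initiates GC. */
atomic_bool gc_in_progress;

/* Mutator side: announce that this thread no longer touches the heap. */
void enter_foreign_call(struct thread_state *t)
{
    assert(t != NULL);
    atomic_store(&t->in_foreign_call, true);
}

/* Mutator side: try to return to Lisp. Returns false if a GC is
 * running, in which case the flag is restored and the caller must
 * wait for the GC to finish before retrying. */
bool try_leave_foreign_call(struct thread_state *t)
{
    assert(t != NULL);
    atomic_store(&t->in_foreign_call, false);
    if (atomic_load(&gc_in_progress)) {
        atomic_store(&t->in_foreign_call, true);
        return false;
    }
    return true;
}

/* Collector side: raise or lower the global flag. */
void gc_begin(void) { atomic_store(&gc_in_progress, true); }
void gc_end(void)   { atomic_store(&gc_in_progress, false); }

/* Collector side: a thread inside a foreign call need not be stopped. */
bool gc_may_skip_thread(struct thread_state *t)
{
    assert(t != NULL);
    return atomic_load(&t->in_foreign_call);
}
```

Every foreign call pays for two fully ordered stores in this version,
which is the cost a cheaper fast path would try to avoid.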
There is a paper describing a very similar problem domain. Foreign
calls and GC safety are given much attention there. The authors were
working on Java, not on a Common Lisp implementation, but many issues
are unaffected by this difference:
I've finally understood that page protection may be used not only
for fast "conditional trap" instructions, but for data access
synchronization: when a location is updated by one thread most of
the time, _and_ infrequently examined by another thread (requiring
memory visibility guarantees), "fast path" of the former thread may
avoid atomic instructions and memory barriers, if the latter thread
just makes the location read-only. The paper mentioned above
provides a good explanation why we may rely on OS memory management
syscalls to complete all necessary IPI and TLB flushes when memory
protection is "tightened".
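
The idea can be sketched with POSIX mmap()/mprotect() standing in for
the Windows calls (an assumption made for portability of the example;
thread_page and the function names are hypothetical). The owner
writes with a plain store; the observer makes the page read-only
before reading, so any later write by the owner faults instead of
going unnoticed. The fault handler that parks the owner is omitted.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-thread page holding one word written by its owner. */
struct thread_page {
    volatile uintptr_t *slot;
    size_t size;
};

/* Map one private, writable page. Returns 0 on success, -1 on failure. */
int thread_page_init(struct thread_page *p)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    void *mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    p->slot = mem;
    p->size = (size_t)page_size;
    return 0;
}

/* Owner fast path: a plain store, no atomic instruction, no barrier. */
void owner_publish(struct thread_page *p, uintptr_t value)
{
    *p->slot = value;
}

/* Observer: tighten protection first, then read the slot. */
int observer_freeze_and_read(struct thread_page *p, uintptr_t *out)
{
    assert(out != NULL);
    if (mprotect((void *)p->slot, p->size, PROT_READ) != 0)
        return -1;
    *out = *p->slot;
    return 0;
}

/* Observer: let the owner write again once the examination is over. */
int observer_thaw(struct thread_page *p)
{
    return mprotect((void *)p->slot, p->size, PROT_READ | PROT_WRITE);
}
```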
One of my goals was getting rid of SuspendThread() and
GetThreadContext() altogether: these functions, officially intended
for debugging only, had some unfortunate side-effects on Wine and
some scary bug-producing potential on real Windows (their effect on
performance is less important, but it may become an issue for an
application where most CPU load is generated in foreign calls). It
was achieved by (1) using topmost-Lisp-ESP as a truth-value for my
in-foreign-call flag; and (2) adding some temporaries to FFI
call-out VOP, making the SBCL compiler save all live registers. This
way, all GENCGC conservatism is reduced to scanning a stack, with
the exact range bounds available without any inter-thread
communication. When a per-thread page (containing ESP value) is made
read-only, it guarantees _both_ that GENCGC doesn't see an obsolete
value _and_ that any thread suddenly leaving a foreign call during
concurrent GC will trap and wait instead of returning to Lisp and
becoming an Evil Mutator.
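
A minimal sketch of how a saved stack pointer can double as the
in-foreign-call flag and as the lower bound of the conservative stack
scan. The struct layout and names are hypothetical; a downward-growing
stack is assumed, as on x86, and the sketch only counts candidate
words where a real collector would pin the pages they point to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record. */
struct lisp_thread {
    uintptr_t *stack_base;     /* one past the highest stack word */
    uintptr_t *saved_lisp_sp;  /* topmost Lisp ESP; NULL while in Lisp */
};

/* The saved stack pointer itself serves as the truth value. */
int in_foreign_call(const struct lisp_thread *t)
{
    return t->saved_lisp_sp != NULL;
}

/* Walk [saved_lisp_sp, stack_base) and count words that fall inside
 * the heap range [heap_start, heap_end). No registers are examined:
 * the call-out is assumed to have spilled all live values already. */
size_t count_possible_heap_pointers(const struct lisp_thread *t,
                                    uintptr_t heap_start,
                                    uintptr_t heap_end)
{
    assert(in_foreign_call(t));
    size_t hits = 0;
    for (const uintptr_t *p = t->saved_lisp_sp; p < t->stack_base; p++) {
        if (*p >= heap_start && *p < heap_end)
            hits++;
    }
    return hits;
}
```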
* In old versions of our code, safe-regions used to be mismanaged
- Before I went on with experimental patches of widely-varying
quality, Dmitry and I synchronized our codebases rather
frequently. And despite all the beauty and joys of page protection, I
have to say that Dmitry's original design of SuspendThread-based
register conservation (with the same SuspendThread serving as an
asynchronous synchronization vehicle) does its job successfully in
all normal use cases. Starting the integration effort from that
Point of Divergence (or even from some earlier point) may be a good
decision; however, there is a flaw that I didn't notice at that
time, and it would be unfortunate if it ends up in upstream SBCL.
At the end of my patches overview,
http://www.siftsoft.com/inprogress/forknews.html , there is a 4-item
list, corresponding to major changes that I had to make in
safepoint-managing code. The last item (on SEH and unwinds) isn't
technically harmful (except by providing a false assurance that
some situation is handled when it is not). But other items _do_
describe a real problem (from different perspectives).
Any safepoint implementation that has the safe=>unsafe region
transition inside call_into_lisp is necessarily incorrect; and its
failure mode is so subtle that making it fail deliberately is almost
impossible.
When the SBCL runtime calls a Lisp function, it fetches the symbol
value of a static-function name (an fdefinition-object) and calls the
function. If everything before call_into_lisp is supposed to be a
safe region (non-mutator code), GC from another thread may move the
fdefinition-object after it's fetched from the symbol, but before
it's actually called. (SUB-GC is the only Lisp call that is totally
immune to this problem, because its fdefinition-object is always
pinned if any GC is in progress).
Therefore I had to extend unsafe regions so they include
symbol-value loading as well as function calls. Just in case
someone has to redo this work as part of this integration effort:
the presence of fake_foreign_function_call in the upstream code is
usually a place where unsafe region should begin in safepoint builds
(this connection is due to the fact that
fake_foreign_function_call's preconditions include blocked GC
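
A minimal sketch of the corrected ordering, in which the unsafe
region opens before the function object is fetched rather than
inside the call itself. Everything here is a hypothetical stand-in
(fdefn, begin_unsafe_region, call_lisp_static, demo_lisp_function);
in a real runtime, entering the unsafe region would block while a GC
is running, which the depth counter below only imitates.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*lisp_fn)(int);

/* Hypothetical stand-in for an fdefinition-object that GC may move. */
struct fdefn {
    lisp_fn fun;
};

/* Nesting depth of unsafe (mutator) regions for the current thread. */
int unsafe_region_depth;

void begin_unsafe_region(void)
{
    unsafe_region_depth++;
}

void end_unsafe_region(void)
{
    assert(unsafe_region_depth > 0);
    unsafe_region_depth--;
}

/* Stand-in for a Lisp function; it checks that it runs as a mutator. */
int demo_lisp_function(int x)
{
    assert(unsafe_region_depth > 0);
    return x + 1;
}

/* The fetch from the symbol-value slot and the call both happen
 * inside the unsafe region, so the object cannot move in between. */
int call_lisp_static(struct fdefn **symbol_value_slot, int arg)
{
    assert(symbol_value_slot != NULL);
    begin_unsafe_region();
    struct fdefn *f = *symbol_value_slot;
    int result = f->fun(arg);
    end_unsafe_region();
    return result;
}
```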
* A harmless and important step towards Windows/x64 platform support
- During all my time dedicated to SBCL, there was a single instance of
an experience that I'd like to never meet again. 64-bit Windows
defines "long" type as _not_ being pointer-sized; and SBCL,
especially in parts written in C, relies on "long" and "unsigned
long" being pointer-sized in countless places.
Before the real work required for x64 Windows support, I had to walk
through almost every C source, adjusting type declarations -- almost
mechanically but not entirely mechanically (or else it would be
scriptable). Most of these places in my SBCL tree now have either
* intptr_t (or uintptr_t) -- where I perceived the primary intent of
code authors to have a "pointer-sized" value, and
* sword_t (uword_t), that are my own typedefs -- where I recognized
an intent to have a "machine-word-sized" value, with no explicit
connection to pointers.
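
A minimal sketch of the two kinds of types. The actual definitions of
sword_t and uword_t are not shown in this message; defining them in
terms of intptr_t and uintptr_t is an assumption made here for
illustration, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions: machine-word-sized integers that are as wide
 * as a pointer on every supported platform. */
typedef intptr_t  sword_t;
typedef uintptr_t uword_t;

/* Compile-time checks: these hold even where long is narrower than
 * a pointer, as on 64-bit Windows. */
_Static_assert(sizeof(uword_t) == sizeof(void *),
               "uword_t must be pointer-sized");
_Static_assert(sizeof(sword_t) == sizeof(uword_t),
               "sword_t and uword_t must have the same width");

/* Round-trip a pointer through a word. With unsigned long in place
 * of uword_t, the upper half would be lost on 64-bit Windows. */
uword_t pointer_to_word(const void *p)
{
    return (uword_t)p;
}

void *word_to_pointer(uword_t w)
{
    return (void *)w;
}

/* Reports whether plain long happens to be pointer-sized here. */
int long_is_pointer_sized(void)
{
    return sizeof(long) == sizeof(void *);
}
```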
Unfortunately, I don't have enough knowledge of all platforms that
SBCL is targeting (specifically, I've never programmed Alpha); so
while there are many typedefs in SBCL that seemed appropriate for
pointer-sized values, I wasn't too sure that they are good in this
role for other platforms. That's what caused me to introduce new
typedefs, which are best understood as synonyms for
"long and unsigned long on non-braindamaged systems".
It's better to get rid of supposedly-pointer-sized "longs" as soon
as possible, to avoid silly conflicts on code merge. If I'm wrong
with my uword_t and sword_t most of the time, then I'll be glad if
one of the experienced SBCL developers either
(1) described the principles that I should use for choosing an
equivalent of "long" (e.g. lispobj and u64 [which is 32-bit on x86,
despite its name] sometimes seem to be good candidates to
replace unsigned long), or
(2) started similar "long elimination" effort in the upstream, so I
can synchronize with it and stop worrying about the future merges.
Regards, Anton Kovalenko
+7(916)345-34-02 | Elektrostal' MO, Russia