sbcl Log


Commit Date  
[4e815e] (20.9 kB) by Douglas Katzman

Assign thread-local storage indices at load-time on x86-64

This also includes a disassembler enhancement.

2014-04-03 05:31:54

[8e4105] (20.8 kB) by Douglas Katzman

Some do-nothing clarifications about simple and closure funs.

2014-03-31 04:27:18

[f63d03] (20.3 kB) by Alastair Bridgewater

Fix build of new SYMBOL-INFO magic on targets without compare-and-swap-vops.

* The (non-atomic) fallback position for compare-and-swap
requires some way to set the target value, which can be provided
as an explicit function name or will default to a SETF function.

* Define (SETF SYMBOL-INFO) as a :SET-TRANS for the SYMBOL-INFO
slot and provide an interpreter stub for it.
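
The fallback described in the first bullet can be pictured in C terms. This is a minimal sketch, not SBCL's Lisp implementation: `place_t`, `setter_fn`, `default_setter`, and `fallback_cas` are hypothetical names, and the default setter merely plays the role of the SETF function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical place with one value slot; stands in for a Lisp place. */
typedef struct { long value; } place_t;

/* Hypothetical setter signature: the "way to set the target value". */
typedef void (*setter_fn)(place_t *place, long new_value);

/* Default setter, playing the role of the SETF function. */
static void default_setter(place_t *place, long new_value)
{
    place->value = new_value;
}

/* Non-atomic compare-and-swap: read, compare, then store via the setter.
 * Returns the old value. Not safe if another thread can write between
 * the read and the store. */
static long fallback_cas(place_t *place, long expected, long desired,
                         setter_fn setter)
{
    long old;
    assert(place != NULL);
    old = place->value;
    if (setter == NULL)
        setter = default_setter;   /* default when no explicit setter is given */
    if (old == expected)
        setter(place, desired);
    return old;
}
```

Passing an explicit setter corresponds to naming the function; passing `NULL` corresponds to falling back on the default.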

2014-03-24 01:22:50

[0fb6f8] (20.2 kB) by Douglas Katzman

Initial reimplementation of globaldb - fast INFO and (SETF INFO).

The main idea is that info values are stored in a vector attached to
each symbol when possible. When not possible, the storage reverts
to the volatile [sic] environment, but still using a vector as the
payload instead of the chained hashing/alist approach.

This strives to be very fast at lookup at the expense of some added
complexity during updates. Performance testing suggests that it is
at least 2x to 3x faster at (INFO :class :type name), and FBOUNDP
is almost 4x faster. In a repeatable test, a file that took 1.8 seconds
to compile now takes 1.7 seconds but with more consing (as expected).

sbcl.core itself increases in size by <1% for 64-bit architecture,
and less for 32-bit architecture because there is proportionally less
wasted space. A compact environment's table is effectively the
concatenation of all info vectors into one, so the added overhead is
in vector headers. However the fallback hash is now smaller,
so there used to be more wasted cells in the compact env.

Eventually the compact and volatile environments will both go away,
but not until the quasi-lockfree hashtable bootstraps properly.
The problem is an inability to use raw slots in early cold init.
It's actually not a problem of using them - the compiled code is ok -
but cold-init drops into 'ldb' due to how defstruct expands.

Among the bugs fixed by this (not straightforwardly testable) is that
the compact environment would hold onto symbols that became otherwise
inaccessible. It no longer does, but still holds onto other names.

This patch builds with CCL as host, and for 32-on-64 and vice-versa,
so nothing seems terribly broken in terms of assumptions made.

2014-03-07 17:51:29

[ced29b] (20.2 kB) by Stas Boukarev

Optimize special variable binding on sb-thread.

Remove a level of indirection when unbinding special bindings: instead
of saving a symbol on the binding stack and then accessing its
tls-index to unbind it, save the tls-index directly, saving one memory
read.
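
A minimal C sketch of the idea, under stated assumptions: the structure layout, the sizes, and the names `bind_special` and `unbind_special` are hypothetical and do not reflect SBCL's actual binding stack format.

```c
#include <assert.h>
#include <stddef.h>

#define TLS_SLOTS 16       /* hypothetical size of per-thread storage */
#define BINDING_DEPTH 16   /* hypothetical binding stack depth */

/* One saved binding: the thread-local slot index and the value to restore. */
struct binding {
    size_t tls_index;
    long   old_value;
};

struct thread {
    long           tls[TLS_SLOTS];
    struct binding stack[BINDING_DEPTH];
    size_t         depth;
};

/* Bind: remember the slot index itself, not the symbol that owns it. */
static void bind_special(struct thread *th, size_t tls_index, long value)
{
    assert(th->depth < BINDING_DEPTH && tls_index < TLS_SLOTS);
    th->stack[th->depth].tls_index = tls_index;
    th->stack[th->depth].old_value = th->tls[tls_index];
    th->depth++;
    th->tls[tls_index] = value;
}

/* Unbind: the index comes straight off the binding stack, so no extra
 * read through a symbol object is needed to find the slot. */
static void unbind_special(struct thread *th)
{
    struct binding *b;
    assert(th->depth > 0);
    b = &th->stack[--th->depth];
    th->tls[b->tls_index] = b->old_value;
}
```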

2013-09-19 19:30:09

[3031b2] (20.1 kB) by Paul Khuong

Back end work for short vector SIMD packs

* Platform-agnostic changes:
- Declare type testing/checking routines.
- Define three primitive types: simd-pack-double for packs
of doubles, simd-pack-single for packs of singles, and
simd-pack-int for packs of integer/unknown.
- Define a heap-representation for 128-bit SIMD packs,
along with reserving a widetag and filling the corresponding
entries in gencgc's tables.
- Make the simd-pack class definition fully concrete.
- Teach IR1 how to expand SIMD-PACK type checks.
- IR2-conversion maps SIMD-PACK types to the right primitive type.
- Increase the limit on the number of storage classes: SIMD packs
went way past the previous (arbitrary?) limit of 40.

* Platform-specific changes, in src/compiler/target/simd-pack:
- Create new storage classes (that are backed by the float-reg [i.e. SSE]
storage base): one for each of double, single and integer SSE packs.
- Also create the corresponding immediate-constant and stack storage
classes.
- Teach the assembler and the inline constant code about this new kind
of registers/constants, and how to map constant SIMD-PACKs to which SC.
- Define movement/conversion VOPs for SSE packs, along with VOP routines
needed for basic creation/manipulation of SSE packs.
- The type-checking VOP in generic/late-type-vops is extremely
x86-64-specific... IIRC, there are ordering issues I do not
want to tangle with.

* Implementation idiosyncrasy: while type *tests* (i.e. TYPEP calls) consider
the element type, type *checks* (e.g. THE or DECLARE) only check for
SIMD-PACKness, without looking at the element type. This is allowed by the
standard, is similar to what Python does for FUNCTION types, and helps
code remain efficient even when type checks can't be fully elided.

The vast majority of the code is verbatim or heavily inspired by Alexander
Gavrilov's branch.

2013-05-21 19:11:26

[37d382] (19.7 kB) by David Lichteblau

Support building without PSEUDO-ATOMIC on POSIX safepoints

- Mark Lisp signal handlers with a flag `synchronous' to indicate
whether we can (and must) handle them immediately. Conversely,
we understand this flag to imply a guarantee that the signal
does not occur during allocation.

- Any signal with a Lisp handler that is not synchronous is
implemented in the runtime using a trampoline, which (instead of
invoking Lisp code directly) first spawns a new pthread, which
only then calls back into Lisp to invoke the handler function
(with a fake signal context).

- Used in particular for SIGINT.

- For SIGPROF, introduce a second per-thread allocation region,
which gets swapped with the usual region around the call into
SIGPROF-HANDLER. This handler is a special case, because it is
careful not to trigger GC nor non-local unwinds, and we can
safely return to the original region afterwards.

- Add a new subclass SIGNAL-HANDLER-THREAD for this purpose,
making it easy to identify these threads (e.g. in the test
driver).

- Run sprof tests while building the contrib. Add a test stressing
time profiling of allocation sequences.

Enable using :SB-SAFEPOINT-STRICTLY on features.

Quite usable already on x86 and x86-64; PPC still has more prominent
issues, e.g. in threads.impure.lisp.

2012-12-21 19:30:48

[d1a2fa] (19.6 kB) by David Lichteblau

Some support for platforms whose libraries do not maintain a frame pointer

For platforms on which system libraries are built with the
equivalent of -fomit-frame-pointer, i.e. do not maintain EBP, save
it in the thread structure upon entry to an exception handler, and
restore the register during call_into_lisp.

Currently for Windows on x86-64 only, where it is required.
Analogous changes had been implemented for x86, but are not included
here.

Thanks to Anton Kovalenko.

2012-12-05 16:34:28

[26ac61] (19.3 kB) by David Lichteblau

Port to x86-64 versions of Windows

- Microsoft x86-64 calling convention differences compared to the
System V ABI: Argument passing registers; shadow space.
- Inform gcc that we are using the System V ABI for a few functions.
- Define long, unsigned-long to be 32 bit. This change just falls
into place now, since incompatible code had been adjusted earlier.
- Use VEH, not SEH.
- No pseudo atomic needed around inline allocation, but tweak alloc().
- Use the gencgc space alignment that also works on win32 x86.
- Factor "function end breakpoint" handling out of the sigtrap handler.

Beware known bugs, manifested as hangs during threads.impure.lisp,
happening rather frequently with 64 bit builds and at least much
less frequently (or not at all) with 32 bit binaries on the same
version of Windows, tested on Server 2012. (All credit for features
goes to Anton, all bugs are my fault.)

Thanks to Anton Kovalenko.

2012-12-05 16:34:28

[1b6d88] (19.3 kB) by David Lichteblau

LLP64: change signed long to sword_t

Adjust uses of `long' in the C runtime for LLP64 platforms:
Replace `long' with `sword_t' where applicable.

Thanks to Anton Kovalenko.

2012-11-20 14:02:11

[b727b3] (19.3 kB) by David Lichteblau

LLP64: change unsigned long to uword_t

Adjust uses of `unsigned long' in the C runtime for LLP64 platforms:
Replace with `uword_t' where applicable.
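
Why this matters: on LLP64 platforms such as 64-bit Windows, `long' is 32 bits while pointers are 64 bits, so `long' can no longer hold a machine word. The sketch below assumes `sword_t' and `uword_t' are pointer-sized typedefs; the exact definitions in the runtime are not given in this log, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed definitions: pointer-sized integers, whatever `long' happens to be. */
typedef intptr_t  sword_t;
typedef uintptr_t uword_t;

/* Pointer-sized types round-trip an address; a 32-bit `long' could not
 * do so on a 64-bit LLP64 target. */
static uword_t pointer_to_word(void *p)
{
    return (uword_t)p;
}

static void *word_to_pointer(uword_t w)
{
    return (void *)w;
}

/* Number of bits in a machine word as this sketch sees it. */
static unsigned word_bits(void)
{
    return (unsigned)(sizeof(uword_t) * CHAR_BIT);
}

/* Sanity check a port might run at startup. */
static void check_word_types(void)
{
    assert(sizeof(sword_t) == sizeof(void *));
    assert(sizeof(uword_t) == sizeof(void *));
}
```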

Thanks to Anton Kovalenko.

2012-11-20 14:01:27

[3f85a9] (19.3 kB) by David Lichteblau

Allow synchronous win32 I/O to be interrupted, too

... if and only if running on a version of Windows new enough to
support doing so. Two scenarios come to mind where synchronous (i.e.
non-overlapped) I/O might matter:

- There is one kind of HANDLE which is never overlapped: Unnamed
pipes. Unlike named pipes, the feature added by this commit is
our only option of interrupting I/O on the former.

- User code might pass in a HANDLE through MAKE-FD-STREAM without
the right flag set. In principle, non-interruptibility of such a
HANDLE is a bug in said user code, but it doesn't hurt to deal
with these correctly as a side benefit. (The only Windows
releases which support re-opening of a HANDLE with the right
flag also have the functions needed by this commit.)

One downside for users might be an element of surprise, in that the
same SBCL binary will exhibit the presence or lack of features,
respectively, when started on recent Windows or old Windows. However,
the advantages of offering the feature seem to me to outweigh that
disadvantage.

Thanks to Anton Kovalenko.

2012-11-02 12:23:19

[7aef55] (19.2 kB) by David Lichteblau

Preliminary work towards threads on win32

* Implement SB-THREAD

* Support WITH-TIMEOUT, etc.

Implementation details:

* Implement pthreads, futex API on top of Win32.
* Add support for the timer facility using sb-wtimer.
* Implement an interruptible `nanosleep' using waitable timers.
* Threading on Windows uses safepoints to stop the world.
On this platform, either all or none of :SB-THREAD, :SB-SAFEPOINT,
:SB-THRUPT, and :SB-WTIMER need to be enabled together.
* On this platform, INTERRUPT-THREAD will not run interruptions
in a target thread that is executing foreign code, even though
the POSIX version of sb-thrupt still allows this (potentially
unsafe) form of signalling by default.

Does not yet include interruptible I/O, which will be made available
separately. Slime users are requested to build SBCL without threads
until then.

Note that these changes alone are not yet sufficient to make SBCL on
Windows an ideal backend. Users looking for a particularly stable
or thread-enabled version of SBCL for Windows are still advised to
use the well-known Windows branch instead.

This is a merge of features developed earlier by Dmitry Kalyanov and
Anton Kovalenko.

2012-10-05 19:38:38

[e6f4c7] (19.2 kB) by David Lichteblau

Add safepoint mechanism

* Stop threads for GC at safepoints only.

* Replaces use of SIG_STOP_FOR_GC.

* Currently not used by default. Users need to set feature
SB-SAFEPOINT to enable this code. SB-SAFEPOINT should only be set
when SB-THREAD is also enabled.

* ISA support: Each architecture needs VOP support, and changes to
foreign call-out assembly; only x86 and x86-64 implemented at this
point.

* OS support: Minor changes to signal handling required, currently
implemented for Linux and Solaris.

* Performance note: Does not currently replace pseudo-atomic entirely,
except on Windows. Only once further work has been done to reduce
use of signals will pseudo-atomic become truly redundant. Therefore
use of safepoints on POSIX currently still implies the combined
performance overhead of both mechanisms.

* Design alternatives exist for some choices made here. In particular,
this commit places the safepoint trap page into the SBCL binary for
simplicity. It is likely that future changes to allow slam-free
runtime changes will have to go back to a hand-crafted address
parameter.

* This feature has been extracted from work related to Windows
support and backported to POSIX.

Credits: Uses the CSP-based stop-the-world protocol by Anton Kovalenko,
based on the safepoint and threading work by Dmitry Kalyanov. Use of
safepoints for SBCL originally researched by Paul Khuong.

2012-08-10 12:51:45

[597826] (18.9 kB) by Cyrus Harmon, pushed by Paul Khuong

Miscellaneous cleanups for threaded darwin platforms

* Gather some related declarations in fewer (conditionalised) places

* Lay down some infrastructure for mach port different from threads'
addresses

* Slightly modified by Paul Khuong

2012-08-01 22:09:59

[1dd527] (18.8 kB) by Nathan Froyd

micro-optimize allocation sequences, special variable binding on x86-64

Move the ALLOC-REGION, PSEUDO-ATOMIC-BITS, and BINDING-STACK-* slots
closer to the beginning of the thread structure. This change ensures
that the offsets for those slots are < 128 bytes, which in turn enables
shorter encodings for all accesses to this structure from Lisp code.
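
The 128-byte threshold comes from x86-64 instruction encoding: a memory operand's displacement is stored as a signed 8-bit value when it fits in -128..127 and as a 32-bit value otherwise. The sketch below illustrates the effect; the structure layout and field names are hypothetical and are not SBCL's actual thread structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread structure: hot slots first, bulky data after. */
struct thread {
    void *alloc_region_free;         /* stands in for ALLOC-REGION */
    void *alloc_region_end;
    unsigned long pseudo_atomic_bits;
    void *binding_stack_start;
    void *binding_stack_pointer;
    char  rarely_used[256];          /* hypothetical cold data */
    void *cold_slot;
};

/* Bytes needed to encode a displacement: 1 if it fits in a signed
 * 8-bit field, 4 otherwise. The zero-displacement special case is
 * ignored here for simplicity. */
static int displacement_bytes(long offset)
{
    return (offset >= -128 && offset <= 127) ? 1 : 4;
}

/* Check that a hot slot stays within reach of the short encoding. */
static void check_hot_slots(void)
{
    assert(offsetof(struct thread, binding_stack_pointer) < 128);
}
```

Every access to a slot in the first 128 bytes is therefore three bytes shorter than an access to a slot further out.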

Code size of the C runtime was negligibly affected by this change.

2012-04-13 17:58:41

[40bff3] (18.6 kB) by Nikodemus Siivola

stack-allocatable fill-initialized specialized arrays

I *think* we had this working earlier already, but it's been broken at least
for a while now since there were no tests for it.

Add a DEFKNOWN to the array byte bashers, providing the RESULT-ARG -- and
make them return the sequence.

Replace the unused and bitrotted UNSAFE IR1 attribute with its inverse:
DX-SAFE, and use that together with RESULT-ARG to allow multiple refs to
potentially DX leafs. Still accept UNSAFE in DEFKNOWNs occurring in
user-code, but ignore it and give a style-warning.

For now, add DX-SAFE to LENGTH and VECTOR-LENGTH, which is enough for our
purposes.

Fixes lp#902351.

2011-12-10 14:26:32

[8340bf] (18.8 kB) by Nikodemus Siivola

semaphores in the runtime

Trivial refactorings:

* Rename STATE_SUSPENDED to STATE_STOPPED for elegance. (Spelled with the same
number of letters as STATE_RUNNING, so things line up nicer.)

* Re-express make_fixnum in terms of MAKE_FIXNUM so that we can use the
latter to define STATE_* names in a manner acceptable to use in
switch-statements.

* Move Mach exception handling initialization to darwin_init from
create_initial_thread so that current_mach_task gets initialized before
the first thread struct is initialized.
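
The second refactoring above works because a macro can expand to an integer constant expression, which C requires for `case' labels, whereas a function call cannot. A minimal sketch, assuming a one-bit fixnum tag and made-up state values (neither is taken from the runtime):

```c
#include <assert.h>
#include <limits.h>

/* Assumed tagging scheme for illustration: a fixnum is the integer
 * scaled by 2, leaving a zero tag bit. The real shift is platform-dependent. */
#define N_FIXNUM_TAG_BITS 1
#define MAKE_FIXNUM(n) ((long)(n) * (1L << N_FIXNUM_TAG_BITS))

/* The function is now just a wrapper around the macro. */
static long make_fixnum(long n)
{
    assert(n <= LONG_MAX / 2 && n >= LONG_MIN / 2);
    return MAKE_FIXNUM(n);
}

/* Because the macro expands to an integer constant expression,
 * the state names can be used as case labels. Values are hypothetical. */
#define STATE_RUNNING MAKE_FIXNUM(1)
#define STATE_STOPPED MAKE_FIXNUM(2)
#define STATE_DEAD    MAKE_FIXNUM(3)

static const char *state_name(long state)
{
    switch (state) {
    case STATE_RUNNING: return "running";
    case STATE_STOPPED: return "stopped";
    case STATE_DEAD:    return "dead";
    default:            return "unknown";
    }
}
```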

The Beef:

Replace condition variables in the runtime with semaphores.

On most platforms use sem_t, but on Darwin use semaphore_t. Hide the
difference behind os_sem_t, os_sem_init, os_sem_destroy, os_sem_post, and
os_sem_wait.

POSIX realtime semaphores are supposedly safe to use in signal handlers,
unlike condition variables -- and experimentally at least Mach semaphores
on Darwin are a lot less prone to problems.

(Our pthread mutex usage isn't quite kosher either, but it's the
pthread_cond_wait and pthread_cond_broadcast pair that seemed to be
causing most of the trouble.)

2011-12-05 16:38:38

[b71b8d] (18.5 kB) by Nikodemus Siivola

extensible CAS and CAS extensions

DEFINE-CAS-EXPANDER and DEFCAS are analogous to DEFINE-SETF-EXPANDER and
DEFSETF, including CAS-functions similar to SETF-functions:

(defun (cas foo) (old new ...) ...)

This is exported from SB-EXT for users to play with, and used to implement
our CAS places internally.

Add support for CAS of:

* SLOT-VALUE

* STANDARD-INSTANCE-ACCESS

* FUNCALLABLE-STANDARD-INSTANCE-ACCESS

In case of SLOT-VALUE we don't yet support any optimizations or specify
results when SLOT-VALUE-USING-CLASS or friends are in play -- perhaps later
we can add

(CAS SLOT-VALUE-USING-CLASS) &co

in order to support it for arbitrary instances.

Adding support for permutation vector optimization should not be too hard
either, but let's let the dust settle first...

2011-11-12 13:41:48

[d6f967] (18.4 kB) by Nikodemus Siivola

killing lutexes, adding timeouts

* Remove all lutex-specific code from the system.
** Use SB-FUTEX for futex-capable platforms, and plain SB-THREAD
otherwise.
** Make non-futex mutexes unfair spinlocks for now, using WAIT-FOR to
provide timeouts and backoff.
** Build non-futex condition variables on top of a queue and WAIT-FOR.
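
For readers unfamiliar with the term, an unfair spinlock with timeout and backoff can be sketched in C11 as follows. This is an illustration only: the commit implements the mutexes in Lisp on top of WAIT-FOR, while this sketch uses an attempt count in place of a real timeout, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_flag held; } spin_mutex;

#define SPIN_MUTEX_INIT { ATOMIC_FLAG_INIT }

/* Busy-wait for roughly `spins' iterations; volatile keeps the loop alive. */
static void backoff(unsigned spins)
{
    volatile unsigned i;
    for (i = 0; i < spins; i++)
        ;
}

/* Try to take the lock, giving up after `max_attempts' tries.
 * Unfair: whichever thread's test-and-set lands first wins; there is
 * no queue of waiters. */
static bool spin_mutex_grab(spin_mutex *m, unsigned max_attempts)
{
    unsigned attempt, spins = 1;
    assert(m != NULL);
    for (attempt = 0; attempt < max_attempts; attempt++) {
        if (!atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire))
            return true;              /* flag was clear: we now hold the lock */
        backoff(spins);
        if (spins < 1024)
            spins *= 2;               /* exponential backoff, capped */
    }
    return false;                     /* gave up: treated as a timeout */
}

static void spin_mutex_release(spin_mutex *m)
{
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}
```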

Performance implications: SB-FUTEX builds should perform pretty much the
same, or improve a bit. Threaded non-futex builds are affected as follows:

1. Threads idling on semaphores or condition variables aren't quite as
cheap. Just how costly depends on the OS. On Darwin 1000 idle threads
can chew up a bit over 50% CPU. I will try to address this later.

2. Contested locking around operations that take considerably longer
than a single timeslice suffers mild degradation.

3. Contested locking around operations that don't take long is an order
of magnitude more performant.

4. Highly active semaphores perform much better. (Follows from #3.)

* GRAB-MUTEX gets timeout support on all platforms.

* CONDITION-WAIT gets timeout support.

* Disable a bunch of prone-to-hang thread tests on Darwin. (All of them
were already prone to hang prior to this commit.)

* Enable a bunch of tests that now /pass/ on Darwin. \o/ This doesn't mean that
the threaded Darwin is fully expected to pass all tests yet, but let's say
it's more likely to do so.

...but still not robust enough to enable threads on Darwin by default.

* GET-MUTEX/GRAB-MUTEX get refactored into two main parts: %TRY-MUTEX and
%WAIT-ON-MUTEX, which are also used directly from CONDITION-WAIT where
appropriate.

2011-11-09 23:00:48

[c86681] (19.0 kB) by Alastair Bridgewater

threads: Thread objects don't need a lowtag.

* It was a cute hack, in a way, to force the existing genesis
machinery to produce assembler symbols for thread structure slots.
But it's still a hack, and needs to die. And now it can.

2011-10-25 22:39:29

[6793d7] (19.1 kB) by Alastair Bridgewater

1.0.41.21: runtime: Current stack and frame pointers are per-thread data.

* Add slots to the thread structure on threaded targets to hold
the control stack and frame pointers.

* Add some macros to thread.h to grab the correct variable or
slot on all builds, and use them everywhere required.

* Conditional-compile out the old global variables for this on
threaded targets (I probably messed this up).
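
The macro approach in the second bullet might look roughly like the sketch below. The build flag `SKETCH_THREADED', the macro names, and the global variable names are assumptions made for illustration; the log does not give the actual contents of thread.h.

```c
#include <assert.h>
#include <stddef.h>

struct thread {
    void *control_stack_pointer;
    void *control_frame_pointer;
};

/* Define SKETCH_THREADED (a hypothetical flag) to model a threaded build. */
#ifdef SKETCH_THREADED
/* Threaded targets: the values live in the thread structure. */
#define access_control_stack_pointer(th) ((th)->control_stack_pointer)
#define access_control_frame_pointer(th) ((th)->control_frame_pointer)
#else
/* Unithread targets: plain globals; the thread argument is ignored. */
static void *current_control_stack_pointer;
static void *current_control_frame_pointer;
#define access_control_stack_pointer(th) (current_control_stack_pointer)
#define access_control_frame_pointer(th) (current_control_frame_pointer)
#endif

/* Callers use the macros as lvalues and never mention which build it is. */
static void save_pointers(struct thread *th, void *sp, void *fp)
{
    assert(sp != NULL && fp != NULL);
    (void)th;   /* unused on unithread builds */
    access_control_stack_pointer(th) = sp;
    access_control_frame_pointer(th) = fp;
}
```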

2010-08-07 13:46:26

[9a4436] (18.7 kB) by Alastair Bridgewater

1.0.41.19: runtime: Fix pseudo-atomic on non-x86oid gencgc.

* Pseudo-atomic is per-thread state, add it to struct thread.

* Pass the correct pointer for accessing p-a in dynbind.c.

* In {undo_,}fake_foreign_function_call(), stash reg_ALLOC as
pseudo-atomic-bits on threaded targets.

* In pseudo-atomic.h, the ppc gencgc code is really non-x86oid
gencgc code.

* Also in pseudo-atomic.h, update the non-x86oid gencgc code
to do the right thing with threaded pseudo-atomic-bits.

* Due to the way dynamic binding works on threaded targets, it
is now a requirement that the arch_* pseudo_atomic functions call
the generic versions if foreign_function_call_active_p() is true
on threaded targets (in short, C code needs to be able to enter
pseudo-atomic, not just lisp code).
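
As background, pseudo-atomic marks a section during which signals are deferred and then handled on exit. The single-threaded simulation below shows why the state belongs in each thread's structure: one thread being pseudo-atomic must not cause another thread's signals to be deferred. The flag values and function names are hypothetical, not those of pseudo-atomic.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag layout within the per-thread word. */
#define PA_ATOMIC      1u   /* inside a pseudo-atomic section */
#define PA_INTERRUPTED 2u   /* a signal arrived while inside it */

struct thread {
    unsigned pseudo_atomic_bits;
    int      deferred_signals;   /* count of handlers run late */
};

static void enter_pseudo_atomic(struct thread *th)
{
    assert(!(th->pseudo_atomic_bits & PA_ATOMIC));
    th->pseudo_atomic_bits |= PA_ATOMIC;
}

/* Simulated signal arrival: returns true if the signal may be handled
 * now, false if it was deferred because the thread is pseudo-atomic. */
static bool deliver_signal(struct thread *th)
{
    if (th->pseudo_atomic_bits & PA_ATOMIC) {
        th->pseudo_atomic_bits |= PA_INTERRUPTED;
        return false;             /* deferred */
    }
    return true;                  /* safe to handle immediately */
}

/* Leave the section and account for any handler that was held back. */
static void leave_pseudo_atomic(struct thread *th)
{
    th->pseudo_atomic_bits &= ~PA_ATOMIC;
    if (th->pseudo_atomic_bits & PA_INTERRUPTED) {
        th->pseudo_atomic_bits &= ~PA_INTERRUPTED;
        th->deferred_signals++;
    }
}
```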

2010-08-07 13:45:56

[716153] (18.7 kB) by Alastair Bridgewater

1.0.41.17: runtime: Make foreign_function_call_active work with threaded targets.

* Add a slot to the thread structure for the active flag.

* Make the existing global variable only show up on
unithread targets.

* Introduce a wrapper macro to portably access the right
slot on both threaded and unithread targets.

* KLUDGE things up to maintain the old behavior on x86oids
until someone gets around to fixing x86{,-64}-assem.S to set
foreign_function_call_active properly.

2010-08-07 13:42:40

[a157ed] (18.4 kB) by Paul Khuong

1.0.29.44: Complex float improvements

* On all platforms:
- Slightly more stable complex-complex float (double and single)
division;
- New transform for real-complex division;
- complex-real and real-complex float addition and subtraction
behave as though the real was first upgraded to a complex, thus
losing the sign of any imaginary zero.

* On x86-64
- Complex floats are represented packed in a single SSE register;
- VOPs for all four arithmetic operations, complex-complex, but also
complex-real and real-complex, except for complex-complex and
real-complex division;
- VOPs for =, negate and conjugate of complexes (complex-real and
complex-complex);
- VOPs for EQL of floats (real and complex).
- Full register moves for float values in SSE registers should also
speed scalar operations up.
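
The remark under "On all platforms" about losing the sign of an imaginary zero follows from IEEE 754 arithmetic: under the default rounding mode, +0.0 + -0.0 is +0.0. The C sketch below contrasts the two behaviours; the `cplx' type and function names are hypothetical and the code is not derived from SBCL's transforms.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical complex type; a plain struct keeps each component explicit. */
typedef struct { double re, im; } cplx;

/* Real + complex with the real first upgraded to (x, +0.0). */
static cplx add_real_upgraded(double x, cplx z)
{
    cplx r;
    r.re = x + z.re;
    r.im = 0.0 + z.im;   /* +0.0 + -0.0 is +0.0 under round-to-nearest */
    return r;
}

/* Real + complex touching only the real part, for contrast. */
static cplx add_real_componentwise(double x, cplx z)
{
    cplx r;
    r.re = x + z.re;
    r.im = z.im;         /* -0.0 survives */
    return r;
}

/* Self-check illustrating the difference on z = 1 - 0i. */
static void check_zero_sign(void)
{
    cplx z = { 1.0, -0.0 };
    assert(!signbit(add_real_upgraded(2.0, z).im));
    assert(signbit(add_real_componentwise(2.0, z).im));
}
```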

2009-06-25 15:37:05