On Sat, Mar 14, 2009 at 12:57 PM, Gábor Melis <mega@retes.hu> wrote:
On Wednesday 11 March 2009, Jean-Claude Beaudoin wrote:
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  Internal Error (threadLocalStorage.cpp:43), pid=8789, tid=3086904064
> #  Error: guarantee(get_thread() == thread,"must be the same thread, quickly")

Why not look into threadLocalStorage.cpp and find out more?

Cheers,
Gabor

Well, I had done just that before writing to this list, but without much success.
The organization of the Hotspot source code feels bizarre to me, to say the least,
and it reminds me of why I prefer Common Lisp so much over C++...

But on reading your reply I agreed that I had to give it one more try.  Here is what I found.

The line where it blows up (threadLocalStorage.cpp:43) is an assertion that validates
the sanity (!?) of a pretty extreme optimization done in Hotspot to greatly speed up
the lookup of the pointer to an object of the "Thread" class.  That class seems to be
the core of thread management in Hotspot, and looking it up would be the first step to doing
anything with threads in there.  The naive implementation would store that pointer in
TLS using pthread_getspecific()/pthread_setspecific(), as still seems to be done on amd64,
but on x86 platforms Hotspot tries to bypass that pthread call through the use of a very
large (4 megabytes!) map vector (they call it _sp_map) indexed by the upper 20 bits of the
stack pointer of a thread (any thread).  They get the stack pointer directly from register
%esp and code the lookup in embedded assembly so that the whole thing takes 1 or 2 CPU
cycles, or so they claim in a comment in the source code!  This is pretty much the ultimate
space-for-speed trade.
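
To make the scheme concrete, here is a minimal sketch in C of how I understand
the idea.  It is not Hotspot's code: the names (sp_map, sp_map_register,
sp_map_lookup) are hypothetical, stack addresses are modelled as plain 32-bit
integers instead of being read from %esp, and the real lookup is written in
inline assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; an illustration, not Hotspot's code. */
#define PAGE_SHIFT  12u                          /* 4 KB pages             */
#define SP_MAP_SIZE (1u << (32u - PAGE_SHIFT))   /* 2^20 entries           */

/* One slot per 4 KB page of a 32-bit address space.
 * With 4-byte pointers that is 2^20 * 4 bytes = 4 MB. */
static void *sp_map[SP_MAP_SIZE];

/* Index of the page holding a 32-bit stack address: its upper 20 bits. */
static uint32_t sp_map_index(uint32_t sp)
{
    return sp >> PAGE_SHIFT;
}

/* Mark every page of the stack [low, high) as belonging to `thread`.
 * This must happen before the thread's first lookup. */
static void sp_map_register(uint32_t low, uint32_t high, void *thread)
{
    assert(low < high);
    for (uint32_t i = sp_map_index(low); i <= sp_map_index(high - 1); i++)
        sp_map[i] = thread;
}

/* The fast path: one shift and one load, no pthread call. */
static void *sp_map_lookup(uint32_t sp)
{
    return sp_map[sp_map_index(sp)];
}
```

If the registered range does not cover the page the stack pointer is actually
in, sp_map_lookup() returns NULL or some other thread's pointer, which is the
kind of mismatch a check like guarantee(get_thread() == thread, ...) would catch.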

One big weakness of this is that you have to know the exact location and size of each
and every stack active in the VM before the thread bound to it interacts with the
Java VM in any way.  That is required in order to populate the map vector before any access
to the relevant part of it.  Determining the location of a thread's stack seems to be
a bit tricky!  In the easy case it involves a call to an obscure and undocumented function
of the pthread library (pthread_getattr_np) in order to get access to the attributes object
used at thread creation, from which you can then extract the stack parameters.
But Hotspot's authors claim that such a call to pthread_getattr_np does not work on
Linux for the initial thread, at least not reliably enough, and that forces them to resort to
some elaborate second-guessing, coded in a function called
os::Linux::capture_initial_stack().
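
For reference, the "easy case" looks roughly like the sketch below.  It assumes
Linux with glibc, where pthread_getattr_np is available as a GNU extension; the
helper name current_stack_bounds is hypothetical, and this is not how Hotspot
itself structures the call.

```c
#define _GNU_SOURCE            /* needed for pthread_getattr_np on glibc */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: report the calling thread's stack as
 * [*low, *low + *size).  Returns 0 on success, an errno value otherwise. */
static int current_stack_bounds(void **low, size_t *size)
{
    pthread_attr_t attr;
    int rc;

    assert(low != NULL && size != NULL);

    /* Fill `attr` with the attributes of the running thread. */
    rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;

    /* Extract the lowest stack address and the stack size. */
    rc = pthread_attr_getstack(&attr, low, size);

    pthread_attr_destroy(&attr);
    return rc;
}
```

A caller would feed the resulting range into whatever populates the map vector.
Whether the values returned for the initial thread can be trusted is exactly
the point Hotspot's authors dispute.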

It seems to me that it is in that last function, trying to guess the location of the initial
stack, that the whole optimization scheme starts to go off the deep end.
It looks like SBCL is doing something unusual to that stack, reallocating it somehow
or something else esoteric, that throws off the guessing of capture_initial_stack().

Does SBCL indeed do any such manipulation of the stack of the initial thread?
Move it, resize it or grow it in some way? Re-align it in some way? The mapping
vector of Hotspot does assume the stack is aligned on 4 KB page boundaries.

Any hint in that area?

Thanks a lot,

Jean-Claude Beaudoin