Re: [Sbcl-devel] Beginning with TLS for SBCL win32 threads


On Fri, Aug 15, 2008 at 12:15 AM, Elliott Slaughter
<ell...@gm...> wrote:

(Sorry for the slightly disjointed reply -- you probably want to read
it in reverse order...)

> Exception Code: 0xc0000005.
> Faulting IP: 0x412cb0.
> page status: 0x10000.
> Was writing: 0, where: 0x3fff8010.
> fatal error encountered in SBCL pid 2304(tid 3948):
> Exception too early in cold init, cannot continue.
> Welcome to LDB, a low-level debugger for the Lisp runtime environment.
> ldb> unknown command: ``;;;''

After you get to this point using Brian's suggestion of --core
cold-sbcl.core, things to do:

See if you can get an LDB backtrace. (Command is "ba".) Then, if you
have a working GDB on Win32, attach it, and see if the C backtrace
leads you anywhere. If it doesn't, look in sbcl.nm to see if you can
figure out which C side function in SBCL the IP could be in (if any).
Disassembling the area around the faulting IP might give a clue.
Constructing a backtrace by hand is also an option -- sbcl-internals
wiki has a short guide.
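
The sbcl.nm lookup can be sketched as a nearest-symbol search. This is
only an illustration: it assumes the file holds plain nm-style lines
("address type name"), and the symbol names and addresses in the sample
are made up, not taken from a real SBCL build.

```python
import bisect

def parse_nm(text):
    """Parse nm-style lines ("<hex addr> <type> <name>") into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address; skip them and blanks
        addr, _kind, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # first field was not a hex address
    symbols.sort()
    return symbols

def nearest_symbol(symbols, ip):
    """Return (name, offset) of the last symbol starting at or before ip."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None  # ip lies below every known symbol
    addr, name = symbols[i]
    return name, ip - addr

# Hypothetical listing; a real sbcl.nm will look different.
SAMPLE_NM = """\
00412c00 T _hypothetical_func_a
00412ca0 T _hypothetical_func_b
00412d40 T _hypothetical_func_c
         U _some_import
"""

symbols = parse_nm(SAMPLE_NM)
print(nearest_symbol(symbols, 0x412CB0))
```

A small offset from the symbol start suggests the fault is early in that
function; a very large one suggests the IP is really in code that the
listing does not cover (e.g. a DLL).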

The instruction pointer seems to be outside the Lisp heap, so the
fault occurs either in C code of SBCL, or in some library.

The address where the write (or read -- I don't think so?) faulted, on
the other hand, is smack in the middle of dynamic space. (See address
space layout in src/compiler/x86/parms.lisp -- search for the string
win32 to find the relevant bits.)
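
Checking both addresses against the layout is just a range test. A
minimal sketch follows; note that the bounds below are PLACEHOLDERS
chosen for illustration, so substitute the real win32 values from
src/compiler/x86/parms.lisp before drawing any conclusion.

```python
# PLACEHOLDER bounds, not the real SBCL win32 values: read the actual
# dynamic space start/end out of src/compiler/x86/parms.lisp.
DYNAMIC_SPACE_START = 0x20000000
DYNAMIC_SPACE_END = 0x60000000

def in_space(addr, start, end):
    """True if addr lies in the half-open range [start, end)."""
    return start <= addr < end

def describe(addr, start=DYNAMIC_SPACE_START, end=DYNAMIC_SPACE_END):
    """Say whether addr is inside the space and how far into it."""
    if not in_space(addr, start, end):
        return f"{addr:#x} is outside [{start:#x}, {end:#x})"
    percent = 100.0 * (addr - start) / (end - start)
    return f"{addr:#x} is {percent:.1f}% of the way into [{start:#x}, {end:#x})"

# The two addresses from the crash report.
print(describe(0x3FFF8010))  # faulting address
print(describe(0x412CB0))    # faulting IP
```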

So, what occurs is that during a C side call an attempt was made to
write to a protected page. However (see handle_exception in win32-os.c),
either is_valid_lisp_address didn't return true for the address for
some reason, or gencgc_handle_wp_violation() declined to handle it.
The first option is just weird. The second can e.g. occur if gc_init()
has not completed setting up the page tables yet.
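
To make the two failure paths concrete, here is a toy model of that
dispatch. It is NOT the runtime's code: ToyHeap, its methods, the page
size and the heap bounds are all invented for the illustration, and
only the overall shape (address check first, then the write-protection
handler) follows the description above.

```python
PAGE_SIZE = 4096  # assumption for the toy model only

class ToyHeap:
    """Invented stand-in for the heap; not an SBCL data structure."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.page_table = None  # stays None until init() has run

    def init(self):
        npages = (self.end - self.start) // PAGE_SIZE
        self.page_table = [True] * npages  # True = page is write-protected

    def is_valid_address(self, addr):
        return self.start <= addr < self.end

    def handle_wp_violation(self, addr):
        if self.page_table is None:
            return False  # page tables not set up yet: decline
        page = (addr - self.start) // PAGE_SIZE
        if not self.page_table[page]:
            return False  # page was not protected by us: decline
        self.page_table[page] = False  # unprotect so the write can be retried
        return True

def handle_fault(heap, addr):
    """Report which of the two checks, if any, rejects the fault."""
    if not heap.is_valid_address(addr):
        return "not a heap address"
    if not heap.handle_wp_violation(addr):
        return "write-protection handler declined"
    return "handled"

heap = ToyHeap(0x20000000, 0x60000000)  # placeholder bounds
print(handle_fault(heap, 0x3FFF8010))   # before init(): declined
heap.init()
print(handle_fault(heap, 0x3FFF8010))   # after init(): handled
```

In this model a fault taken before init() is declined even though the
address is fine, which is the "too early" scenario.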

So, tasks:

* Verify that the faulting address is in the Lisp heap. (I believe so.)

* Verify that the IP is not in Lisp space. (I believe so.)

* Find out what causes the write -- by bisection via printf & /show if
necessary, though a backtrace or examining sbcl.nm is likely to be faster.

* When you know what is being written and where, you should be able to
figure out which of the following three options is right:

1. Writing to the wrong address.
2. Writing to the right address but too early.
3. Writing to the right address, and everything should be set up --
and it still goes wrong. Then something (e.g. the heap_base pointer) has
been corrupted earlier on.

(There may be other possibilities as well, but these are the ones that
spring to mind.)

Cheers,

 -- Nikodemus
