[Xsb-development] The thread that won't die

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Last night I took another look at what was happening to Michael's
program in XSB.  I wanted to verify that hashification was involved
with the hanging and to understand the specifics a little better.  It
turned out that hashification is indeed involved (not surprising) and
the exact problem is this.  A choice point points to a node $N_prev$,
that is involved in a sibling list to be hashified.  The instruction
for N_prev is trie_try_xxx, try_retry_xxx, or try_trust_xxx.  When
$N_prev$ is hashified, it is not reallocated, all pointers to it are
still valid memory pointers.  *However*, if the new node $N_new$ is
the only node in its hash bucket its instruction will be changed to a
trie_no_cp_xxx.

In the case I saw last night, a choice point to $N_prev$ was set on
the CP stack and after hashification we backtrack to $N_new$, which
calls the trie_no_cp_xxx instruction which in turn attempts to unify
the position of a program term with a trie symbol and correctly fails.
The failure causes backtracking to, guess what ... trie_no_cp_xxx.
The problem is that this instruction which is not intended to be a
backtracking instruction, does not do any of the necessary choice
point management that try-retry-trust instructions do.  Backtracking
into a trie_no_cp_xxx instruction thus causes an infinite loop.

Whatever the semantics of assert and retract should be, I dont think
we should go into infinite loops.  Furthermore I dont see a good way
to allow one computation path to assert into tries and have another
backtrack through them (outside of tabling, which is a different
story).  And furthermore again, the problem will only increase when we
parallelize, when we want to start making our operations thread-safe.

So my proposal is to put a "lock" on asserted/interned tries by having
the first instruction of these tries be choice point that implements a
simple "lock" that ensures that the same trie cannot be both written
to and read from at the same time.

The lock is acquired either by one_node_check_insert() or by executing
the first choice point in an asserted trie, and will allow any number
of readers *or* a single writer.

The lock is released at the end of one_node_check_insert() or when
backtracking out of the trie.

There would be two new trie instructions for asserted/interned tries
only.

trie_set_lock which creates a choice point whose pcreg is
trie_release_lock and sets the lock to "read_mode"

trie_release_lock which deallocates the cp and sets the lock to
"none".

For now, the lock itself would be more of a byte, and the action of a
writer trying to acquire the lock in read mode or a reader in write
mode would be to abort.  When we parallelize, we can get fancier.  The
lock itself would probably reside in the TIF, but I havent looked
closely at this.

I think this is a reasonable compromise between the error handling
we'd like and the simplicity and speed we expect from tries.  I cant
see this slowing down trie code to any appreciable extent.  I havent
had much time to look closely into details of implementing this, so
comments on specifics would be appreciated.  It would take me a day's
work to implement this, and I can try to get around to it in the next
couple of weeks.  Realistically, I probably wouldnt get around to
anything significantly more complicated soon
(translated that means ever :-) so please keep that in mind when
commenting.

Terry