Re: [Algorithms] lock-free stack with hazard list performance
From: Stephan R. <ke...@so...> - 2009-04-21 23:28:52
On Tue, 2009-04-21 at 15:33 -0700, Nicholas "Indy" Ray wrote:
> I'm not familiar with Hazard lists, but from reading the pseudo code,
> it seems that you get additional functionality at the cost of
> performance. Unless I am missing something (which is likely, googling
> "hazard lists" doesn't seem to be very helpful), it seems that it
> would be slower by principle, and only losing about 25% performance
> over a spin lock doesn't seem too bad.

There's no additional functionality being gained; the whole point of going lock-free is the performance gain over spinlocks. However, there are two key problems with lock-free code that really have their root in the same place (apologies if I'm simply repeating stuff you already know).

In between thread 1 obtaining the pointer to what I'll call Node A and then performing a CAS to change it, other threads could do the following:

1. Remove Node A and put it right back by recycling it. This is the ABA problem: thread 1's CAS will succeed, but with the wrong data, causing data corruption.

2. Remove Node A and delete it. Major crash right there when thread 1 now tries to do its CAS.

Both problems can be solved by ensuring that Node A is neither recycled nor deleted until all threads have released their references to it. All the lock-free implementations I've found so far have been in managed languages such as C# or Java for exactly that reason: the GC guarantees precisely this. In C++, without a GC, things become a bit more problematic.

Using a free-list to recycle nodes solves problem 2, but it does not solve ABA; it actually begs for it, rolls out the red carpet, and even pays for the limo. That can be countered by using tag bits alongside the pointer, a common solution to the issue. When working with 32-bit code this is easy: 32-bit pointer, 32-bit tag field, 64-bit CAS, and everything is happy. Each time a node is recycled, its tag field is incremented, ensuring that any still-referencing thread will fail its CAS.
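A minimal sketch of that tag-field scheme (illustrative names, not the actual stack code): to keep the example portable, nodes are assumed to live in a pool so a 32-bit index stands in for the 32-bit pointer, with the ABA tag in the high 32 bits of a single 64-bit atomic.

```cpp
#include <atomic>
#include <cstdint>

// Sketch only: a 32-bit pool index stands in for the 32-bit pointer;
// the high 32 bits hold the ABA tag, bumped on every recycle.
struct TaggedHead {
    std::atomic<uint64_t> bits{0};

    static uint64_t pack(uint32_t index, uint32_t tag) {
        return (static_cast<uint64_t>(tag) << 32) | index;
    }
    static uint32_t index(uint64_t v) { return static_cast<uint32_t>(v); }
    static uint32_t tag(uint64_t v) { return static_cast<uint32_t>(v >> 32); }

    // Swing the head to newIndex, incrementing the tag. A thread holding
    // a stale 'observed' value fails here even if the node was recycled
    // back to the very same index, because the tag no longer matches.
    bool trySwing(uint64_t observed, uint32_t newIndex) {
        uint64_t desired = pack(newIndex, tag(observed) + 1);
        return bits.compare_exchange_strong(observed, desired);
    }
};
```

The 64-bit CAS compares pointer and tag together, so a recycled node can never masquerade as the old one.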
This however becomes problematic when working with 64-bit code and 64-bit pointers: there are no spare 32 bits left for the tag. You can use the lower (alignment) bits, but I'm really not comfortable with that; without enough bits, the tag will roll over and cause ABA anyway.

Which, last but not least, brings me to what I'm using right now: hazard pointers. The principle is this: each thread maintains its own list of 'hazards'. A hazard is simply a pointer that one or more threads are currently referencing. In my stack code, for instance, when I access the top node, here's what happens:

1. Obtain the pointer to the node.
2. Put the pointer into my hazard list.
3. Do whatever I want with the node.
4. When done, remove the pointer from my hazard list.

In principle this looks identical to a spinlock or mutex when you read the code, except that every thread has its own privately owned list. Each thread has its own private 'lock', so to speak, allowing multiple threads access to the same resource without blocking each other. However, no thread is allowed to delete or recycle a node as long as any other thread still has that pointer in its hazard list; that's what the pseudo code in the previous e-mail enforces. This solves both ABA and the deletion problem.

However, all that work is relatively pointless if it doesn't perform any quicker than a simple spinlock, so I was just thinking there may be one thing I can try to speed things up a bit. Currently, the hazard list is implemented as a single singly linked list that contains both the actively referenced pointers and those waiting to be deleted. Each record has a status field that is either 'Active' (currently referenced), 'Delete' (to be deleted whenever possible), or 'Dead' (the reference has been released, or the node has been deleted). Dead records can then be recycled to avoid growing the list, which works quite well: about 15-20 hazard records in varying states per thread while pumping 2^24 values on/off a stack.
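The four numbered steps can be sketched roughly as follows (illustrative names, and a single published slot per thread standing in for the full hazard list; this is not the actual implementation). One detail worth making explicit: after publishing the hazard, top has to be re-read before dereferencing, otherwise the node could still be freed in the window between step 1 and step 2.

```cpp
#include <atomic>

struct Node { Node* next; int value; };

std::atomic<Node*> top{nullptr};

// Stand-in for the per-thread hazard list: one published slot per thread.
thread_local std::atomic<Node*> myHazard{nullptr};

void push(Node* n) {
    Node* old = top.load();
    do { n->next = old; } while (!top.compare_exchange_weak(old, n));
}

Node* pop() {
    Node* n;
    for (;;) {
        n = top.load();                 // step 1: obtain the pointer
        if (n == nullptr) break;
        myHazard.store(n);              // step 2: publish it as a hazard
        if (top.load() != n) continue;  // re-check: now n cannot be freed
        Node* next = n->next;           // step 3: safe to dereference
        if (top.compare_exchange_strong(n, next)) break;
    }
    myHazard.store(nullptr);            // step 4: release the hazard
    return n;                           // caller retires n; no thread may
                                        // delete it while a hazard names it
}
```

A successful pop hands the node back for retirement rather than immediate deletion; actual freeing waits until no thread's hazard slot still holds the pointer.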
Keeping everything in one list also simplifies the Acquire / Release / Free code, as I don't have to move data from one list to another; I just find my record, change its status, and I'm done. However, it may pay off to maintain two linked lists and dedicate one purely to objects waiting to be deleted. That way, when I scan across other threads' lists, I avoid walking records that the owning thread merely wants deleted. Come to think of it, maybe even three doubly linked lists: instead of a status field in the record, the list a record sits in would determine its status, and with doubly linked lists I could move records between them fairly easily. I'll have to play with that and see what happens...

Stephan

> On Tue, Apr 21, 2009 at 1:52 PM, Stephan Rose <ke...@so...> wrote:
> > Currently working on lock-free stack and queue implementations with
> > hazard lists.
> >
> > From a technical standpoint, they are working nicely. I'm currently
> > testing with the stack, as it's the simpler implementation, so there
> > is less room for bugs to sneak in. However, performance-wise, things
> > aren't so happy. Ran a few tests:
> >
> > Common to all tests:
> > Values pushed/popped: 2^24 per push thread
> > Thread count: 2 pushing, 2 popping
> > Processor: quad core, so each thread runs on its own core
> >
> > Stack with spinlock: ~33 seconds
> > Stack with hazard pointers, properly deleting memory: ~45 seconds
> > Stack with hazard pointers, leaking memory by simply marking the
> > pointer as no longer referenced: ~17 seconds
> >
> > The 3rd test does indicate a significant gain over the spinlock,
> > nearly cutting the time in half. However, something about scanning
> > the other threads for pointers that are still referenced is eating
> > that gain and completely dragging performance down.
> > In pseudocode, this looks roughly like:
> >
> > // Tried various threshold values; values up to around 128 seem to
> > // slightly improve performance, while larger values like 1024
> > // totally annihilate it.
> > if deleteCount < threshold return
> >
> > foreach node
> >     if node->status = Delete
> >         if !scan(node->pointer)  // false if no other thread holds a ref
> >             delete the node and mark the record as free
> >         endif
> >     endif
> > endforeach
> >
> > Then the scan function:
> >
> > foreach thread
> >     if thread != thisthread and thread->isrunning
> >         if thread->hasreference(pointer)
> >             return true
> >         endif
> >     endif
> > endforeach
> >
> > The hasreference function then has to cycle through every single
> > entry of its thread's hazard list to see whether it holds an active
> > reference to that pointer.
> >
> > It works. Slowly. But one loop that on each iteration calls another
> > loop, which on each iteration calls yet another loop (try saying
> > that ten times really fast...), is bound to be like towing a garbage
> > truck with a Honda Civic, especially as the number of items in the
> > lists goes up.
> >
> > Currently these lists are all singly linked lists. I could try
> > something like a hashtable and see if I do better. But all the
> > information I've found online about hazard pointers also uses just
> > singly linked lists, so I'm not quite sure where my issue is.
> >
> > Any ideas?
> >
> > Thanks,
> >
> > Stephan
> >
> > ------------------------------------------------------------------------------
> > Stay on top of everything new and different, both inside and
> > around Java (TM) technology - register by April 22, and save
> > $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
> > 300 plus technical and hands-on sessions. Register today.
> > Use priority code J9JMT32. http://p.sf.net/sfu/p
> > _______________________________________________
> > GDAlgorithms-list mailing list
> > GDA...@li...
> > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> > Archives:
> > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
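For what it's worth, one standard way to flatten the triple loop in the quoted pseudocode: snapshot every thread's hazard list once per scan, sort the snapshot, then test each retired pointer with a binary search instead of walking all threads for every node. A sketch with illustrative names, not the actual list code:

```cpp
#include <algorithm>
#include <vector>

struct ThreadRec {
    std::vector<void*> hazards;  // pointers this thread currently references
    std::vector<void*> retired;  // pointers this thread wants to delete
};

std::vector<ThreadRec*> allThreads;

// Flattened scan: one pass to collect all hazards, then one sorted
// lookup per retired pointer. Returns the pointers now safe to free;
// the rest stay on the retired list for a later scan.
std::vector<void*> scan(ThreadRec& self) {
    std::vector<void*> snapshot;
    for (const ThreadRec* t : allThreads)
        snapshot.insert(snapshot.end(), t->hazards.begin(), t->hazards.end());
    std::sort(snapshot.begin(), snapshot.end());

    std::vector<void*> safe, stillRetired;
    for (void* p : self.retired) {
        if (std::binary_search(snapshot.begin(), snapshot.end(), p))
            stillRetired.push_back(p);  // some thread still holds p
        else
            safe.push_back(p);          // no hazards on p: safe to delete
    }
    self.retired.swap(stillRetired);
    return safe;
}
```

With T threads and H hazards each, reclaiming R retired pointers costs roughly O(T*H + R log(T*H)) per scan instead of O(R*T*H), which is where the nested-loop version loses its time.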