Re: [Algorithms] lock-free stack with hazard list performance
From: Stephan R. <ke...@so...> - 2009-04-21 23:28:52
On Tue, 2009-04-21 at 15:33 -0700, Nicholas "Indy" Ray wrote:
> I'm not familiar with Hazard lists, but from reading the pseudo code,
> it seems that you get additional functionality at the cost of
> performance. Unless I am missing something (which is likely, googling
> "hazard lists" doesn't seem to be very helpful), it seems that it
> would be slower by principle, and only losing about 25% performance
> over a spin lock doesn't seem too bad.

There's no additional functionality being gained; the whole point of going lock-free is the performance gain over spinlocks. However, there are two key problems with lock-free code that really have their root in the same place (apologies if I'm simply repeating stuff you already know).

In between thread 1 obtaining the pointer to what I'll call Node A and then performing a CAS to change it, other threads could do the following:

1. Remove Node A and put it right back by recycling it. This is the ABA problem: thread 1's CAS will succeed, but with the wrong data, causing data corruption.

2. Remove Node A and delete it. Major crash right there when thread 1 now tries to do its CAS.

Both problems can be solved by ensuring that Node A is neither recycled nor deleted until all threads have released their references to it. All the lock-free implementations I've found so far have been in managed languages such as C# or Java for exactly that reason: the GC guarantees precisely this. In C++, without a GC, things become a bit more problematic.

Using a free-list to recycle nodes solves problem 2, but it does not solve ABA; it actually begs for it, rolls out the red carpet, and even pays for the limo. That can be countered by using tag bits alongside the pointer, a common solution to the issue. When working with 32-bit code this is easy: 32-bit pointer, 32-bit tag field, 64-bit CAS, and everything is happy. Each time a node is recycled, its tag field is incremented, ensuring that any still-referencing thread will fail its CAS.
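A minimal sketch of that tag-field scheme (illustrative names, not the actual stack code): to keep the example portable, nodes are assumed to live in a pool so a 32-bit index stands in for the 32-bit pointer, with the ABA tag in the high 32 bits of a single 64-bit atomic.

```cpp
#include <atomic>
#include <cstdint>

// Sketch only: a 32-bit pool index stands in for the 32-bit pointer;
// the high 32 bits hold the ABA tag, bumped on every recycle.
struct TaggedHead {
    std::atomic<uint64_t> bits{0};

    static uint64_t pack(uint32_t index, uint32_t tag) {
        return (static_cast<uint64_t>(tag) << 32) | index;
    }
    static uint32_t index(uint64_t v) { return static_cast<uint32_t>(v); }
    static uint32_t tag(uint64_t v) { return static_cast<uint32_t>(v >> 32); }

    // Swing the head to newIndex, incrementing the tag. A thread holding
    // a stale 'observed' value fails here even if the node was recycled
    // back to the very same index, because the tag no longer matches.
    bool trySwing(uint64_t observed, uint32_t newIndex) {
        uint64_t desired = pack(newIndex, tag(observed) + 1);
        return bits.compare_exchange_strong(observed, desired);
    }
};
```

The 64-bit CAS compares pointer and tag together, so a recycled node can never masquerade as the old one.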
This however becomes problematic when working with 64-bit code and 64-bit pointers: there are no spare 32 bits left for the tag. You can use the lower (alignment) bits, but I'm really not comfortable with that; without enough bits, the tag will roll over and cause ABA anyway.

Which, last but not least, brings me to what I'm using right now: hazard pointers. The principle is this: each thread maintains its own list of 'hazards'. A hazard is simply a pointer that one or more threads are currently referencing. In my stack code, for instance, when I access the top node, here's what happens:

1. Obtain the pointer to the node.
2. Put the pointer into my hazard list.
3. Do whatever I want with the node.
4. When done, remove the pointer from my hazard list.

In principle this looks identical to a spinlock or mutex when you read the code, except that every thread has its own privately owned list. Each thread has its own private 'lock', so to speak, allowing multiple threads access to the same resource without blocking each other. However, no thread is allowed to delete or recycle a node as long as any other thread still has that pointer in its hazard list; that's what the pseudo code in the previous e-mail enforces. This solves both ABA and the deletion problem.

However, all that work is relatively pointless if it doesn't perform any quicker than a simple spinlock, so I was just thinking there may be one thing I can try to speed things up a bit. Currently, the hazard list is implemented as a single singly linked list that contains both the actively referenced pointers and those waiting to be deleted. Each record has a status field that is either 'Active' (currently referenced), 'Delete' (to be deleted whenever possible), or 'Dead' (the reference has been released, or the node has been deleted). Dead records can then be recycled to avoid growing the list, which works quite well: about 15-20 hazard records in varying states per thread while pumping 2^24 values on/off a stack.
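The four numbered steps can be sketched roughly as follows (illustrative names, and a single published slot per thread standing in for the full hazard list; this is not the actual implementation). One detail worth making explicit: after publishing the hazard, top has to be re-read before dereferencing, otherwise the node could still be freed in the window between step 1 and step 2.

```cpp
#include <atomic>

struct Node { Node* next; int value; };

std::atomic<Node*> top{nullptr};

// Stand-in for the per-thread hazard list: one published slot per thread.
thread_local std::atomic<Node*> myHazard{nullptr};

void push(Node* n) {
    Node* old = top.load();
    do { n->next = old; } while (!top.compare_exchange_weak(old, n));
}

Node* pop() {
    Node* n;
    for (;;) {
        n = top.load();                 // step 1: obtain the pointer
        if (n == nullptr) break;
        myHazard.store(n);              // step 2: publish it as a hazard
        if (top.load() != n) continue;  // re-check: now n cannot be freed
        Node* next = n->next;           // step 3: safe to dereference
        if (top.compare_exchange_strong(n, next)) break;
    }
    myHazard.store(nullptr);            // step 4: release the hazard
    return n;                           // caller retires n; no thread may
                                        // delete it while a hazard names it
}
```

A successful pop hands the node back for retirement rather than immediate deletion; actual freeing waits until no thread's hazard slot still holds the pointer.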
Keeping everything in one list also simplifies the Acquire / Release / Free code, as I don't have to move data from one list to another; I just find my record, change its status, and I'm done. However, it may pay off to maintain two linked lists and dedicate one purely to objects waiting to be deleted. That way, when I scan across other threads' lists, I avoid walking records that the owning thread merely wants deleted. Come to think of it, maybe even three doubly linked lists: instead of a status field in the record, the list a record sits in would determine its status, and with doubly linked lists I could move records between them fairly easily. I'll have to play with that and see what happens...

Stephan

> On Tue, Apr 21, 2009 at 1:52 PM, Stephan Rose <ke...@so...> wrote:
> > Currently working on lock-free stack and queue implementations with
> > hazard lists.
> >
> > From a technical standpoint, they are working nicely. I'm currently
> > testing with the stack, as it's the simpler implementation, so there
> > is less room for bugs to sneak in. However, performance-wise, things
> > aren't so happy. Ran a few tests:
> >
> > Common to all tests:
> > Values pushed/popped: 2^24 per push thread
> > Thread count: 2 pushing, 2 popping
> > Processor: quad core, so each thread runs on its own core
> >
> > Stack with spinlock: ~33 seconds
> > Stack with hazard pointers, properly deleting memory: ~45 seconds
> > Stack with hazard pointers, leaking memory by simply marking the
> > pointer as no longer referenced: ~17 seconds
> >
> > The 3rd test does indicate a significant gain over the spinlock,
> > nearly cutting the time in half. However, something about scanning
> > the other threads for pointers that are still referenced is eating
> > that gain and completely dragging performance down.
> > In pseudocode, this looks roughly like:
> >
> > // Tried various threshold values; values up to around 128 seem to
> > // slightly improve performance, while larger values like 1024
> > // totally annihilate it.
> > if deleteCount < threshold return
> >
> > foreach node
> >     if node->status = Delete
> >         if !scan(node->pointer)  // false if no other thread holds a ref
> >             delete the node and mark the record as free
> >         endif
> >     endif
> > endforeach
> >
> > Then the scan function:
> >
> > foreach thread
> >     if thread != thisthread and thread->isrunning
> >         if thread->hasreference(pointer)
> >             return true
> >         endif
> >     endif
> > endforeach
> >
> > The hasreference function then has to cycle through every single
> > entry of its thread's hazard list to see whether it holds an active
> > reference to that pointer.
> >
> > It works. Slowly. But one loop that on each iteration calls another
> > loop, which on each iteration calls yet another loop (try saying
> > that ten times really fast...), is bound to be like towing a garbage
> > truck with a Honda Civic, especially as the number of items in the
> > lists goes up.
> >
> > Currently these lists are all singly linked lists. I could try
> > something like a hashtable and see if I do better. But all the
> > information I've found online about hazard pointers also uses just
> > singly linked lists, so I'm not quite sure where my issue is.
> >
> > Any ideas?
> >
> > Thanks,
> >
> > Stephan
> >
> > ------------------------------------------------------------------------------
> > Stay on top of everything new and different, both inside and
> > around Java (TM) technology - register by April 22, and save
> > $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
> > 300 plus technical and hands-on sessions. Register today.
> > Use priority code J9JMT32. http://p.sf.net/sfu/p
> > _______________________________________________
> > GDAlgorithms-list mailing list
> > GDA...@li...
> > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> > Archives:
> > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
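For what it's worth, one standard way to flatten the triple loop in the quoted pseudocode: snapshot every thread's hazard list once per scan, sort the snapshot, then test each retired pointer with a binary search instead of walking all threads for every node. A sketch with illustrative names, not the actual list code:

```cpp
#include <algorithm>
#include <vector>

struct ThreadRec {
    std::vector<void*> hazards;  // pointers this thread currently references
    std::vector<void*> retired;  // pointers this thread wants to delete
};

std::vector<ThreadRec*> allThreads;

// Flattened scan: one pass to collect all hazards, then one sorted
// lookup per retired pointer. Returns the pointers now safe to free;
// the rest stay on the retired list for a later scan.
std::vector<void*> scan(ThreadRec& self) {
    std::vector<void*> snapshot;
    for (const ThreadRec* t : allThreads)
        snapshot.insert(snapshot.end(), t->hazards.begin(), t->hazards.end());
    std::sort(snapshot.begin(), snapshot.end());

    std::vector<void*> safe, stillRetired;
    for (void* p : self.retired) {
        if (std::binary_search(snapshot.begin(), snapshot.end(), p))
            stillRetired.push_back(p);  // some thread still holds p
        else
            safe.push_back(p);          // no hazards on p: safe to delete
    }
    self.retired.swap(stillRetired);
    return safe;
}
```

With T threads and H hazards each, reclaiming R retired pointers costs roughly O(T*H + R log(T*H)) per scan instead of O(R*T*H), which is where the nested-loop version loses its time.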