gdalgorithms-list Mailing List for Game Dev Algorithms (Page 40)
|
From: Marc B. R. <mar...@or...> - 2009-04-22 06:21:28
|
On 64-bit, another option would be to do pointer compression, then do a 64-bit CAS as per 32-bit. |
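Marc's suggestion can be sketched as follows — a hypothetical illustration, not code from the thread (the names `pack`, `cas_with_tag`, and the offset-from-a-pool-base compression scheme are mine): compressing each pointer to a 32-bit offset leaves room for a 32-bit ABA tag in the same 64-bit word, which a plain 64-bit CAS then updates atomically, just like the 32-bit pointer + 32-bit tag trick on 32-bit platforms.

```cpp
#include <atomic>
#include <cstdint>

// A "compressed" pointer (32-bit offset from some pool base) packed with a
// 32-bit ABA tag into one 64-bit word. Names and layout are illustrative.
using Packed = std::uint64_t;

inline Packed pack(std::uint32_t offset, std::uint32_t tag) {
    return (static_cast<Packed>(tag) << 32) | offset;
}
inline std::uint32_t offset_of(Packed p) { return static_cast<std::uint32_t>(p); }
inline std::uint32_t tag_of(Packed p)    { return static_cast<std::uint32_t>(p >> 32); }

// CAS that only succeeds if both offset and tag still match, bumping the
// tag on every successful update so a recycled node's stale CAS fails.
inline bool cas_with_tag(std::atomic<Packed>& head,
                         Packed expected, std::uint32_t new_offset) {
    Packed desired = pack(new_offset, tag_of(expected) + 1);
    return head.compare_exchange_strong(expected, desired);
}
```

In a real allocator the offset would index into a node pool whose base address is fixed, which is what makes the compression lossless.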
|
From: <asy...@gm...> - 2009-04-22 05:42:48
|
You can look into the Win32 API functions InterlockedPopEntrySList and
InterlockedPushEntrySList. They solve the ABA problem, and the deletion
problem is partially fixed (they can still read a node that has already
been popped from the list, so if you free the page containing that node
you have a problem). The x86 code is pretty simple; I haven't looked
into the x64 code yet.
Alexander.
2009/4/22 Stephan Rose <ke...@so...>
> On Tue, 2009-04-21 at 18:53 -0700, Nicholas "Indy" Ray wrote:
> > So, excuse if I'm missing a lot, I haven't created any lock-free data
> > structures in about 2 years, But I suspect something like the
> > following might work, It has a bit longer contention, and will have to
> > spin more often, but I suspect there are also better ways to achieve
> > the same thing.
> >
> > do
> > {
> > node = head;
> > if(!node) continue;
> > }
> > while(!cas(head, node, 0));
> >
> > head = node->next;
> >
> > retData = node->Data;
> >
> > delete node;
>
> Nope, it definitely works, however...
>
> 1. It's not lock-free, your cas on the head to set it to 0 forms a
> lock. :)
>
> 2. It prevents other threads from peeking at the top of the stack
> without potentially locking which in my case is not only lock-free but
> also wait-free. Though honestly not sure how much of an issue that is as
> I'm not really sure how useful it really is to peek at a stack/queue
> being accessed by multiple threads. No guarantee that a following
> pop/dequeue will return the peeked item.
>
> 3. One advantage that locking algorithms do have over lock-free is that
> with the right kind of spinlock, thread starvation can be avoided by
> guaranteeing that threads will obtain their lock in the order they
> requested it. The above lock makes no such guarantees however.
>
>
> >
> > This also will also require at least one dummy node, but I don't
> > suppose that would be a huge problem.
> >
> > Anyways, when I get access to my main computer tonight or tomorrow, I
> > will check to see how my current queue manages memory, which I think
> > is more optimal.
>
> That would be awesome, I'd be most interested in seeing that.
>
> Stephan
>
>
>
>
> ------------------------------------------------------------------------------
> Stay on top of everything new and different, both inside and
> around Java (TM) technology - register by April 22, and save
> $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
> 300 plus technical and hands-on sessions. Register today.
> Use priority code J9JMT32. http://p.sf.net/sfu/p
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
>
--
Regards,
Alexander Karnakov
|
|
From: Stephan R. <ke...@so...> - 2009-04-22 03:12:16
|
On Tue, 2009-04-21 at 18:53 -0700, Nicholas "Indy" Ray wrote:
> So, excuse if I'm missing a lot, I haven't created any lock-free data
> structures in about 2 years, But I suspect something like the
> following might work, It has a bit longer contention, and will have to
> spin more often, but I suspect there are also better ways to achieve
> the same thing.
>
> do
> {
> node = head;
> if(!node) continue;
> }
> while(!cas(head, node, 0));
>
> head = node->next;
>
> retData = node->Data;
>
> delete node;
Nope, it definitely works, however...

1. It's not lock-free: your CAS on the head to set it to 0 forms a
lock. :)

2. It prevents other threads from peeking at the top of the stack
without potentially blocking, which my version avoids by being not only
lock-free but also wait-free. Though honestly I'm not sure how much of
an issue that is, as I don't know how useful it really is to peek at a
stack/queue being accessed by multiple threads. There's no guarantee
that a following pop/dequeue will return the peeked item.

3. One advantage that locking algorithms do have over lock-free is
that, with the right kind of spinlock, thread starvation can be avoided
by guaranteeing that threads obtain the lock in the order they
requested it. The above lock makes no such guarantee, however.
>
> This also will also require at least one dummy node, but I don't
> suppose that would be a huge problem.
>
> Anyways, when I get access to my main computer tonight or tomorrow, I
> will check to see how my current queue manages memory, which I think
> is more optimal.
That would be awesome, I'd be most interested in seeing that.
Stephan
|
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-22 01:53:26
|
So, excuse if I'm missing a lot, I haven't created any lock-free data
structures in about 2 years, But I suspect something like the
following might work, It has a bit longer contention, and will have to
spin more often, but I suspect there are also better ways to achieve
the same thing.
do
{
    node = head;
    if(!node) continue;
}
while(!cas(head, node, 0));

head = node->next;
retData = node->Data;
delete node;
This also will also require at least one dummy node, but I don't
suppose that would be a huge problem.
Anyways, when I get access to my main computer tonight or tomorrow, I
will check to see how my current queue manages memory, which I think
is more optimal.
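For what it's worth, the proposed pop can be rendered as compilable C++ — my own sketch, assuming a bool-returning CAS via `std::atomic`, with the empty case bailing out instead of spinning. CAS-ing the head to null claims the whole stack for the duration, which is exactly the "lock" the replies point out; it is shown only to make the shape of the proposal concrete.

```cpp
#include <atomic>

struct Node {
    int data;
    Node* next;
};

std::atomic<Node*> g_head{nullptr};

void push(int value) {
    Node* n = new Node{value, g_head.load()};
    // On failure, compare_exchange refreshes n->next with the current head.
    while (!g_head.compare_exchange_weak(n->next, n))
        ;
}

bool pop(int* out) {
    Node* node;
    do {
        node = g_head.load();
        if (!node) return false;            // empty: bail rather than spin
    } while (!g_head.compare_exchange_weak(node, nullptr));
    // The stack is now "locked" (head == null); publish the tail to unlock.
    // Note: a push that lands in this window would be clobbered by the store
    // below -- this illustrates the locking behavior, not a production queue.
    g_head.store(node->next);
    *out = node->data;
    delete node;
    return true;
}
```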
Nicholas "Indy" Ray
On Tue, Apr 21, 2009 at 6:15 PM, Stephan Rose <ke...@so...> wrote:
> On Tue, 2009-04-21 at 17:23 -0700, Nicholas "Indy" Ray wrote:
>> On Tue, Apr 21, 2009 at 4:28 PM, Stephan Rose <ke...@so...> wrote:
>> > 2. Remove Node A and delete it. Major crash right there when thread 1
> > now tries to do its CAS.
>>
>> Here is where I am confused, shouldn't the CAS be on the head pointer
>> during a dequeue so, that if the CAS is successful (assuming that
>> dequeue is the only way to get a reference to an item on the list,
>> which is how lock-free queues should be anyways) you know that you are
>> the only thread that has access to the node?
>
> To get a reference to an item on the list you're correct, you have to
> dequeue it and then it solely belongs to the thread that dequeued it and
> there are no issues. That isn't the problem.
>
> The problem is the nodes that make up the list and the references to
> them. See below for an example.
>
>>
>> > Both problems can be solved by ensuring that Node A is not recycled nor
>> > deleted until all threads release their reference to it. All lock-free
>> > implementations so far I've found have been in managed languages such as
>> > C# or Java for that reason so far as the GC ensures exactly that.
>>
>> Again, not sure why multiple threads have reference to nodes still on the queue.
>
> Well first off, the node declaration itself is simple:
>
> template <typename T>
> struct Node
> {
> T data;
> Node* next;
> };
>
> Simple enough, nothing fancy.
>
> Now the stack is initialized with a dummy head node that contains no
> data and initially next = 0.
>
> Now here is what happens, I'll prefix each line with the thread that's
> executing it such as T1, T2, etc. Code is simplified for clarity.
>
> T1 Wants to pop a node off the stack.
>
> T1: Node* node;
> T1: do {
> T1: node = head->next;
>
> Next, T1 gets preempted and it's T2's turn, it also wants to pop a node
> off the stack.
>
> T2: Node* node;
> T2: do {
> T2: node = head->next;
>
> Both T1 and T2 are now referencing the same node and are both trying to
> pop the same node off the stack.
>
> T2: } while(!cas(head->next, node, node->next));
>
> T2 is not preempted and succeeds with its CAS and now proceeds to delete
> the node.
>
> T2: delete node;
>
> T2 is now done, control returns to T1 which is still pointing at the
> very node that was just deleted.
>
> T1: } while(!cas(head->next, node, node->next));
> T1: CRASH!
>
> And it doesn't matter if this is running on 1 core, or multiple cores,
> the above will happen when multiple threads try to pop a node from the
> same stack if access to the current node is not guarded to prevent too
> early deletion.
>
> Stephan
>
>
>
|
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-22 01:24:17
|
On Tue, Apr 21, 2009 at 5:41 PM, Pal-Kristian Engstad <pal...@na...> wrote:
> Indeed. Most game-programmers and to some extent game designers, want to
> work on a lot of 3D math. This requires quite a bit of math and using
> infix notation for this is definitely mind-boggling, even for seasoned
> Lisp/Schemers. The second reason is that s-expressions work well for
> expressions, but for sequencing, the syntax is less than stellar.

I do wonder if an infix macro would do the majority of the job; for instance, in places where infix would prove advantageous you could do something such as:

(define (math-me x y z)
  (infix x + 2 * (y ^ sqrt(z))))

The macro would be fairly trivial (as far as operator precedence goes) and would keep such code out of the compiler itself. Additionally, it should also be pretty easy to write code in your editor (presumably Emacs, if we are writing code in a Lisp) that could transform prefix into infix and vice versa.

> The third reason is that it is quite possible to have strong macro systems
> (see e.g. Dylan and to a certain extent OCaml) without s-expressions.

I have yet to actually use Dylan, but I have used OCaml and the camlp4 preprocessor; however, I found macros much less intuitive when the language doesn't model the transformation tree quite so well. I'd like to know whether the problem was as large in Dylan.

> The devil is in the details, I am afraid. As an example (and I'm not
> saying this is impossible with static type checking), in a completely
> dynamic setting it is quite feasible to: introduce new fields in a data
> structure (compile and send to game), edit (static) data that uses the
> data structure (compile and send to game) and finally compile functions
> that reference the data structure and send it to the game (at which
> point the new data is actually used). I know AliceML has a solution to
> this, but from what I've heard, it was quite a challenge.
Yes, this is of course a problem with using more strict and static run-time data formats. But I have found this problem in dynamic languages as well: often the default record types in dynamic languages don't support introduction of new fields. This becomes much easier if you can support dynamic non-contiguous records, and those can still interface very well with a statically typed language.

But it's often the case that in static languages you want static records, in which case you have no choice but to stop the world and upgrade every piece of data, which would require a lot of allocations/deallocations. This would lead to large pauses while upgrading data types; even if that happened rarely enough not to matter on average, when it did happen it would certainly put a stall in the quick incremental cycle you otherwise have going on.

Nicholas "Indy" Ray |
|
From: Stephan R. <ke...@so...> - 2009-04-22 01:15:40
|
On Tue, 2009-04-21 at 17:23 -0700, Nicholas "Indy" Ray wrote:
> On Tue, Apr 21, 2009 at 4:28 PM, Stephan Rose <ke...@so...> wrote:
> > 2. Remove Node A and delete it. Major crash right there when thread 1
> > now tries to do its CAS.
>
> Here is where I am confused, shouldn't the CAS be on the head pointer
> during a dequeue so, that if the CAS is successful (assuming that
> dequeue is the only way to get a reference to an item on the list,
> which is how lock-free queues should be anyways) you know that you are
> the only thread that has access to the node?
To get a reference to an item on the list you're correct, you have to
dequeue it and then it solely belongs to the thread that dequeued it and
there are no issues. That isn't the problem.
The problem is the nodes that make up the list and the references to
them. See below for an example.
>
> > Both problems can be solved by ensuring that Node A is not recycled nor
> > deleted until all threads release their reference to it. All lock-free
> > implementations so far I've found have been in managed languages such as
> > C# or Java for that reason so far as the GC ensures exactly that.
>
> Again, not sure why multiple threads have reference to nodes still on the queue.
Well first off, the node declaration itself is simple:
template <typename T>
struct Node
{
    T data;
    Node* next;
};
Simple enough, nothing fancy.
Now the stack is initialized with a dummy head node that contains no
data and initially next = 0.
Now here is what happens, I'll prefix each line with the thread that's
executing it such as T1, T2, etc. Code is simplified for clarity.
T1 Wants to pop a node off the stack.
T1: Node* node;
T1: do {
T1: node = head->next;
Next, T1 gets preempted and it's T2's turn, it also wants to pop a node
off the stack.
T2: Node* node;
T2: do {
T2: node = head->next;
Both T1 and T2 are now referencing the same node and are both trying to
pop the same node off the stack.
T2: } while(!cas(head->next, node, node->next));
T2 is not preempted and succeeds with its CAS and now proceeds to delete
the node.
T2: delete node;
T2 is now done, control returns to T1 which is still pointing at the
very node that was just deleted.
T1: } while(!cas(head->next, node, node->next));
T1: CRASH!
And it doesn't matter if this is running on 1 core, or multiple cores,
the above will happen when multiple threads try to pop a node from the
same stack if access to the current node is not guarded to prevent too
early deletion.
Stephan
|
|
From: Pal-Kristian E. <pal...@na...> - 2009-04-22 00:44:23
|
Nicholas "Indy" Ray wrote:
> On Tue, Apr 21, 2009 at 12:43 PM, Pal-Kristian Engstad
> <pal...@na...> wrote:
>> Goal was a great system - one that we still greatly miss. If we had to make
>> a Goal2, then we'd probably:
>>
>> Use an SML-ish or Scala-ish surface syntax, while trying to retain the power
>> of Lisp macros.
>
> Is there any reason you would choose against S-Expressions? I don't
> know about Scala, but I find SML syntax to be a little less
> maintainable for large systems and a lot less usable for Macros; Is
> this mostly a choice of preference by the team, or perhaps you think
> it'd be easier to get new employees to learn?

Indeed. Most game-programmers, and to some extent game designers, want to work on a lot of 3D math. This requires quite a bit of math, and using infix notation for this is definitely mind-boggling, even for seasoned Lisp/Schemers. The second reason is that s-expressions work well for expressions, but for sequencing, the syntax is less than stellar. The third reason is that it is quite possible to have strong macro systems (see e.g. Dylan and to a certain extent OCaml) without s-expressions. And finally, as you mentioned, it does take new employees a rather long time to get used to it.

>> Introduce stronger typing features, which is difficult, given the need for
>> REPL and hot updates.
>> Do more in terms of high-level optimization, though LLVM might negate the
>> need for some of that.
>
> LLVM is rather quite nice, and while it'll take some infrastructure,
> and certainly a resident compiler instance, I don't suspect that hot
> updates would be too much of a problem with a better typed (likely
> type inferred) programming language.

The devil is in the details, I am afraid. As an example (and I'm not saying this is impossible with static type checking), in a completely dynamic setting it is quite feasible to: introduce new fields in a data structure (compile and send to game), edit (static) data that uses the data structure (compile and send to game) and finally compile functions that reference the data structure and send it to the game (at which point the new data is actually used). I know AliceML has a solution to this, but from what I've heard, it was quite a challenge.

PKE.

--
Pål-Kristian Engstad (en...@na...), Lead Graphics & Engine Programmer,
Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North, Santa Monica, CA 90404, USA.
Ph.: (310) 633-9112.

"Emacs would be a far better OS if it was shipped with a halfway-decent
text editor." -- Slashdot, Dec 13. 2005. |
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-22 00:23:21
|
On Tue, Apr 21, 2009 at 4:28 PM, Stephan Rose <ke...@so...> wrote:
> 2. Remove Node A and delete it. Major crash right there when thread 1
> now tries to do its CAS.

Here is where I am confused: shouldn't the CAS be on the head pointer during a dequeue, so that if the CAS is successful (assuming that dequeue is the only way to get a reference to an item on the list, which is how lock-free queues should be anyway) you know that you are the only thread that has access to the node?

> Both problems can be solved by ensuring that Node A is not recycled nor
> deleted until all threads release their reference to it. All lock-free
> implementations so far I've found have been in managed languages such as
> C# or Java for that reason, as the GC ensures exactly that.

Again, not sure why multiple threads have references to nodes still on the queue.

Nicholas "Indy" Ray |
|
From: Stephan R. <ke...@so...> - 2009-04-21 23:28:52
|
On Tue, 2009-04-21 at 15:33 -0700, Nicholas "Indy" Ray wrote:
> I'm not familiar with Hazard lists, but from reading the pseudo code,
> it seems that you get additional functionality at the cost of
> performance. Unless I am missing something (which is likely, googling
> "hazard lists" doesn't seem to be very helpful), it seems that it
> would be slower in principle, and only losing about 25% performance
> over a spin lock doesn't seem too bad.

Not really gaining any additional functionality; the whole point of going lock-free is the performance gain over spinlocks. However, there are two key problems with lock-free that really have their root in the same place (apologies if I'm simply repeating stuff you already know).

In between thread 1 obtaining the pointer to what I'll call Node A and then performing a CAS to change it, other threads could do the following:

1. Remove Node A and put it right back by recycling it. The ABA problem. This will cause the first thread's CAS to succeed, but with the wrong data, causing data corruption.

2. Remove Node A and delete it. Major crash right there when thread 1 now tries to do its CAS.

Both problems can be solved by ensuring that Node A is neither recycled nor deleted until all threads release their reference to it. All the lock-free implementations I've found so far have been in managed languages such as C# or Java, for the reason that the GC ensures exactly that.

In C++, however, without a GC, this becomes a bit more problematic. Using a free-list to recycle nodes will solve problem 2, but it does not solve ABA; it actually begs for it, rolls out the red carpet, and even pays for the limo. That can be countered a bit by using tag bits in the pointer, a common solution to the issue. When working with 32-bit code this is easy: 32-bit pointer, 32-bit tag field, 64-bit CAS. Everything is happy. Each time the node is recycled the tag field is increased, ensuring that any still-referencing threads will fail on their CAS.

This however becomes problematic when working with 64-bit code and 64-bit pointers! There are no spare 32 bits to use for tags. I could use the lower bits, but I'm really not comfortable with that: without enough bits, the tag will roll around and cause ABA anyway.

Which, last but not least, brings me to what I'm using right now: hazard pointers. Basically, the principle is this: each thread contains its own list of 'hazards'. A hazard is basically a pointer that one or more threads are currently referencing. So in my stack code, for instance, when I access the top node, here's what happens:

1. Obtain the pointer to the node
2. Put the pointer into my hazard list
3. Do whatever I want with the node
4. When done, remove the pointer from my hazard list

In principle, this works identically to a spinlock or mutex when you look at the code, except that every single thread has its own privately owned list. Each thread has its own private 'lock', so to speak, allowing multiple threads access to the same resource without blocking each other. However, no thread is allowed to delete or recycle a node as long as any other thread still has that pointer in its hazard list. That's what the pseudocode in the previous e-mail does. It solves ABA and the deletion problem. However, all that work is relatively pointless if it doesn't perform any quicker than a simple spinlock.

I was just thinking, there may be one thing I can try to speed things up a bit. Currently, the hazard list is implemented as follows: one singly linked list contains both pointers that are actively locked and pointers to be deleted. Each record in the list has a status field that is either 'Active' (currently referenced), 'Delete' (to be deleted whenever possible) or 'Dead' (either the reference has been released or it has been deleted). Dead records can then be recycled to avoid needing to grow the list, which works quite well: ~15-20 hazard records in varying states per thread while pumping 2^24 values on/off a stack.

This also simplifies the Acquire / Release / Free code, as I don't have to move data from one list to another; I simply find my record, change the status, and I'm done. However, it may end up paying off to maintain two linked lists and dedicate one just to objects to be deleted. That way, when I do my scans across other threads' lists, I avoid scanning records that the other thread wants to have deleted.

Come to think of it, maybe even three doubly linked lists that allow me to move records around between lists; then, instead of having a status field in the record, the list a record is in determines its status. With a doubly linked list I could do that fairly easily. I'll have to play with that and see what happens...

Stephan
|
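The four-step hazard-pointer protocol described above can be sketched with a single published hazard slot per thread — a deliberate simplification of the poster's per-thread list, and all names, the fixed thread count, and the single-slot layout are mine, not his implementation.

```cpp
#include <atomic>
#include <cstddef>

constexpr int kMaxThreads = 8;
// One published hazard per thread; zero-initialized.
std::atomic<void*> g_hazard[kMaxThreads] = {};

// Steps 1+2: read the pointer and publish it, then re-check that it is
// still current -- otherwise the node could have been freed between the
// read and the publish.
template <typename T>
T* acquire(std::atomic<T*>& src, int tid) {
    T* p;
    do {
        p = src.load();
        g_hazard[tid].store(p);
    } while (src.load() != p);   // retry if it changed under us
    return p;                    // step 3: caller may now use *p safely
}

// Step 4: drop the reference.
void release(int tid) { g_hazard[tid].store(nullptr); }

// Reclaimer side: a node may be deleted or recycled only if no thread
// currently has it published as a hazard.
bool is_hazardous(void* p) {
    for (int t = 0; t < kMaxThreads; ++t)
        if (g_hazard[t].load() == p) return true;
    return false;
}
```

The `is_hazardous` scan is exactly the triple-nested loop whose cost the thread is discussing: every deferred delete must be checked against every other thread's published hazards.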
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-21 22:33:51
|
I'm not familiar with hazard lists, but from reading the pseudocode, it seems that you get additional functionality at the cost of performance. Unless I am missing something (which is likely; googling "hazard lists" doesn't seem to be very helpful), it seems that it would be slower in principle, and only losing about 25% performance over a spinlock doesn't seem too bad.

Nicholas "Indy" Ray

On Tue, Apr 21, 2009 at 1:52 PM, Stephan Rose <ke...@so...> wrote:
> Currently working on lock-free stack and queue implementations with hazard lists.
>
> From a technical standpoint, they are working nicely. Currently doing my testing with the stack, as it's the simpler of the two implementation-wise, so there's less room for bugs to sneak in.
>
> However, performance-wise, things aren't so happy.
>
> Ran a few tests. Common to all tests:
> Values pushed/popped: 2^24 per push thread
> Thread count: 2 pushing, 2 popping
> Processor: quad core, so each thread is running on its own core
>
> Stack with spinlock: ~33 seconds
> Stack with hazard pointers, properly deleting memory: ~45 seconds
> Stack with hazard pointers, leaking memory by simply marking the pointer as no longer referenced: ~17 seconds
>
> The 3rd test does indicate a significant gain over the spinlock, nearly cutting the time in half.
>
> However, something about scanning the other threads for pointers that are being referenced is completely dragging performance down. In pseudocode this looks roughly like this:
>
> if deleteCount < threshold return; // Tried various values here; values up to around 128 seem to slightly improve performance, while larger values like 1024 totally annihilate performance.
>
> foreach node
>   if node->status = Delete
>     if !scan(node->pointer) // scan returns false if no other threads hold a ref
>       delete the node and mark node as free
>     endif
>   endif
> endforeach
>
> then the scan function:
>
> foreach thread
>   if thread != thisthread and thread->isrunning
>     if thread->hasreference(pointer)
>       return true;
>     endif
>   endif
> endforeach
>
> The hasreference function then has to cycle through every single entry of its thread's hazard list and see if it has an active reference to that pointer.
>
> It works. Slowly. But one loop that for each iteration calls another loop, which for each iteration calls another loop (try to say that 10 times really fast...), is bound to be like trying to tow a garbage truck with a Honda Civic, especially as the number of items in the lists goes up.
>
> Currently these lists are all singly linked lists. I could try using something like a hashtable and see if I do better than that. But all the information I've found online about hazard pointers I think also uses just singly linked lists, so I'm not quite sure where my issue is.
>
> Any ideas?
>
> Thanks,
>
> Stephan
|
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-21 22:10:05
|
On Tue, Apr 21, 2009 at 12:43 PM, Pal-Kristian Engstad <pal...@na...> wrote:
> Goal was a great system - one that we still greatly miss. If we had to make
> a Goal2, then we'd probably:
> Use an SML-ish or Scala-ish surface syntax, while trying to retain the power
> of Lisp macros.

Is there any reason you would choose against S-expressions? I don't know about Scala, but I find SML syntax to be a little less maintainable for large systems and a lot less usable for macros. Is this mostly a matter of preference by the team, or do you perhaps think it would be easier for new employees to learn?

> Introduce stronger typing features, which is difficult, given the need for
> REPL and hot updates.
> Do more in terms of high-level optimization, though LLVM might negate the
> need for some of that.

LLVM is quite nice, and while it will take some infrastructure, and certainly a resident compiler instance, I don't suspect that hot updates would be too much of a problem with a better-typed (likely type-inferred) language.

Indy
|
|
From: Stephan R. <ke...@so...> - 2009-04-21 21:05:01
|
Currently working on lock-free stack and queue implementations with hazard lists.
From a technical standpoint, they are working nicely. I'm currently doing my testing with the stack, as it's the simpler of the two implementation-wise, so there is less room for bugs to sneak in.
However, performance-wise, things aren't so happy.
Ran a few tests:
Common to all tests:
Values pushed/popped: 2^24 per push thread
Thread count: 4 (2 pushing, 2 popping)
Processor: Quad core so each thread is running on its own core
Stack with spinlock: ~33 seconds
Stack with hazard-pointer, properly deleting memory: ~45 seconds
Stack with hazard-pointer, leaking memory by just simply marking the pointer as no longer referenced: ~17 seconds
The 3rd test does indicate a significant gain over the spinlock, nearly cutting the time in half.
However, something about scanning the other threads for pointers that are still referenced is completely dragging performance down. In pseudocode, this looks roughly like the following:
if deleteCount < threshold return; // Tried various values here, values up to around 128 seem to slightly improve performance, while larger values like 1024 totally annihilate performance.
foreach node
if node->status = Delete
if !scan(node->pointer) // scan returns false if no other threads hold a ref
delete the node and mark node as free
endif
endif
endforeach
then the scan function:
foreach thread
if thread != thisthread and thread->isrunning
if thread->hasreference(pointer)
return true;
endif
endif
endforeach
The hasreference function then has to cycle through every single entry of its thread's hazard list and see whether it has an active reference to that pointer.
It works. Slowly. But one loop that on each iteration calls another loop, which on each iteration calls yet another loop (try saying that ten times really fast...), is bound to be like towing a garbage truck with a Honda Civic, especially as the number of items in the lists goes up.
Currently these lists are all singly linked lists. I could try using something like a hashtable and see if I do better. But all the information I've found online about hazard pointers also uses just singly linked lists, so I'm not quite sure where my issue is.
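Not the posted code, but for comparison: the standard trick in Michael's hazard-pointer scheme is to snapshot every thread's hazard slots into a set once per reclamation pass, then test each retired node against that set, removing the per-node rescan of all threads. A minimal sketch (names, slot counts, and layout are illustrative assumptions):

```cpp
#include <atomic>
#include <cassert>
#include <unordered_set>
#include <vector>

// Sketch of hash-set-based hazard scanning. Instead of calling
// scan() per retired node (O(retired * threads * slots)), we build
// the set of all live hazard pointers once, then do O(1) lookups,
// for O(slots + retired) work per reclamation pass.

constexpr int kMaxThreads = 8;
constexpr int kSlotsPerThread = 2; // stack/queue ops typically need 1-2 hazards

std::atomic<void*> g_hazard[kMaxThreads][kSlotsPerThread]; // zero-initialized

struct Retired { void* ptr; };

// Called when a thread's retired list exceeds a threshold.
void scan_and_reclaim(std::vector<Retired>& retired)
{
    // 1. Snapshot every live hazard pointer into a set (one pass).
    std::unordered_set<void*> hazards;
    for (int t = 0; t < kMaxThreads; ++t)
        for (int s = 0; s < kSlotsPerThread; ++s)
            if (void* p = g_hazard[t][s].load(std::memory_order_acquire))
                hazards.insert(p);

    // 2. Free every retired node that no hazard pointer protects.
    std::vector<Retired> still_retired;
    for (const Retired& r : retired) {
        if (hazards.count(r.ptr))
            still_retired.push_back(r); // still referenced, keep for later
        else
            ::operator delete(r.ptr);   // safe to reclaim
    }
    retired.swap(still_retired);
}
```

With the retire threshold set to roughly twice the total number of hazard slots, each pass reclaims at least half the retired list, so the amortized cost per reclaimed node stays constant rather than growing with thread count.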
Any ideas?
Thanks,
Stephan
|
|
From: Pal-Kristian E. <pal...@na...> - 2009-04-21 20:14:09
|
Gregory Junker wrote:
> Naughty Dog tried -- you have to admit that Lisp *in theory* should be
> perfectly suited to game development: its core design intent is list
> processing (indeed, it's the name), which arguably games *are*, and it's
> functional, which ought to be perfect for increasingly data-parallel game
> technology designs. If the syntax is just "too weird" -- that can be
> changed, while leaving the core language design intact (again, GOAL).
> http://www.gamasutra.com/features/20020710/white_02.htm

I'll have to correct you on some of your assumptions. Goal:

* Was imperative and not (very) functional. For instance, it didn't support closures.
* Had okay (but not great) object-oriented support.
* Had good introspection qualities (except where you didn't need to have it).
* Had an excellent macro processor, corresponding to Lisp.
* Had excellent support for fibers.
* Made use of the REPL style of programming, including on-line updates of code and data.
* Was an excellent assembler, since you had register scope as well as access to macros and direct access to "high-level" data types.
* Was quite a poor code generator (but it didn't matter because of the above).
* Had no visual debugger, but the REPL and on-line updates usually negated the need for one.

Goal was a great system - one that we still greatly miss. If we had to make a Goal2, then we'd probably:

* Use an SML-ish or Scala-ish surface syntax, while trying to retain the power of Lisp macros.
* Introduce stronger typing features, which is difficult, given the need for REPL and hot updates.
* Do more in terms of high-level optimization, though LLVM might negate the need for some of that.
* Go towards a more functional style, especially due to current generation hardware's need for concurrency.

Before anyone jumps up and down, no - we're probably never going to get around to doing this. We simply don't have the time to spend that much effort on such an endeavour.

Thanks, PKE.
-- Pål-Kristian Engstad (en...@na...), Lead Graphics & Engine Programmer, Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North, Santa Monica, CA 90404, USA. Ph.: (310) 633-9112. "Emacs would be a far better OS if it was shipped with a halfway-decent text editor." -- Slashdot, Dec 13. 2005. |
|
From: Alen L. <ale...@cr...> - 2009-04-21 06:05:08
|
Adrian wrote at 4/20/2009:
>> IME for serialization compatibility, in 99% of cases you need member
>> removing/adding/retyping, and changing base classes.
> Not sure if this was clear, but we don't need the multiple field shenanigans
> in the simple cases either. Removing a field without preserving the data is
> just a matter of deleting it.

Yes, I understood it that way. The comment was aimed at "[Prop::SaveLoad()]", but I covered that better later, so it has become redundant.

> forgetting to NOT serialize a field is a silent error that causes no harm
> other than inefficiency and consequently will never be caught.

I wouldn't consider it likely to cause any significant inefficiency. In our codebase the temporary (non-serialized) fields are <1% of declarations and far less in byte count. Btw, there is a large number of classes that carry reflection info just for the convenience of dynamic type handling in the UI (windows, dialogs, etc.), but they are not serialized anyway, so we don't care what the serialization flag on their members is. IME, the largest percentage of file size (across all files in a project) is taken up by POD arrays like textures, vertex arrays, etc. YMMV.

Alen
|
|
From: Jon W. <jw...@gm...> - 2009-04-21 00:42:08
|
Will Vale wrote:
> On Tue, 21 Apr 2009 04:55:20 +1200, Jon Watte <jw...@gm...> wrote:
>
>> You can actually use some template trickery to hoist the RTTI() /
>> MEMBER() parts into the struct itself.
>>
>
> I think that's pretty neat, but it only moves, rather than removes the
> duplication, although I can see that it's easier to verify in one place
> like that. (I also needed to add an implementation to the cpp file to
> provide storage for foo::_theInfo to get it to link under VC9.)
>
> It also generates code-to-generate-data rather than data, which was
> something I was keen to avoid as much as possible. That said, if you
>
Sorry, the _theInfo can be wrapped as a static singleton in a static
inline function, and then it will link with VC as well.
I don't actually generate (much) code to generate data, although I
suppose the constructors do count. It all gets run at static init AFAICT.
The edit for VC++:
static _info<T> _theInfo;
static inline _info<T> &info() {
return _theInfo;
}
turns into:
static inline _info<T> &info() {
static _info<T> _theInfo;
return _theInfo;
}
Toggling two lines can make all the difference in the world :-)
(And I suppose I could even use the instance() template that automates
this pattern)
Sincerely,
jw
|
|
From: Will V. <wi...@se...> - 2009-04-20 22:55:36
|
On Tue, 21 Apr 2009 04:55:20 +1200, Jon Watte <jw...@gm...> wrote:
> You can actually use some template trickery to hoist the RTTI() /
> MEMBER() parts into the struct itself.

I think that's pretty neat, but it only moves, rather than removes, the duplication, although I can see that it's easier to verify in one place like that. (I also needed to add an implementation to the cpp file to provide storage for foo::_theInfo to get it to link under VC9.)

It also generates code-to-generate-data rather than data, which was something I was keen to avoid as much as possible. That said, if you remove the virtual base then the compiler generates pretty good code. That would allow you to separate the "find what type" code (one virtual call) from the "interrogate type data" code (no virtual calls).

Cheers, Will
|
|
From: Adrian S. <st...@8m...> - 2009-04-20 21:45:51
|
> From: Alen Ladavac [mailto:ale...@cr...]
> _(6) FLOAT pwp_fMinimumWoundEffectStrength; _("Minimum wound effect
> strength" min="0.0f" max="1.0f")
Thanks! Very interesting. Definitely just a different aesthetic taste
here.
> IME for serialization compatibility, in 99% of cases you need member
> removing/adding/retyping, and changing base classes.
Not sure if this was clear, but we don't need the multiple field shenanigans
in the simple cases either. Removing a field without preserving the data is
just a matter of deleting it.
> This reminds me of one thing I was meaning to comment earlier... I
> prefer opposite logic to this. If I don't specify anything, I'd
> assume save+load would be the default as that's most often used. But
> again, some like it terse, some don't. :)
I tend to agree with you, but some of my coworkers made a compelling
argument in the other direction. They argued that forgetting to serialize a
field was an obvious mistake that would be caught immediately, whereas
forgetting to NOT serialize a field is a silent error that causes no harm
other than inefficiency and consequently will never be caught.
Adrian
|
|
From: Alen L. <ale...@cr...> - 2009-04-20 21:22:50
|
Monday, April 20, 2009, 10:13:12 PM, Adrian wrote:
> Would you mind sharing a field definition that contains that level of
> annotation? You can see how cluttered our definitions get in my article,
> but I'm having a hard time envisioning it in your syntax.
Here are some examples (from different classes):
_(1) Ptr<CFlareConfiguration> ep_pfcConfig; _("Flare") flags(SMF_NONE|SMF_BROWSE|SMF_NEW)
....
_(3) CEditMesh *msh_pemEdit; flags(SMF_OWNED|SMF_EDIT|SMF_CLIPBOARDBARRIER)
....
_(6) FLOAT pwp_fMinimumWoundEffectStrength; _("Minimum wound effect strength" min="0.0f" max="1.0f")
...
_(211) net QuatVect en_qvNetPlacement; context(CTX_ABS_PLACEMENT)
_(212) net Vector3f en_vNetVelocity; context(CTX_VELOCITY)
...
meta unreliable compacted client virtual void SetDriveSteerRatioAndLookDir(Vector3f vDriveSteer, Vector3f vLookDirEul);
...
meta saved cvar ENGINE_API extern FLOAT efx_fFixedEV; _(renoption perdocument version="2")
...
meta cvar ENGINE_API extern BOOL ren_bDynamicShadows; _("D&ynamic Shadows" renoption)
Note that most members need only some part of the specs, because the
assumed defaults are set to the common denominator. Also, as you can
see, there are different types of tags ranging from keywords, through
preset flags, to completely flexible hint strings.
> The situation I'm trying to describe is this: someone writes a class that
> holds an angle in degrees. Later they decide it would be more prudent to
> store the value in radians. They aren't just removing a field, they're
> converting it.
That's an interesting approach. We have very rarely observed this in
practice, so when it happens, we just keep the old field, hide it from
user, and convert in PostRead() meta-function. This means some extra
memory and disk space until we can trim, as you noted below. We could
save disk space by tagging it as load-only-no-saving. But I do like
the idea that this could be removed from memory immediately.
IME, for serialization compatibility, in 99% of cases you need member
removing/adding/retyping, and changing base classes. So I
stick to the adage: "Simple things should be simple, complex things
should be possible", and in that spirit don't want to write long
declarations for all members just because 1% will need some special
handling. YMMV.
> .field("angle", &MyClass::m_angle)
> [
> Prop::SaveLoad()
> ]
This reminds me of one thing I was meaning to comment earlier... I
prefer opposite logic to this. If I don't specify anything, I'd
assume save+load would be the default as that's most often used. But
again, some like it terse, some don't. :)
Alen
|
|
From: Adrian S. <st...@8m...> - 2009-04-20 20:13:28
|
> From: Alen Ladavac [mailto:ale...@cr...]
> Similar stuff here. UI-specific info, network-specific info, hints on
> how to behave on clipboard operations, per-class non-member info, any
> additional custom info etc.
Would you mind sharing a field definition that contains that level of
annotation? You can see how cluttered our definitions get in my article,
but I'm having a hard time envisioning it in your syntax.
> Isn't that unnecessary? With proper format version tracking, you can
> delete and add members to a class without keeping the old members as
> clutter (in 99% of cases).
The situation I'm trying to describe is this: someone writes a class that
holds an angle in degrees. Later they decide it would be more prudent to
store the value in radians. They aren't just removing a field, they're
converting it. In our system the original reflection definition would look
like this:
.field("angle", &MyClass::m_angle)
[
Prop::SaveLoad()
]
And the new definition would look like this:
.field("angle", &MyClass::GetAngleDegrees, &MyClass::SetAngleDegrees)
[
Prop::Load()
]
.field("angleRadians", &MyClass::m_angle)
[
Prop::SaveLoad()
]
Other than the unfortunate code bloat of adding the getter and setter for
degrees, the reflection definition automatically converts old data and loads
new data with zero cost. It also preserves compatibility with scripts.
When we can afford to stall production and batch process all the data for
all the projects we have going, we can remove the "angle" reflection field
from the code.
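Outside of any particular reflection DSL, the conversion the legacy getter/setter pair performs is plain unit math. A minimal sketch, with names taken from the snippet above but the class body itself assumed:

```cpp
#include <cassert>
#include <cmath>

// Sketch of the legacy-unit adapter described above, independent of the
// reflection system. The real system would bind SetAngleDegrees to a
// load-only "angle" field and m_angle to a save+load "angleRadians"
// field, so old data is converted transparently on load.
constexpr double kPi = 3.141592653589793;

struct MyClass {
    double m_angle = 0.0; // now stored in radians

    // Legacy accessors kept only so old "angle" (degrees) data still loads.
    double GetAngleDegrees() const { return m_angle * 180.0 / kPi; }
    void   SetAngleDegrees(double deg) { m_angle = deg * kPi / 180.0; }
};
```

Loading an old file that stored `angle = 180.0` then leaves `m_angle` holding pi, and the next save writes it out under "angleRadians" with no batch conversion step.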
Adrian
|
|
From: Alen L. <ale...@cr...> - 2009-04-20 19:49:25
|
Monday, April 20, 2009, 8:45:23 PM, Adrian wrote:
> To give you an idea, we have over 60,000 lines of reflection definitions in
> our codebase and most of it is stuff I don't want to see when I'm reading
> the header for a class.

Then we have to agree to disagree. I want to know everything about a member when I see its declaration. You obviously don't. Well, I wouldn't want to either, if it was that verbose. It really would clutter the header. But I argue that we can carry all the needed meta info in a very light-to-read way. I don't think it is the info itself that creates the verbosity; it is the requirement to have the problem solved by the compiler. The trick is that this is orthogonal to portability, and you don't need the compiler at that stage.

> A lot of it has to do with user interface design and telling the
> cinema system which properties can be animated in what ways.

Similar stuff here: UI-specific info, network-specific info, hints on how to behave on clipboard operations, per-class non-member info, any additional custom info, etc. We also use the system to define meta-functions that can be called on a generic object, bindings for scripting, bindings to global variables (cvars), global functions (for scripting), etc. And it is all very terse, because we feel we need terseness to be quick and more error-proof (re once-and-only-once) when writing new code. I understand you don't care about that as much.

> There is also stuff that doesn't tie directly to an actual member of
> a class, like a fake "field" that reads data in an old format and
> writes nothing--performing automatic version upgrade during
> serialization.

Isn't that unnecessary? With proper format version tracking, you can delete and add members to a class without keeping the old members as clutter (in 99% of cases).

> It is perfectly valid to let a variable name in C++ diverge from the
> string name in the reflection field.
We support that as an option, but it gets a bit awkward, and prone to search-replace errors. I prefer the number + name + "editor name" better. JM2C, Alen |
|
From: Adrian S. <st...@8m...> - 2009-04-20 18:45:47
|
The desire to keep the reflection definitions brief and part of the actual class definition is actually contrary to our design goals for the system. We anticipated very richly annotated reflection definitions, and we didn't want that information obscuring the class definitions. This isn't a system for publishing the memory layout of POD types to make writing objects to disk and endian conversion easier; it is a framework for publishing arbitrary metadata for C++ constructs.

To give you an idea, we have over 60,000 lines of reflection definitions in our codebase and most of it is stuff I don't want to see when I'm reading the header for a class. A lot of it has to do with user interface design and telling the cinema system which properties can be animated in what ways. There is also stuff that doesn't tie directly to an actual member of a class, like a fake "field" that reads data in an old format and writes nothing--performing automatic version upgrade during serialization.

The redundant part of the system that I have the most mixed feelings about is the manual specification of the name of a field. Macros could have automated that in most cases. Having the name be explicit, however, has proven useful on numerous occasions. Our engineers like the flexibility to refactor the code without having to worry about breaking serialization or scripting. It is perfectly valid to let a variable name in C++ diverge from the string name in the reflection field. It is also valid, and incredibly useful, to remove a field entirely from a class and replace its corresponding reflection entry with a getter/setter that emulates the old field's behavior.

Adrian

-----Original Message-----
From: Alen Ladavac [mailto:ale...@cr...]
Sent: Monday, April 20, 2009 6:16 AM To: Adrian Stone Cc: 'Game Development Algorithms' Subject: Re: [Algorithms] Complexity of new hardware [quoted message trimmed; Alen's full post appears below] |
|
From: Gregory J. <gj...@da...> - 2009-04-20 17:14:54
|
> this provides the optimum implementation in the sense that it relies > only on static init (no dynamic memory allocations) and it doesn't > require member definition to be separate from declaration (it's all in > the .h file). Agreed -- ours works the same way. Class and property reflection info is stored as a static linked list, and the reflection info is declared in the header (of course, in order for the static registration magic to work, you do need a single HYP_CLASS_IMPL() macro in the .cpp file). ... > It's still repeating yourself, though. Our system has all of the benefits described, with no repetitions -- the declaration of the property (member) reflection info *is* the declaration of the struct/class field. Greg |
|
From: Jon W. <jw...@gm...> - 2009-04-20 16:55:44
|
Will Vale wrote:
> On Sun, 19 Apr 2009 03:03:27 +1200, Jarkko Lempiainen <al...@gm...>
> wrote:
>
>> All I have is something like:
>> struct foo
>> { PFC_MONO(foo) {PFC_VAR3(x, y, z);}
>> int x, y, z;
>> };
>>
>
> That's admirably brief! I assume you're generating code rather than data
>
> // foo.h
> struct foo
> {
> COMPOUND(foo)
>
> int x, y, z;
> };
>
> // foo.cpp - essentially generates const member_t foo::members[] = { ... };
> RTTI(foo, MEMBER(x) MEMBER(y) MEMBER(z))
>
This is exactly what I want to avoid -- the members are listed twice, in
two different files! Version mismatch hell. (The Despair engine write-up
has this same problem).
You can actually use some template trickery to hoist the RTTI() /
MEMBER() parts into the struct itself.
This allows you to write something like:
struct foo {
int x, y, z;
RTTI(MEMBER(x) MEMBER(y) MEMBER(z))
};
That's all there is to it. No additional code, cpp files, or anything
like that needed (except for whatever runtime support you'll want, like
stream handling, custom type marshaling, etc).
members() gives you a list of the members; info() gives you information
about the struct itself. I put an implementation (including a program
you can compile and run) up for discussion at
http://www.enchantedage.com/cpp-reflection if you want to take a look.
Here's a main() program that prints the information about the type "foo":
int main() {
printf("type: %s\n", foo::info().name());
printf("size: %ld\n", foo::info().size());
for (size_t i = 0; i != foo::info().memberCount(); ++i) {
printf(" %s: offset %ld size %ld type %s\n",
foo::info().members()[i].name,
foo::info().members()[i].offset,
foo::info().members()[i].type->size(),
foo::info().members()[i].type->name());
}
return 0;
}
And here's the output:
type: foo
size: 12
x: offset 0 size 4 type i
y: offset 4 size 4 type i
z: offset 8 size 4 type i
You probably want to wrap those ::info().members()[i] accessors in some
nice readable macro, like TYPE_NTH_MEMBER() or suchlike, but I think
this provides the optimum implementation in the sense that it relies
only on static init (no dynamic memory allocations) and it doesn't
require member definition to be separate from declaration (it's all in
the .h file).
If you want editor information, you can easily push that into the
MEMBER() macro.
It's still repeating yourself, though.
Sincerely,
jw
|
|
From: Alen L. <ale...@cr...> - 2009-04-20 14:15:45
|
Adrian wrote at 4/19/2009:
> http://gameangst.com/?p=107

The syntax approach is really interesting, but I find the overall format too verbose. Still too much redundant data for my taste. We use something much more terse:

meta ver(1)
class ENGINE_API CDecalConfig : public CResource {
public:
  CDecalConfig(void);
  ~CDecalConfig(void);
  // ... functions ...
  MD_DECLARE;
public:
  _( 1) Vector2f dc_vSize;                _("Size (m)")
  _( 2) IDENT dc_idTextureUVMap;          _("Texture UV map")
  _( 3) IDENT dc_idShadowUVMap;           _("Shadow UV map")
  _( 4) Ptr<CShaderPreset> dc_pspShader;  _("Material") flags(SMF_NEW|SMF_DELETE|SMF_BROWSE)
};

This, of course, requires a custom parser, but I think it is very much worth the extra typing later on. I find all the approaches that force you to repeat yourself to be dangerous. You can easily forget to list one of the variables. With the preprocessor approach, in contrast, you immediately see everything about a member when you look at its declaration.

Also, using per-class C++ code that creates meta objects at boot time, as you mention in the article, is really heavy on the final executable size and linker performance. We used that previously, but trying to ship with that for Xbox1 forced us to rewrite it to use static data per class, which is then read by the reflection code at boot time to generate the needed objects. That approach makes handling templates more complicated, though.

Alen
|
|
From: Jarkko L. <al...@gm...> - 2009-04-20 08:48:42
|
IMO, a fundamental problem with this is that it's difficult to guarantee that all the data is converted to the new format. If there happens to be some older data out of reach of the data conversion and you remove the conversion code, the data gets corrupted. Overall, I just think it's an unnecessary maintenance burden for developers to take care of something as mundane as data conversion. For the final build you obviously have all the data available, so you can update the data to the most recent version and strip the type information from the data. You probably want to process the data anyway, to linearize it for DVD streaming, swap the endianness, etc.

Cheers, Jarkko

-----Original Message-----
From: Adrian Bentley [mailto:ad...@gm...]
Sent: Monday, April 20, 2009 3:08 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] Complexity of new hardware

If you can break off a script by diffing two sets of type metadata, is that hard coded? It would be unnecessary maintenance work, unless you want to be able to change your data and can't do it within your framework :P. Maybe there's a way I haven't thought of. Every time I look at C# serialization, I've come up with a number of things I wouldn't be able to do very easily (rename B, break A into 2 pieces, etc.).

I certainly hadn't thought about it from a middleware perspective, but all middleware needs a data upgrade path between versions. Otherwise you get stuck in time and become another point in favor of rolling one's own tech.

Unless I'm missing something, you can't strip off all the class/type information unless all your data is in the final format. At that point, you've solved your upgrade problem. Linearly allocating _definitely_ helps. For some, though, even running simple serialization code may not be fast enough.

Cheers, Adrian
|