Hi Steve,
I'm re-sending my question because in my previous message I mistakenly
wrote VM_Assembler where I meant VM_Compiler. Here it is:

If I understand you correctly, in order to write an atomic write barrier that
works with concurrent garbage collectors as well as stop-the-world ones (in *JAVA*),
the calls to VM_WriteBarrier should be issued directly from VM_Compiler rather
than from an auxiliary class such as VM_Barriers. If the calls are issued
directly from VM_Compiler, there won't be any context switch in the
middle of an update (since yield points can appear only in the
prologue/epilogue, and there's no back-edge in the write-barrier code).
Am I right?
Why are the non-concurrent calls to VM_WriteBarrier methods via VM_Barriers
not performed atomically?
Is it because a yield point can be placed between the two calls
(i.e., between the call to VM_Barriers.compilerArrayStoreBarrier
and the call to VM_WriteBarrier.arrayStoreWriteBarrier)?
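To make sure I'm describing the problem correctly, here is a small model of
what I mean. It is plain Java, not Jikes RVM code: BarrierRaceSketch,
splitBarrier, inlinedBarrier, rememberedSet and collectorRuns are hypothetical
names, and I'm assuming a barrier that records the newly stored reference for
the collector. The yieldPoint callback stands in for a thread switch that lets
a concurrent collector run between the store and the record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only (not Jikes RVM code): a two-step array-store barrier.
public class BarrierRaceSketch {
    // References the collector learns about through the barrier (assumed design).
    static final List<Object> rememberedSet = new ArrayList<>();

    // What the collector saw when it ran: a copy of the remembered set at that moment.
    static List<Object> collectorSnapshot = null;

    // Stand-in for a concurrent collector getting scheduled at a yield point.
    static void collectorRuns() {
        collectorSnapshot = new ArrayList<>(rememberedSet);
    }

    // Barrier split across two calls, with a possible yield point between them.
    static void splitBarrier(Object[] array, int index, Object value, Runnable yieldPoint) {
        array[index] = value;      // first call: the store itself
        yieldPoint.run();          // a thread switch here can let the collector run
        rememberedSet.add(value);  // second call: record the store for the collector
    }

    // Same barrier with no yield point between the store and the record.
    static void inlinedBarrier(Object[] array, int index, Object value) {
        array[index] = value;
        rememberedSet.add(value);
    }

    public static void main(String[] args) {
        Object[] array = new Object[1];
        Object obj = new Object();
        splitBarrier(array, 0, obj, BarrierRaceSketch::collectorRuns);
        // The collector ran after the store was visible but before it was recorded.
        System.out.println("array holds obj: " + (array[0] == obj));                    // true
        System.out.println("collector saw obj: " + collectorSnapshot.contains(obj));   // false
    }
}
```

In the split version the collector's snapshot misses the stored reference even
though the array already holds it; in inlinedBarrier there is no point at which
that can happen, which is what I understand issuing the barrier directly from
VM_Compiler is meant to guarantee.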