Updating CAI Transformer
fixes pointwise softmax with no forward and skip derivative
adds debug code
improves normalization methods
fixes backpropagation with branching
adds plenty of self-tests