Adding InitGlorotBengioUniformForAllConvLayers
CAI is again using He for convolutional layers
Convolutional layers got the same initialization as Keras
Updating CAI Transformer
Fixes pointwise softmax with no forward and skip derivative
Adding debug code