closer implementation of transformer encoder
fixes embedding learning direction
removing min learning rate
removing extra transpose from transformers
experimental max norm in CAI transformer
In testing, we trust: fixes initialization of embedding
In testing, we trust: fixes PointwiseSoftMax
protects against overflow with Adam