experimental max norm in CAI transformer
In testing, we trust: fixes initialization of embedding
In testing, we trust: fixes PointwiseSoftMax
protects against overflow with Adam
in code review we trust
updating SGD optimizer
embedding now uses uniform initialization