Fixing DotProducts departing branches
closer implementation of transformer decoder
closer implementation of transformer encoder
fixing embedding learning direction
removing min learning rate
removing extra transpose from transformers
experimental max norm in CAI transformer
In testing we trust - fixing initialization of embedding