Delphi compatibility
Grouped Transformer Decoder
support for 1000+ layer deep models
trying to converge 1000+ layer deep models
speeding up backpropagation on 1000+ layer deep models
attempt to stabilize 1000+ layer deep models