Coding transformers
Coding multi-head attention
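A minimal sketch of what this step might cover, assuming the standard "split the model dimension across heads" construction; all names here (MultiHeadSelfAttention, d_model, num_heads) are illustrative rather than from the original:

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys, values, plus an output projection.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def _split(self, t: torch.Tensor) -> torch.Tensor:
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, s, _ = t.shape
        return t.view(b, s, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self._split(self.w_q(x))
        k = self._split(self.w_k(x))
        v = self._split(self.w_v(x))
        # Scaled dot-product attention per head: (batch, heads, seq, seq).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        out = scores.softmax(dim=-1) @ v          # (batch, heads, seq, d_head)
        out = out.transpose(1, 2).contiguous()    # (batch, seq, heads, d_head)
        b, s, _, _ = out.shape
        return self.w_o(out.view(b, s, -1))       # concatenate heads, then project

x = torch.randn(2, 5, 16)
print(MultiHeadSelfAttention(16, num_heads=4)(x).shape)  # torch.Size([2, 5, 16])
```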
Fixing a bug in the transformer block
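The title doesn't say which bug is being fixed. As an illustration only, one classic mistake in hand-rolled transformer blocks is dropping the residual (skip) connection around a sublayer; the helper names below are hypothetical:

```python
import torch

def buggy_sublayer(x: torch.Tensor, sublayer) -> torch.Tensor:
    return sublayer(x)        # bug: the input x is discarded entirely

def fixed_sublayer(x: torch.Tensor, sublayer) -> torch.Tensor:
    return x + sublayer(x)    # fix: residual connection adds the input back

x = torch.ones(3)
double = lambda t: 2 * t
print(buggy_sublayer(x, double))  # tensor([2., 2., 2.])
print(fixed_sublayer(x, double))  # tensor([3., 3., 3.])
```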
Coding AddSingleHeadTransformerBlock
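The original gives only the identifier AddSingleHeadTransformerBlock. One plausible reading is a transformer block built around a single attention head; a self-contained sketch under that assumption, using PyTorch's built-in attention with num_heads=1 (d_ff and the other parameter names are illustrative):

```python
import torch
import torch.nn as nn

class AddSingleHeadTransformerBlock(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sublayer with residual connection and post-norm.
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer, same residual + norm pattern.
        return self.norm2(x + self.ffn(x))

x = torch.randn(2, 5, 16)
print(AddSingleHeadTransformerBlock(16, d_ff=32)(x).shape)  # torch.Size([2, 5, 16])
```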
Coding single-head self-attention
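A minimal sketch of single-head scaled dot-product self-attention, assuming the standard formulation; the class and parameter names (SingleHeadSelfAttention, d_model) are illustrative:

```python
import math
import torch
import torch.nn as nn

class SingleHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Linear maps that produce queries, keys, and values from the input.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        # Scaled dot-product attention scores: (batch, seq_len, seq_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        weights = scores.softmax(dim=-1)
        # Weighted sum of values: (batch, seq_len, d_model).
        return weights @ v

x = torch.randn(2, 5, 16)
print(SingleHeadSelfAttention(16)(x).shape)  # torch.Size([2, 5, 16])
```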