BertViz
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
...The head view visualizes attention for one or more attention heads in the same layer. It is based on the excellent Tensor2Tensor visualization tool. The model view shows a bird's-eye view of attention across all layers and heads. The neuron view visualizes individual neurons in the query and key vectors and shows how they are used to compute attention.