...The repo provides training recipes and models for standard datasets, as well as ablations that show how many non-local blocks to insert and at which stages. Efficient implementations keep memory and compute manageable so the blocks can be added without rewriting the entire backbone. The result is a practical, drop-in mechanism for upgrading purely local video models into context-aware networks with strong benchmark performance.