...This formulation improves action recognition and spatiotemporal reasoning, especially for classes requiring context beyond short temporal windows. The repo provides training recipes and models for standard datasets, as well as ablations that show how many non-local blocks to insert and at which stages. Efficient implementations keep memory and compute manageable so the blocks can be added without rewriting the entire backbone. The result is a practical, drop-in mechanism for upgrading purely local video models into context-aware networks with strong benchmark performance.