Scalene
High-performance CPU, GPU, and memory profiler for Python
... to sort the columns. Scalene is fast. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less). Scalene performs profiling at the line level and per function, pointing to the functions and the specific lines of code responsible for the execution time in your program. Scalene separates out the percentage of memory consumed by Python code vs. native code.