The company I work for uses -lat for binarization as part of document processing, both for thumbnailing and for OCR.
Since GM's implementation of AdaptiveThreshold is quadratic in the window size when computing the moving mean, it became the bottleneck of our pipeline.
So I rewrote the function using a well-known dynamic programming technique and achieved good results, with a completely linear process.
There is no OpenMP optimization in the code, so it can surely be optimized even further.
The additional memory cost is small, on the order of window height x image width, achieved by using modular arithmetic.
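For reference, the idea can be sketched like this (a simplified standalone sketch, not the actual patch; the function and variable names are mine, and this variant keeps only O(width) column sums and re-reads image rows, where the patch instead keeps a window-height x width buffer indexed with modular arithmetic so rows can be streamed):

```c
#include <stdlib.h>

/* Linear-time moving mean (box filter) over a trailing ww x wh window.
 * colsum[x] holds the sum of the last `wh` pixels of column x and is
 * updated incrementally as the window slides down one row, so each
 * pixel costs O(1) instead of O(ww * wh). */
void moving_mean(const unsigned char *img, double *mean,
                 int width, int height, int ww, int wh)
{
    long *colsum = calloc(width, sizeof(long));
    for (int y = 0; y < height; y++) {
        /* add the incoming row; drop the row that left the window */
        for (int x = 0; x < width; x++) {
            colsum[x] += img[y * width + x];
            if (y >= wh)
                colsum[x] -= img[(y - wh) * width + x];
        }
        int rows = (y < wh) ? y + 1 : wh;  /* rows currently in window */
        /* slide a running sum of ww column sums across the row */
        long winsum = 0;
        for (int x = 0; x < width; x++) {
            winsum += colsum[x];
            if (x >= ww)
                winsum -= colsum[x - ww];
            int cols = (x < ww) ? x + 1 : ww;
            mean[y * width + x] = (double)winsum / (rows * cols);
        }
    }
    free(colsum);
}
```

The real implementation also has to handle window centering at the borders; the sketch uses a trailing window and shrinks it at the edges just to keep the idea visible.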
There is a test and a workaround for a possible overflow of the sum in the dynamic buffer when processing insanely big images, which is a side effect of the technique.
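A guard of that kind might look roughly like this (an illustrative sketch only, not the actual patch; the names and the choice of accumulator type are assumptions):

```c
#include <limits.h>

/* Each buffer entry accumulates up to window_height samples, so check
 * that the worst case fits the accumulator before taking the linear
 * path; otherwise a fallback is needed. Names/types are assumptions. */
int column_sum_can_overflow(unsigned long window_height,
                            unsigned long max_sample)
{
    return max_sample != 0 && window_height > ULONG_MAX / max_sample;
}
```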
I also fixed a bug where adding the offset to the mean could overflow, which led to white or dark areas in the image.
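That bug is of the classic wraparound kind: adding a signed offset to a mean held in a narrow unsigned type wraps instead of saturating. The fix is, roughly, to widen before adding and clamp before storing (an illustrative sketch, not the patch itself; names are mine):

```c
#include <limits.h>

/* Add a signed offset to a mean destined for an unsigned short sample:
 * do the arithmetic in a wider signed type, then clamp to the sample
 * range instead of letting the addition wrap around. */
static unsigned short threshold_with_offset(double mean, long offset)
{
    long t = (long)(mean + 0.5) + offset;  /* widen before adding */
    if (t < 0)
        t = 0;                             /* clamp, don't wrap */
    if (t > USHRT_MAX)
        t = USHRT_MAX;
    return (unsigned short)t;
}
```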
That gave us a 12x speedup for the window sizes we are using, and much better image quality with bigger window sizes, at no additional processing cost.
The patch was tested, but I suggest more testing; I couldn't validate the overflow workaround for lack of memory on my PC.
Hope the community enjoys it.