From: Richard H. <hug...@gm...> - 2013-05-13 21:01:40
Hi all,

I'm trying to make my transform go fast. I've got a 1920x1080 RGB image being transformed from sRGB to the display profile. I've got a quad-core processor on my development box, no shaders or GPU, and I'm trying to do the transform as quickly as possible.

I figured the fastest way to do this would be to set up a threadpool with max_threads = 4. Then I have a few choices:

* pop a thread from the pool for every line of the image, creating local state with p_in, p_out, width and stride
* pop a thread from the pool for every n lines of the image, creating local state with p_in, p_out, width, stride and rows_to_process (where n = height / max_threads)

I figured 4 threads should be ~4x faster than using 1 thread (in the second case we should only have 4 threads, so not much overhead), but no matter the value of max_threads or 'n' I can only achieve a ~1.9x speed-up. I've tried with and without cmsFLAGS_NOCACHE.

Any pointers very welcome.

Thanks,
Richard
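For reference, the second option (one thread per band of n lines) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code actually used: it spawns POSIX threads directly rather than using a pool, and transform_row is a hypothetical placeholder that just inverts each byte. In a real program that placeholder would presumably be replaced by a per-row call such as cmsDoTransform(hTransform, in, out, width). The names band_job, band_worker and transform_image_banded are invented for the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_THREADS 4

/* Local state for one band, mirroring the fields listed in the post. */
typedef struct {
    const uint8_t *p_in;
    uint8_t       *p_out;
    size_t         width;           /* pixels per row */
    size_t         stride;          /* bytes per row, including padding */
    size_t         rows_to_process;
} band_job;

/* Hypothetical placeholder for the colour transform of one RGB row.
 * Assumption: a real version would call the CMS here instead. */
static void transform_row(const uint8_t *in, uint8_t *out, size_t width)
{
    for (size_t i = 0; i < width * 3; i++)
        out[i] = (uint8_t)(255 - in[i]);
}

/* Thread entry point: process every row of one band. */
static void *band_worker(void *arg)
{
    band_job *job = arg;
    for (size_t r = 0; r < job->rows_to_process; r++)
        transform_row(job->p_in + r * job->stride,
                      job->p_out + r * job->stride,
                      job->width);
    return NULL;
}

/* Split the image into n_threads contiguous bands and process them in
 * parallel. Returns n_threads on success, -1 if a thread failed to start. */
static int transform_image_banded(const uint8_t *in, uint8_t *out,
                                  size_t width, size_t height,
                                  size_t stride, int n_threads)
{
    pthread_t threads[MAX_THREADS];
    band_job  jobs[MAX_THREADS];
    assert(n_threads >= 1 && n_threads <= MAX_THREADS);

    size_t base  = height / (size_t)n_threads;
    size_t extra = height % (size_t)n_threads; /* leftover rows go to the first bands */
    size_t row   = 0;
    int started  = 0;

    for (int t = 0; t < n_threads; t++) {
        size_t rows = base + ((size_t)t < extra ? 1 : 0);
        jobs[t] = (band_job){ in + row * stride, out + row * stride,
                              width, stride, rows };
        row += rows;
        if (pthread_create(&threads[t], NULL, band_worker, &jobs[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    return started == n_threads ? started : -1;
}
```

For the 1920x1080 case this would be called as transform_image_banded(src, dst, 1920, 1080, stride, 4), giving four bands of 270 rows each. Each band writes to a disjoint range of output rows, so the workers need no locking between them.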