Thread: [Lcms-user] Threading performance in LCMS2

Hi all,

I'm trying to make my transform go fast. I've got a 1920x1080 RGB
image being transformed from sRGB to the display profile. I've got a
quad core processor on my development box, no shaders or GPU, and I'm
trying to do the transform as quickly as possible.

I figured the fastest way to do this would be to set up a threadpool
with max_threads = 4. Then I have a few choices:

* pop a thread from the pool for every line of the image, creating
local state with p_in, p_out, width and stride
* pop a thread from the pool for every n lines of the image, creating
local state with p_in, p_out, width, stride and rows_to_process (where
n = height / max_threads)
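
For reference, the second option (one band of rows per worker) can be sketched roughly as below. This is only an illustration under stated assumptions: band_t and make_bands are hypothetical names, not part of LCMS2, and the sketch assumes the input and output buffers share the same stride. The actual transform call is left as a comment so the block compiles without lcms2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-worker state, mirroring the fields listed above. */
typedef struct {
    const unsigned char *p_in;   /* first input row of this band */
    unsigned char *p_out;        /* first output row of this band */
    size_t width;                /* pixels per row */
    size_t stride;               /* bytes per row (assumed equal for in and out) */
    size_t rows_to_process;      /* number of rows in this band */
} band_t;

/*
 * Split `height` rows into at most `n_bands` contiguous bands.
 * The first (height % n_bands) bands get one extra row, so no rows are
 * dropped when height is not a multiple of n_bands.
 * Returns the number of bands actually filled in.
 */
size_t make_bands(const unsigned char *in, unsigned char *out,
                  size_t width, size_t stride, size_t height,
                  size_t n_bands, band_t *bands)
{
    assert(n_bands > 0 && bands != NULL);

    if (n_bands > height)
        n_bands = height;        /* never create empty bands */
    if (n_bands == 0)
        return 0;                /* empty image, nothing to do */

    size_t base = height / n_bands;
    size_t extra = height % n_bands;
    size_t row = 0;

    for (size_t i = 0; i < n_bands; i++) {
        size_t rows = base + (i < extra ? 1 : 0);
        bands[i].p_in = in + row * stride;
        bands[i].p_out = out + row * stride;
        bands[i].width = width;
        bands[i].stride = stride;
        bands[i].rows_to_process = rows;
        row += rows;
    }
    return n_bands;
}

/*
 * Each worker would then loop over its own rows, for example:
 *
 *   for (size_t r = 0; r < b->rows_to_process; r++)
 *       cmsDoTransform(xform,
 *                      b->p_in  + r * b->stride,
 *                      b->p_out + r * b->stride,
 *                      (cmsUInt32Number) b->width);
 *
 * where `xform` is the cmsHTRANSFORM created elsewhere; cmsDoTransform
 * takes the pixel count as its last argument.
 */
```

For the 1920x1080 case with 4 workers this gives four bands of 270 rows each; the remainder handling only matters when the height does not divide evenly.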

I figured 4 threads should be ~4x faster than using 1 thread (in the
second case we should only have 4 threads, so not much overhead), but
no matter the value of max_threads or 'n' I can only achieve a ~1.9x
speed-up. I've tried with and without cmsFLAGS_NOCACHE. Any pointers
very welcome.

Thanks,

Richard
