Re: [Lcms-user] LittleCMS Performance and Non-Intel Processors
From: Graeme G. <gr...@ar...> - 2017-07-31 23:13:24
Noel Carboni wrote:
> By the way, as an exercise to reinforce the above, I re-coded the
> LittleCMS floating point trilinear interpolation algorithm using SSE2
> intrinsics. It ended up delivering the same performance as the C-coded
> version. Why not better? Because the table-based design of the Little
> CMS library doesn't suit parallel calculations so there were only limited
> things I could do.

Simplex interpolation is generally faster since it touches fewer node
points (N+1 per lookup for N input dimensions, versus 2^N for
multi-linear), something that increases in importance with higher input
dimensions. But simplex isn't terribly parallelizable, since it involves
a sort of the fractional input coordinates.

Once the weighting of each node is known, using either simplex or
multi-linear, parallelizing the output dimension calculations is a good
speedup though.

[ How much of a win vector CPU instructions would be is not something
I've ever had time to explore in my color engine, and I've been content
to stick to portable C code, while wringing what I can out of it.
Exploiting GPU texture lookup hardware seems far simpler to code for,
for maximum overall speed. ]

Cheers,
Graeme Gill.
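
To illustrate the point about node counts and weights, here is a minimal sketch in portable C for a 3-input lookup table. It is not code from LittleCMS or from any existing color engine. The `Lut3` type, the node layout, the `NCH` channel count and all function names are hypothetical, and inputs are assumed to lie in [0, 1]. The simplex version sorts the three fractions and reads 4 nodes; the trilinear version reads all 8 cell corners. In both, once the weights are known, the per-channel loop has no dependencies between channels, which is the part that lends itself to vector instructions.

```c
#include <assert.h>
#include <stddef.h>

#define NCH 3  /* number of output channels (hypothetical, e.g. RGB) */

/* Hypothetical 3-input LUT: n grid points per axis, NCH floats per node,
   stored with x varying slowest and z fastest. */
typedef struct {
    int n;
    const float *data;
} Lut3;

/* Offset (in floats) of the first channel of node (x, y, z). */
static size_t node_index(const Lut3 *lut, int x, int y, int z)
{
    return (((size_t)x * lut->n + y) * lut->n + z) * NCH;
}

/* Split an input in [0, 1] into a cell index and a fraction within the cell. */
static void split(float v, int n, int *cell, float *frac)
{
    float s = v * (float)(n - 1);
    int c = (int)s;
    if (c < 0) c = 0;
    if (c > n - 2) c = n - 2;   /* keep cell + 1 inside the grid */
    *cell = c;
    *frac = s - (float)c;
}

/* Simplex (tetrahedral) interpolation: 4 node reads, needs a sort. */
void lut3_simplex(const Lut3 *lut, const float in[3], float out[NCH])
{
    int cell[3], order[3] = {0, 1, 2};
    float f[3];
    assert(lut->n >= 2);
    for (int i = 0; i < 3; i++) split(in[i], lut->n, &cell[i], &f[i]);

    /* Sort the axes by fraction, largest first (3-element insertion sort). */
    for (int i = 1; i < 3; i++)
        for (int j = i; j > 0 && f[order[j]] > f[order[j - 1]]; j--) {
            int t = order[j]; order[j] = order[j - 1]; order[j - 1] = t;
        }

    /* Weights of the 4 simplex vertices; they sum to 1. */
    float w[4] = { 1.0f - f[order[0]], f[order[0]] - f[order[1]],
                   f[order[1]] - f[order[2]], f[order[2]] };

    /* Walk from the base corner, stepping one axis at a time in sorted order. */
    const float *node[4];
    int p[3] = { cell[0], cell[1], cell[2] };
    node[0] = lut->data + node_index(lut, p[0], p[1], p[2]);
    for (int k = 0; k < 3; k++) {
        p[order[k]]++;
        node[k + 1] = lut->data + node_index(lut, p[0], p[1], p[2]);
    }

    /* Weights are fixed here; each output channel is independent. */
    for (int c = 0; c < NCH; c++)
        out[c] = w[0] * node[0][c] + w[1] * node[1][c]
               + w[2] * node[2][c] + w[3] * node[3][c];
}

/* Trilinear interpolation: 8 node reads, no sort. */
void lut3_trilinear(const Lut3 *lut, const float in[3], float out[NCH])
{
    int cell[3];
    float f[3];
    assert(lut->n >= 2);
    for (int i = 0; i < 3; i++) split(in[i], lut->n, &cell[i], &f[i]);
    for (int c = 0; c < NCH; c++) out[c] = 0.0f;

    for (int corner = 0; corner < 8; corner++) {
        int dx = (corner >> 2) & 1, dy = (corner >> 1) & 1, dz = corner & 1;
        float w = (dx ? f[0] : 1.0f - f[0])
                * (dy ? f[1] : 1.0f - f[1])
                * (dz ? f[2] : 1.0f - f[2]);
        const float *node =
            lut->data + node_index(lut, cell[0] + dx, cell[1] + dy, cell[2] + dz);
        for (int c = 0; c < NCH; c++)
            out[c] += w * node[c];
    }
}
```

With 4 input dimensions the same pattern would read 5 nodes for simplex against 16 for multi-linear, which is why the gap widens as the input dimension grows.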