Thread: [Lcms-user] Support of PreMultiplied RGBA image format...

Hi,

Back again :-) This weekend I implemented some basic premultiplied
alpha support.

I agree with Bob and John that Premultiplied alpha doesn't "make a lot
of sense with a color management system since color perception varies
with intensity and the colorspace may not be linear across the color
channels".

But such color management makes sense for images with binary
transparency (full opacity or full transparency) where only a few
pixels (usually the anti-aliased edges) have intermediate transparency.

LittleCMS 2.4 was not able to handle this correctly, and the only way to
process such images was to:
- convert them to unpremultiplied alpha
- run cmsDoTransform (unoptimized, because even fully transparent
pixels are processed)
- convert back to premultiplied alpha
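
A minimal sketch of that three-step workaround, assuming 8-bit pixels
stored as B, G, R, A. The names unpremultiply_bgra8, premultiply_bgra8,
transform_fn and demo_invert_transform are hypothetical and not part of
LittleCMS; in real use the callback would wrap a cmsDoTransform call on
the buffer. The rounding scheme is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for the color transform step. */
typedef void (*transform_fn)(uint8_t *bgra, size_t pixel_count, void *user);

/* Step 1: divide B, G, R by alpha (rounded, clamped to 255).
   Pixels with alpha 0 or 255 need no change. */
static void unpremultiply_bgra8(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; ++i, p += 4) {
        unsigned a = p[3];
        if (a == 0 || a == 255) continue;
        for (int c = 0; c < 3; ++c) {
            unsigned v = (p[c] * 255u + a / 2u) / a;
            p[c] = (uint8_t)(v > 255u ? 255u : v);
        }
    }
}

/* Step 3: multiply B, G, R by alpha again (rounded). */
static void premultiply_bgra8(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; ++i, p += 4) {
        unsigned a = p[3];
        if (a == 255) continue;
        for (int c = 0; c < 3; ++c)
            p[c] = (uint8_t)((p[c] * a + 127u) / 255u);
    }
}

/* Demo stand-in for the real transform: inverts B, G, R, keeps alpha. */
static void demo_invert_transform(uint8_t *p, size_t n, void *user)
{
    (void)user;
    for (size_t i = 0; i < n; ++i, p += 4)
        for (int c = 0; c < 3; ++c)
            p[c] = (uint8_t)(255 - p[c]);
}

/* The slow path: three full passes, and step 2 visits every pixel,
   including the fully transparent ones. */
static void transform_premultiplied_slow(uint8_t *bgra, size_t n,
                                         transform_fn xform, void *user)
{
    assert(bgra != NULL && xform != NULL);
    unpremultiply_bgra8(bgra, n);
    xform(bgra, n, user);
    premultiply_bgra8(bgra, n);
}
```

Each pixel is touched three times here, which is one reason this
approach costs more than a single fused pass.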

That was way too slow, so I coded some limited support for premultiplied
alpha. It works only if:
- the input format is TYPE_BGRA_8
- the output format is TYPE_BGRA_8 too
- the transform gets the internal matrix optimization, because my code
(a hack) uses MatShaper8Data

A new flag, PREMUL_SH, has been added; it works only with TYPE_BGRA_8.

But right now the code looks like a quick and dirty hack :-(
First, because I'm not used to the internals of this library...
And second, because it doesn't respect the current pipeline.
Because premultiplied alpha means extra computation and I need very
fast processing, I had to merge the InputFormatters, xform and
OutputFormatters into one routine.
In LittleCMS 2.4, ~100 CPU cycles were needed to process one BGRA
pixel. The optimized code needs ~28 CPU cycles (unlinked alpha), and the
premultiplied alpha optimized code requires ~65 CPU cycles in the worst
case (no alpha of 0 or 1.0).
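
To illustrate the idea of one merged routine, here is a rough sketch of
a single-pass loop with shortcuts for alpha 0 and alpha 255. It is
inferred from the description and the benchmark names only, not taken
from the actual hack: the function names and the per-pixel callback are
hypothetical, and a real version would inline the matrix-shaper math
instead of calling through a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel color conversion on straight (unpremultiplied) BGR. */
typedef void (*pixel_xform_fn)(const uint8_t in_bgr[3], uint8_t out_bgr[3],
                               void *user);

/* Demo stand-in for the real conversion: inverts each channel. */
static void demo_invert_pixel(const uint8_t in_bgr[3], uint8_t out_bgr[3],
                              void *user)
{
    (void)user;
    for (int c = 0; c < 3; ++c)
        out_bgr[c] = (uint8_t)(255 - in_bgr[c]);
}

/* Unpack, convert and repack in one pass over premultiplied BGRA.
   Returns how many pixels actually went through the conversion. */
static size_t transform_premul_bgra8_fused(const uint8_t *src, uint8_t *dst,
                                           size_t n, pixel_xform_fn xform,
                                           void *user)
{
    size_t converted = 0;
    assert(src != NULL && dst != NULL && xform != NULL);

    for (size_t i = 0; i < n; ++i, src += 4, dst += 4) {
        unsigned a = src[3];
        dst[3] = (uint8_t)a;

        if (a == 0) {                 /* fully transparent: nothing to convert */
            dst[0] = dst[1] = dst[2] = 0;
            continue;
        }
        if (a == 255) {               /* fully opaque: no alpha arithmetic */
            xform(src, dst, user);
            ++converted;
            continue;
        }

        /* Worst case: unpremultiply, convert, premultiply again. */
        uint8_t straight[3], mapped[3];
        for (int c = 0; c < 3; ++c) {
            unsigned v = (src[c] * 255u + a / 2u) / a;
            straight[c] = (uint8_t)(v > 255u ? 255u : v);
        }
        xform(straight, mapped, user);
        for (int c = 0; c < 3; ++c)
            dst[c] = (uint8_t)((mapped[c] * a + 127u) / 255u);
        ++converted;
    }
    return converted;
}
```

On a sprite-like image most pixels take one of the two shortcuts, so
only the anti-aliased edges pay for the two extra alpha operations.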

I have done some benchmarking too, using an ECI-tagged image + 2 alpha
masks + a random screen profile. The goal was to process an ECI image
and turn it into the screen color space.

The code is fast enough for my needs for now. Of course, if someone can
help me integrate this code the clean way, I would be happy to
contribute seriously.

In the future, some real optimization could be done (there is no SSE or
OpenCL vectorisation code yet).

Modified code (derived from LittleCMS 2.4) + test program can be
found here:

http://sebastienleon.com/info/littleCMS/littleCMS_PreMulAlphaHack.zip
(I give all copyrights to Marti)

Qt 4.x is required to build the test program. (Works on
Mac/Linux/Windows; run "qmake && make", then "./test".)

Best regards

Sebastien Léon
-----------------------------------------
LittleCMS Test/Hacks & simple benchmarking...
Init OK...

******* Start TEST : LittleCMS 2.4 Legacy *******
(test 0 lasts 683725 KCycles).
(test 1 lasts 686102 KCycles).
(test 2 lasts 686626 KCycles).
(test 3 lasts 683895 KCycles).
Average Test lasts 685087 KCycles.
Average CPU Cycle per pixel = 99.85.
-------------------------------------------
******* Start TEST : LittleCMS 2.4 + Unroll3BytesSkip1SwapExtFirst *******
(test 0 lasts 406031 KCycles).
(test 1 lasts 405165 KCycles).
(test 2 lasts 406182 KCycles).
(test 3 lasts 404195 KCycles).
Average Test lasts 405393 KCycles.
Average CPU Cycle per pixel = 59.09.
-------------------------------------------
******* Start TEST : RGBAEngineWithAlphaIgnored *******
(test 0 lasts 189697 KCycles).
(test 1 lasts 188906 KCycles).
(test 2 lasts 191982 KCycles).
(test 3 lasts 190995 KCycles).
Average Test lasts 190395 KCycles.
Average CPU Cycle per pixel = 27.75.
-------------------------------------------
******* Start TEST : PreMulEngineWithNoAlpha *******
(test 0 lasts 209824 KCycles).
(test 1 lasts 208821 KCycles).
(test 2 lasts 208711 KCycles).
(test 3 lasts 207210 KCycles).
Average Test lasts 208641 KCycles.
Average CPU Cycle per pixel = 30.41.
-------------------------------------------
******* Start TEST : PreMulEngineWithPreMulAlpha_WorstCase *******
(test 0 lasts 444388 KCycles).
(test 1 lasts 447862 KCycles).
(test 2 lasts 443876 KCycles).
(test 3 lasts 439628 KCycles).
Average Test lasts 443939 KCycles.
Average CPU Cycle per pixel = 64.70.
-------------------------------------------
******* Start TEST : PreMulEngineWithPreMulAlpha_SpriteCase *******
(test 0 lasts 132157 KCycles).
(test 1 lasts 130319 KCycles).
(test 2 lasts 131186 KCycles).
(test 3 lasts 150089 KCycles).
Average Test lasts 135938 KCycles.
Average CPU Cycle per pixel = 19.81.
-------------------------------------------
Work's done...
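
The log does not state the image size, but the figures can be related
with a little arithmetic. Assuming 1 KCycle = 1000 cycles (an
assumption, the test program may define it differently), the legacy
average of 685087 KCycles at 99.85 cycles per pixel implies roughly 6.86
million pixels per run, and the alpha-ignored engine is about 3.6 times
faster than the legacy path (99.85 / 27.75). The helper names below are
hypothetical and only restate that arithmetic.

```c
#include <assert.h>

/* Cycles per pixel from a total given in kilocycles.
   Assumes 1 KCycle = 1000 cycles. */
static double cycles_per_pixel(double kcycles, double pixel_count)
{
    assert(pixel_count > 0.0);
    return kcycles * 1000.0 / pixel_count;
}

/* Pixel count implied by a total and a cycles-per-pixel figure. */
static double implied_pixel_count(double kcycles, double cycles_per_px)
{
    assert(cycles_per_px > 0.0);
    return kcycles * 1000.0 / cycles_per_px;
}

/* How many times faster the optimized path is than the baseline. */
static double speedup(double baseline_cpp, double optimized_cpp)
{
    assert(optimized_cpp > 0.0);
    return baseline_cpp / optimized_cpp;
}
```

For example, speedup(99.85, 64.70) gives about 1.54 for the worst-case
premultiplied path against the legacy one.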
