Simd / Discussion / English Common Forum: BGR-24 + Alpha GRAY-8 >> BGR-32

Anonymous - 2016-04-02

Hi.

I'm the anonymous guy who annoyed you in the last few days :D
Well, let me try again :D

I noticed the BgrToBgra function.
In my application I often split BGR-32 into BGR-24 + GRAY-8, or combine BGR-24 + GRAY-8 into BGR-32 (the GRAY-8 plane is the alpha channel).
Speed is not so important for the split, but it is for the combine.

I wonder if that function could be modified to take GRAY-8 bitmap data instead of a single alpha value (and still be fast).
I'm not asking you to do the work; I will do it. I'm just asking whether it's possible, and maybe for some pointers.

I know there are many files I have to modify for this to work, but I'll start with this one:

        template <bool align> void BgrToBgra(const uint8_t * bgr, size_t width, size_t height, size_t bgrStride, uint8_t * bgra, size_t bgraStride, uint8_t *alpha, size_t alphaStride)
        {
            bgr += (height - 1) * bgrStride;
            bgra += (height - 1) * bgraStride;
            alpha += (height - 1) * alphaStride;

            assert(width >= A);
            if(align)
                assert(Aligned(bgra) && Aligned(bgraStride) && Aligned(bgr) && Aligned(bgrStride) && Aligned(alpha) && Aligned(alphaStride));

            size_t alignedWidth = AlignLo(width, A);

            //__m128i _alpha = _mm_slli_si128(_mm_set1_epi32(alpha), 3);
            __m128i _shuffle = _mm_setr_epi8(0x0, 0x1, 0x2, -1, 0x3, 0x4, 0x5, -1, 0x6, 0x7, 0x8, -1, 0x9, 0xA, 0xB, -1);

            for(size_t row = 0; row < height; ++row)
            {
                for(size_t col = 0; col < alignedWidth; col += A)
                    BgrToBgra<align>(bgr + 3*col, bgra + 4*col, alpha + col, _shuffle);
                if(width != alignedWidth)
                    BgrToBgra<false>(bgr + 3*(width - A), bgra + 4*(width - A), alpha + width - A, _shuffle);
                bgr -= bgrStride;
                bgra -= bgraStride;
                alpha -= alphaStride;
            }
        }

Is it good?

Regards,
David

I modified the base version and the SSSE3 version,
but the SSSE3 version is only a little bit faster.

        template <bool align> SIMD_INLINE void BgrToBgra(const uint8_t * bgr, uint8_t * bgra, uint8_t * alpha, __m128i shuffle)
        {
            Store<align>((__m128i*)bgra + 0, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[3], alpha[2], alpha[1], alpha[0]), 3), _mm_shuffle_epi8(Load<align>((__m128i*)(bgr +  0)), shuffle)));
            Store<align>((__m128i*)bgra + 1, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[7], alpha[6], alpha[5], alpha[4]), 3), _mm_shuffle_epi8(Load<false>((__m128i*)(bgr + 12)), shuffle)));
            Store<align>((__m128i*)bgra + 2, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[11], alpha[10], alpha[9], alpha[8]), 3), _mm_shuffle_epi8(Load<false>((__m128i*)(bgr + 24)), shuffle)));
            Store<align>((__m128i*)bgra + 3, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[15], alpha[14], alpha[13], alpha[12]), 3), _mm_shuffle_epi8(_mm_srli_si128(Load<align>((__m128i*)(bgr + 32)), 4), shuffle)));
        }

        template <bool align> void BgrToBgra(const uint8_t * bgr, size_t width, size_t height, size_t bgrStride, uint8_t * bgra, size_t bgraStride, uint8_t *alpha, size_t alphaStride)
        {
            bgr += (height - 1) * bgrStride;
            bgra += (height - 1) * bgraStride;
            alpha += (height - 1) * alphaStride;

            assert(width >= A);
            if(align)
                assert(Aligned(bgra) && Aligned(bgraStride) && Aligned(bgr) && Aligned(bgrStride) && Aligned(alpha) && Aligned(alphaStride));

            size_t alignedWidth = AlignLo(width, A);

            __m128i _shuffle = _mm_setr_epi8(0x0, 0x1, 0x2, -1, 0x3, 0x4, 0x5, -1, 0x6, 0x7, 0x8, -1, 0x9, 0xA, 0xB, -1);

            for(size_t row = 0; row < height; ++row)
            {
                for(size_t col = 0; col < alignedWidth; col += A)
                    BgrToBgra<align>(bgr + 3*col, bgra + 4*col, alpha + col, _shuffle);
                if(width != alignedWidth)
                    BgrToBgra<false>(bgr + 3*(width - A), bgra + 4*(width - A), alpha + width - A, _shuffle);
                bgr -= bgrStride;
                bgra -= bgraStride;
                alpha -= alphaStride;
            }
        }

Anonymous - 2016-04-03

Also, two days ago I made a variation of the AlphaBlending function. It uses the same bitmap for src and dst, and just a color for blending.
It's useful when you want to draw a transparent image onto a single-color background. With the usual functions you have to create a bitmap, fill it with the color (using FillBgr) and blend it with AlphaBlending.
But this way is very slow.
The new function, AlphaBlendingColor, is much faster.
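
For readers following along: the AlphaBlendingColor code was never posted, so the following is only a minimal scalar sketch of the idea as described, assuming a BGR-24 image, a separate GRAY-8 alpha plane, and a hypothetical name and signature (BlendOverColorSketch is not a Simd Library function). It blends the image in place over a solid color, so no temporary background bitmap is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical name and signature, for illustration only.
// Blends a BGR-24 image in place over a solid background color, using a
// separate GRAY-8 plane as per-pixel alpha:
//     bgr = (bgr * a + color * (255 - a)) / 255
void BlendOverColorSketch(uint8_t* bgr, size_t bgrStride, size_t width, size_t height,
                          const uint8_t* alpha, size_t alphaStride,
                          uint8_t blue, uint8_t green, uint8_t red)
{
    assert(bgr != nullptr && alpha != nullptr);
    const uint8_t color[3] = { blue, green, red };
    for (size_t row = 0; row < height; ++row)
    {
        for (size_t col = 0; col < width; ++col)
        {
            const int a = alpha[col];
            uint8_t* pixel = bgr + 3 * col;
            for (int c = 0; c < 3; ++c)
            {
                const int mixed = pixel[c] * a + color[c] * (255 - a);
                pixel[c] = static_cast<uint8_t>((mixed + 127) / 255); // rounded division by 255
            }
        }
        bgr += bgrStride;     // advance to the next image row
        alpha += alphaStride; // advance to the next alpha row
    }
}
```

A fully opaque pixel (alpha 255) is left unchanged and a fully transparent one (alpha 0) becomes the background color. A SIMD version would vectorize the inner loop; this sketch makes no claim about speed.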

But since you show(ed) no interest in my ideas to improve your library, I see no point in posting it here.

I'm outta here...

Good riddance to me

Yermalayeu Ihar - 2016-04-04

Hi, David.
I'm sorry, but I wasn't able to answer over the weekend.

I can see that you have decided to extend the functionality of Simd Library.
That's good. May I add you to the project's list of developers?
If you agree, I will send you the rules that are used in the development of the library.

As for your implementation of BgrToBgra, I have some notes:
1) These two implementations are equivalent:
The first:

        sum = 0;
        for(size_t row = 0; row < height; ++row)
        {
            for(size_t col = 0; col < width; ++col)
                sum += src[col];
            src += stride;
        }

The second:

        sum = 0;
        src += (height - 1) * stride;
        for(size_t row = 0; row < height; ++row)
        {
            for(size_t col = 0; col < width; ++col)
                sum += src[col];
            src -= stride;
        }

But the first one is better because it accesses memory sequentially, so hardware (or software) prefetching works well. In the second case every row ends with a jump backward in memory, which leads to cache misses.

2) Using _mm_set_epi32(alpha[3], alpha[2], alpha[1], alpha[0]) is not a good idea, because this intrinsic does not map to a single hardware instruction and performs very poorly.
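
One possible way to avoid _mm_set_epi32, sketched here under assumptions rather than taken from Simd Library: load all 16 alpha bytes with a single load and let _mm_shuffle_epi8 scatter four of them at a time into the alpha slots. The function name, the SKETCH_SSSE3 macro, and the use of unaligned loads and stores are illustrative choices; whether this is actually faster would have to be measured.

```cpp
#include <cassert>
#include <cstdint>
#include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8

// Lets GCC/Clang compile this one function for SSSE3 without a global -mssse3 flag.
#if defined(__GNUC__) && !defined(__SSSE3__)
#define SKETCH_SSSE3 __attribute__((target("ssse3")))
#else
#define SKETCH_SSSE3
#endif

// Hypothetical helper: converts 16 BGR pixels (48 bytes) plus 16 alpha bytes
// into 16 BGRA pixels (64 bytes).
SKETCH_SSSE3 void Bgr16ToBgra16Sketch(const uint8_t* bgr, const uint8_t* alpha, uint8_t* bgra)
{
    assert(bgr != nullptr && alpha != nullptr && bgra != nullptr);

    // Moves 12 packed BGR bytes into the B, G, R slots of 4 pixels; -1 zeroes the A slot.
    const __m128i bgrShuffle = _mm_setr_epi8(0, 1, 2, -1, 3, 4, 5, -1, 6, 7, 8, -1, 9, 10, 11, -1);
    // Each mask moves 4 of the 16 alpha bytes into the A slots and zeroes the rest.
    const __m128i a0 = _mm_setr_epi8(-1, -1, -1, 0, -1, -1, -1, 1, -1, -1, -1, 2, -1, -1, -1, 3);
    const __m128i a1 = _mm_setr_epi8(-1, -1, -1, 4, -1, -1, -1, 5, -1, -1, -1, 6, -1, -1, -1, 7);
    const __m128i a2 = _mm_setr_epi8(-1, -1, -1, 8, -1, -1, -1, 9, -1, -1, -1, 10, -1, -1, -1, 11);
    const __m128i a3 = _mm_setr_epi8(-1, -1, -1, 12, -1, -1, -1, 13, -1, -1, -1, 14, -1, -1, -1, 15);

    // One load covers all 16 alpha values.
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(alpha));

    // Four overlapping 16-byte loads cover the 48 BGR bytes without reading past them.
    const __m128i p0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bgr + 0));
    const __m128i p1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bgr + 12));
    const __m128i p2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bgr + 24));
    const __m128i p3 = _mm_srli_si128(_mm_loadu_si128(reinterpret_cast<const __m128i*>(bgr + 32)), 4);

    __m128i* dst = reinterpret_cast<__m128i*>(bgra);
    _mm_storeu_si128(dst + 0, _mm_or_si128(_mm_shuffle_epi8(p0, bgrShuffle), _mm_shuffle_epi8(a, a0)));
    _mm_storeu_si128(dst + 1, _mm_or_si128(_mm_shuffle_epi8(p1, bgrShuffle), _mm_shuffle_epi8(a, a1)));
    _mm_storeu_si128(dst + 2, _mm_or_si128(_mm_shuffle_epi8(p2, bgrShuffle), _mm_shuffle_epi8(a, a2)));
    _mm_storeu_si128(dst + 3, _mm_or_si128(_mm_shuffle_epi8(p3, bgrShuffle), _mm_shuffle_epi8(a, a3)));
}
```

The four alpha masks are loop-invariant, so in a full row loop they would be built once outside the loop, the same way _shuffle is in the code above.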

Last edit: Yermalayeu Ihar 2016-04-04