BGR-24 + Alpha GRAY-8 >> BGR-32

Anonymous
2016-04-02
2016-04-04
  • Anonymous

    Anonymous - 2016-04-02

    Hi.

    I'm the anonymous guy who annoyed you in the last few days :D
    Well, let me try again :D

    I noticed the BgrToBgra function.
    In my application I often split BGR-32 into BGR-24 + GRAY-8, or combine BGR-24 + GRAY-8 into BGR-32 (the GRAY-8 plane is the alpha channel).
    For the first operation speed is not so important, but for the second it is.

    I wonder whether that function could be modified to take a GRAY-8 bitmap instead of a single alpha value (and still be fast).
    I'm not asking you to do the work, I will do it; I'm just asking whether it's possible, and maybe for some pointers.

    I know there are many files I have to modify for this to work, but I'll start with this one:

    template <bool align> void BgrToBgra(const uint8_t * bgr, size_t width, size_t height, size_t bgrStride, uint8_t * bgra, size_t bgraStride, uint8_t *alpha, size_t alphaStride)
            {
                bgr += (height - 1) * bgrStride;
                bgra += (height - 1) * bgraStride;
                alpha += (height - 1) * alphaStride;
    
                assert(width >= A);
                if(align)
                    assert(Aligned(bgra) && Aligned(bgraStride) && Aligned(bgr) && Aligned(bgrStride) && Aligned(alpha) && Aligned(alphaStride));
    
                size_t alignedWidth = AlignLo(width, A);
    
                //__m128i _alpha = _mm_slli_si128(_mm_set1_epi32(alpha), 3);
                __m128i _shuffle = _mm_setr_epi8(0x0, 0x1, 0x2, -1, 0x3, 0x4, 0x5, -1, 0x6, 0x7, 0x8, -1, 0x9, 0xA, 0xB, -1);
    
                for(size_t row = 0; row < height; ++row)
                {
                    for(size_t col = 0; col < alignedWidth; col += A)
                        BgrToBgra<align>(bgr + 3*col, bgra + 4*col, alpha + col, _shuffle);
                    if(width != alignedWidth)
                        BgrToBgra<false>(bgr + 3*(width - A), bgra + 4*(width - A), alpha + width - A, _shuffle);
                    bgr -= bgrStride;
                    bgra -= bgraStride;
                    alpha -= alphaStride;
                }
            }
    

    Is it good?

    Regards,
    David

     
  • Anonymous

    Anonymous - 2016-04-02

    I modified the base version and the SSSE3 version.
    But the SSSE3 version is only slightly faster.

            template <bool align> SIMD_INLINE void BgrToBgra(const uint8_t * bgr, uint8_t * bgra, uint8_t * alpha, __m128i shuffle)
            {
                Store<align>((__m128i*)bgra + 0, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[3], alpha[2], alpha[1], alpha[0]), 3), _mm_shuffle_epi8(Load<align>((__m128i*)(bgr +  0)), shuffle)));
                Store<align>((__m128i*)bgra + 1, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[7], alpha[6], alpha[5], alpha[4]), 3), _mm_shuffle_epi8(Load<false>((__m128i*)(bgr + 12)), shuffle)));
                Store<align>((__m128i*)bgra + 2, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[11], alpha[10], alpha[9], alpha[8]), 3), _mm_shuffle_epi8(Load<false>((__m128i*)(bgr + 24)), shuffle)));
                Store<align>((__m128i*)bgra + 3, _mm_or_si128(_mm_slli_si128(_mm_set_epi32(alpha[15], alpha[14], alpha[13], alpha[12]), 3), _mm_shuffle_epi8(_mm_srli_si128(Load<align>((__m128i*)(bgr + 32)), 4), shuffle)));
            }
    
            template <bool align> void BgrToBgra(const uint8_t * bgr, size_t width, size_t height, size_t bgrStride, uint8_t * bgra, size_t bgraStride, uint8_t *alpha, size_t alphaStride)
            {
                bgr += (height - 1) * bgrStride;
                bgra += (height - 1) * bgraStride;
                alpha += (height - 1) * alphaStride;
    
                assert(width >= A);
                if(align)
                    assert(Aligned(bgra) && Aligned(bgraStride) && Aligned(bgr) && Aligned(bgrStride) && Aligned(alpha) && Aligned(alphaStride));
    
                size_t alignedWidth = AlignLo(width, A);
    
                __m128i _shuffle = _mm_setr_epi8(0x0, 0x1, 0x2, -1, 0x3, 0x4, 0x5, -1, 0x6, 0x7, 0x8, -1, 0x9, 0xA, 0xB, -1);
    
                for(size_t row = 0; row < height; ++row)
                {
                    for(size_t col = 0; col < alignedWidth; col += A)
                        BgrToBgra<align>(bgr + 3*col, bgra + 4*col, alpha + col, _shuffle);
                    if(width != alignedWidth)
                        BgrToBgra<false>(bgr + 3*(width - A), bgra + 4*(width - A), alpha + width - A, _shuffle);
                    bgr -= bgrStride;
                    bgra -= bgraStride;
                    alpha -= alphaStride;
                }
            }
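The base (scalar) version mentioned in this post is not shown in the thread. A minimal sketch of what such a reference routine could look like is given here for comparison with the SSSE3 code above. The function name and parameter order are hypothetical and not part of Simd Library; strides are assumed to be in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: interleaves a BGR-24 image with a GRAY-8
// alpha plane into a BGRA-32 image. Not the poster's actual code.
void BgrAlphaToBgraBase(const uint8_t* bgr, size_t bgrStride,
                        const uint8_t* alpha, size_t alphaStride,
                        size_t width, size_t height,
                        uint8_t* bgra, size_t bgraStride)
{
    assert(bgr && alpha && bgra);
    for (size_t row = 0; row < height; ++row)
    {
        const uint8_t* b = bgr;
        const uint8_t* a = alpha;
        uint8_t* d = bgra;
        for (size_t col = 0; col < width; ++col, b += 3, a += 1, d += 4)
        {
            d[0] = b[0]; // blue
            d[1] = b[1]; // green
            d[2] = b[2]; // red
            d[3] = a[0]; // alpha taken from the GRAY-8 plane
        }
        bgr += bgrStride;
        alpha += alphaStride;
        bgra += bgraStride;
    }
}
```

A routine like this is mainly useful as a correctness check for a vectorized version: run both on the same input and compare the output buffers byte by byte.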
    
     
  • Anonymous

    Anonymous - 2016-04-03

    Also, two days ago I made a variation of the AlphaBlending function. It uses the same bitmap as both src and dst, plus a single color to blend with.
    It's useful when you want to draw a transparent image onto a single-color background. With the usual functions you have to create a bitmap, fill it with the color (using FillBgr) and then blend with AlphaBlending.
    But this way is very slow.
    The new function, AlphaBlendingColor, is much faster.

    But since you show(ed) no interest in my ideas to improve your library, I see no point in posting it here.

    I'm outta here...

    Good riddance to me
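AlphaBlendingColor itself was never posted in the thread, so its real signature and behaviour are unknown. The idea described above (compositing a BGRA image in place over a single background color, without allocating and filling a second bitmap) might look roughly like the scalar sketch below. The function name, the parameter list, the rounding, and the choice to make the result fully opaque are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: blends each BGRA pixel over a constant background
// color and writes the result back into the same bitmap.
// Per channel: result = (src * a + color * (255 - a)) / 255, rounded.
void AlphaBlendOverColorSketch(uint8_t* bgra, size_t stride,
                               size_t width, size_t height,
                               uint8_t blue, uint8_t green, uint8_t red)
{
    assert(bgra);
    const unsigned color[3] = { blue, green, red };
    for (size_t row = 0; row < height; ++row)
    {
        uint8_t* p = bgra;
        for (size_t col = 0; col < width; ++col, p += 4)
        {
            const unsigned a = p[3];
            for (int c = 0; c < 3; ++c)
                p[c] = uint8_t((p[c] * a + color[c] * (255 - a) + 127) / 255);
            p[3] = 255; // assumption: the blended result is fully opaque
        }
        bgra += stride;
    }
}
```

Compared with the FillBgr plus AlphaBlending route, a routine of this shape reads and writes only one bitmap, which is the likely source of the speedup the post reports.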

     
  • Yermalayeu Ihar

    Yermalayeu Ihar - 2016-04-04

    Hi, David.
    I'm sorry, but I wasn't able to answer over the weekend.

    I can see that you have decided to extend the functionality of Simd Library.
    That's good. Can I add you to the list of developers of the project?
    If you agree, I will send you the rules that are used in the development of the library.

    As for your implementation of BgrToBgra, I have some notes:
    1) These two implementations are equivalent:
    The first:

    sum = 0; 
    for(size_t row = 0; row < height; ++row) 
    { 
        for(size_t col = 0; col < width; ++col) 
             sum += src[col]; 
        src += stride; 
    }
    

    The second:

    sum = 0;
    src += (height - 1) * stride;
    for(size_t row = 0; row < height; ++row)
    {
        for(size_t col = 0; col < width; ++col)
             sum += src[col];
        src -= stride;
    }
    

    But the first one is better because it accesses memory sequentially, so hardware (or software) prefetching works well. In the second case the memory access jumps backwards at every row, which leads to cache misses.

    2) Using _mm_set_epi32(alpha[3], alpha[2], alpha[1], alpha[0]) is not a good idea: this intrinsic does not map to a single hardware instruction (the compiler expands it into several scalar loads and inserts), so it performs very poorly.
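Both notes can be illustrated together. The sketch below is one possible way to act on them, not the library's implementation: rows are walked top to bottom, and the 16 alpha bytes are fetched with a single vector load and moved into place with _mm_shuffle_epi8 instead of _mm_set_epi32. The function names and the SKETCH_SSSE3 macro are hypothetical; the code assumes unaligned buffers, width >= 16, and (for the target attribute) GCC or Clang on x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSSE3 __attribute__((target("ssse3")))
#else
#define SKETCH_SSSE3
#endif

// Hypothetical helper: converts 16 pixels (48 BGR bytes + 16 alpha bytes).
SKETCH_SSSE3 static inline void BgrAlphaToBgra16(const uint8_t* bgr, const uint8_t* alpha, uint8_t* bgra)
{
    // Spread 12 BGR bytes over four 4-byte pixels; byte 3 of each pixel is zeroed (-1).
    const __m128i bgrShuffle = _mm_setr_epi8(0, 1, 2, -1, 3, 4, 5, -1, 6, 7, 8, -1, 9, 10, 11, -1);
    // Move alpha byte 4*k + j into byte 3 of pixel j; all other bytes are zeroed.
    const __m128i a0 = _mm_setr_epi8(-1, -1, -1, 0, -1, -1, -1, 1, -1, -1, -1, 2, -1, -1, -1, 3);
    const __m128i a1 = _mm_setr_epi8(-1, -1, -1, 4, -1, -1, -1, 5, -1, -1, -1, 6, -1, -1, -1, 7);
    const __m128i a2 = _mm_setr_epi8(-1, -1, -1, 8, -1, -1, -1, 9, -1, -1, -1, 10, -1, -1, -1, 11);
    const __m128i a3 = _mm_setr_epi8(-1, -1, -1, 12, -1, -1, -1, 13, -1, -1, -1, 14, -1, -1, -1, 15);

    const __m128i a = _mm_loadu_si128((const __m128i*)alpha); // 16 alpha values in one load
    const __m128i p0 = _mm_loadu_si128((const __m128i*)(bgr + 0));
    const __m128i p1 = _mm_loadu_si128((const __m128i*)(bgr + 12));
    const __m128i p2 = _mm_loadu_si128((const __m128i*)(bgr + 24));
    // Load at +32 and shift right by 4 bytes so the read stays inside the 48 input bytes.
    const __m128i p3 = _mm_srli_si128(_mm_loadu_si128((const __m128i*)(bgr + 32)), 4);

    _mm_storeu_si128((__m128i*)bgra + 0, _mm_or_si128(_mm_shuffle_epi8(p0, bgrShuffle), _mm_shuffle_epi8(a, a0)));
    _mm_storeu_si128((__m128i*)bgra + 1, _mm_or_si128(_mm_shuffle_epi8(p1, bgrShuffle), _mm_shuffle_epi8(a, a1)));
    _mm_storeu_si128((__m128i*)bgra + 2, _mm_or_si128(_mm_shuffle_epi8(p2, bgrShuffle), _mm_shuffle_epi8(a, a2)));
    _mm_storeu_si128((__m128i*)bgra + 3, _mm_or_si128(_mm_shuffle_epi8(p3, bgrShuffle), _mm_shuffle_epi8(a, a3)));
}

// Hypothetical driver: rows are processed in increasing address order.
SKETCH_SSSE3 void BgrAlphaToBgraSsse3(const uint8_t* bgr, size_t bgrStride,
                                      const uint8_t* alpha, size_t alphaStride,
                                      size_t width, size_t height,
                                      uint8_t* bgra, size_t bgraStride)
{
    assert(width >= 16);
    const size_t alignedWidth = width & ~size_t(15);
    for (size_t row = 0; row < height; ++row)
    {
        for (size_t col = 0; col < alignedWidth; col += 16)
            BgrAlphaToBgra16(bgr + 3 * col, alpha + col, bgra + 4 * col);
        if (alignedWidth != width) // tail: redo the last 16 pixels of the row
            BgrAlphaToBgra16(bgr + 3 * (width - 16), alpha + (width - 16), bgra + 4 * (width - 16));
        bgr += bgrStride;
        alpha += alphaStride;
        bgra += bgraStride;
    }
}
```

Whether this is actually faster than the _mm_set_epi32 variant should be measured on the target hardware; the sketch only shows that the alpha plane can be merged with the same load, shuffle, and OR pattern already used for the BGR bytes.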

     

    Last edit: Yermalayeu Ihar 2016-04-04