Hi.
I'm the anonymous guy who annoyed you over the last few days :D
Well, let me try again :D
I noticed the BgrToBgra function.
In my application I often split BGR-32 into BGR-24 + GRAY-8, or combine BGR-24 + GRAY-8 into BGR-32 (the GRAY-8 plane is the alpha channel).
Speed is not so important for the split, but it is for the combine.
I wonder if that function could be modified to take GRAY-8 bitmap data instead of a single alpha value (and still be fast).
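For reference, the combine step described above can be sketched in plain scalar C++. This is only an illustration of the operation, not code from the Simd Library: the function name BgrGrayToBgraSketch and its parameter list are assumptions, and strides are taken to be in bytes.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not a Simd Library function): interleave a BGR-24
// image with a GRAY-8 plane used as per-pixel alpha, producing BGRA-32.
// All strides are in bytes and may exceed the packed row width.
void BgrGrayToBgraSketch(const uint8_t* bgr, size_t bgrStride,
                         const uint8_t* gray, size_t grayStride,
                         size_t width, size_t height,
                         uint8_t* bgra, size_t bgraStride)
{
    assert(bgr && gray && bgra);
    for (size_t row = 0; row < height; ++row)
    {
        for (size_t col = 0; col < width; ++col)
        {
            bgra[4 * col + 0] = bgr[3 * col + 0]; // B
            bgra[4 * col + 1] = bgr[3 * col + 1]; // G
            bgra[4 * col + 2] = bgr[3 * col + 2]; // R
            bgra[4 * col + 3] = gray[col];        // A taken from the gray plane
        }
        // Advance every pointer by its own stride (top-to-bottom order).
        bgr += bgrStride;
        gray += grayStride;
        bgra += bgraStride;
    }
}
```

A vectorized version would do the same interleave several pixels at a time; the scalar loop is mainly useful as a reference to test a faster version against.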
I'm not asking you to do the work, I will do it; I'm just asking whether it's possible, and maybe for some pointers.
I know there are many files I have to modify for this to work, but I'll start with this one:
Anonymous - 2016-04-03
Two days ago I also made a variation of the AlphaBlending function. It uses the same bitmap for src and dst, and a single color for blending.
It's useful when you want to draw a transparent image onto a single-color background. With the existing functions you have to create a bitmap, fill it with the color (using FillBgr), and blend it with AlphaBlending.
But that approach is very slow.
The new function, AlphaBlendingColor, is much faster.
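The AlphaBlendingColor code itself was never posted, so the following is only a rough scalar sketch of the idea as described: blend an image in place over a solid color, with no intermediate filled bitmap. The names BgrColor, BlendChannel and BlendOverColorSketch, the BGR-24 image plus GRAY-8 alpha layout, and the rounding are all assumptions.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical background color type (not part of the Simd Library).
struct BgrColor { uint8_t b, g, r; };

// Blend one channel: foreground weighted by a, background by (255 - a),
// with rounding to nearest.
inline uint8_t BlendChannel(uint8_t fg, uint8_t bg, uint8_t a)
{
    return uint8_t((fg * a + bg * (255 - a) + 127) / 255);
}

// Hypothetical sketch: blend a BGR-24 image over a solid color in place,
// using a GRAY-8 plane as per-pixel alpha. Strides are in bytes.
void BlendOverColorSketch(uint8_t* bgr, size_t bgrStride,
                          const uint8_t* alpha, size_t alphaStride,
                          size_t width, size_t height, BgrColor color)
{
    assert(bgr && alpha);
    for (size_t row = 0; row < height; ++row)
    {
        for (size_t col = 0; col < width; ++col)
        {
            uint8_t* p = bgr + 3 * col;
            const uint8_t a = alpha[col];
            p[0] = BlendChannel(p[0], color.b, a);
            p[1] = BlendChannel(p[1], color.g, a);
            p[2] = BlendChannel(p[2], color.r, a);
        }
        bgr += bgrStride;
        alpha += alphaStride;
    }
}
```

Because the background is a constant, the loop reads only the image and its alpha plane, which is where the saving over fill-then-blend would come from.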
But since you show(ed) no interest in my ideas for improving your library, I see no point in posting it here.
I'm outta here...
Good riddance to me
Hi, David.
I'm sorry, but I wasn't able to reply over the weekend.
I can see that you have decided to extend the functionality of the Simd Library.
That's good. Can I add you to the list of developers of the project?
If you agree, I will send you the rules that are used in the development of the library.
As for your implementation of BgrToBgra, I have some notes:
1) These two implementations are equivalent:
The first:
    sum = 0;
    for(size_t row = 0; row < height; ++row)
    {
        for(size_t col = 0; col < width; ++col)
            sum += src[col];
        src += stride;
    }
The second:
    sum = 0;
    src += (height - 1) * stride;
    for(size_t row = 0; row < height; ++row)
    {
        for(size_t col = 0; col < width; ++col)
            sum += src[col];
        src -= stride;
    }
But the first one is better because it accesses memory sequentially, so hardware (or software) prefetching works well. In the second case the memory access jumps backwards on every row, which leads to cache misses.
2) Using _mm_set_epi32(alpha[3], alpha[2], alpha[1], alpha[0]) is not a good idea, because this intrinsic does not map to a single hardware instruction and has very poor performance.
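One possible alternative to _mm_set_epi32 for this case, sketched under assumptions: load the four alpha bytes with a single 32-bit load and widen them with SSE2 unpack instructions. The helper name LoadAlpha4Sketch is hypothetical, and placing each alpha value in the top byte of its 32-bit lane (the A position of a BGRA pixel) is an assumption about what the surrounding code needs.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: read 4 consecutive alpha bytes and return them as
// four 32-bit lanes, each holding (alpha[i] << 24).
inline __m128i LoadAlpha4Sketch(const uint8_t* alpha)
{
    assert(alpha != nullptr);
    int32_t packed;
    memcpy(&packed, alpha, sizeof(packed));      // one unaligned 4-byte load
    __m128i v = _mm_cvtsi32_si128(packed);       // bytes: a0 a1 a2 a3 0 0 ...
    const __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(zero, v);              // 16-bit lanes: a[i] << 8
    v = _mm_unpacklo_epi16(zero, v);             // 32-bit lanes: a[i] << 24
    return v;
}
```

The result can be OR-ed with four BGR pixels that have already been expanded to 32 bits with a zero alpha byte. Whether this is actually faster than _mm_set_epi32 depends on the compiler and CPU, so it is worth measuring.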
Last edit: Yermalayeu Ihar 2016-04-04
Is it good?
Regards,
David
I modified the base version and the SSSE3 version, but the SSSE3 version is only slightly faster.