From: Friedmann Y. <Y.Friedmann@sw...> - 2012-05-14 13:24:19

Hello,

How can I convert the columns or rows of an image to vectors, so that instead of going through all the pixels in the image, as in

    for (unsigned int j=1; j<g_image.nj()-1; j++)
      for (unsigned int i=1; i<g_image.ni()-1; i++)
        { do something with pixel(i,j) }

I can work with the vectors (as in Matlab or Fortran)? Will working with such vectors give me as big an efficiency advantage as it does in Matlab, for example?

Many thanks,
Yasmin
From: Ian Scott <scottim@im...> - 2012-05-14 13:48:31

On 14/05/2012 14:23, Friedmann Y. wrote:
> Hello,
>
> How can I convert the columns, or rows of an image to vectors so that
> instead of going through all
> the pixels in the image as in
>
> for (unsigned int j=1;j<g_image.nj()-1; j++)
>   for (unsigned int i=1; i<g_image.ni()-1; i++)
>     { do something with pixel(i,j)}
>
> I can work with the vectors? (as in matlab or fortran?)
> will working with such vectors give me such a big efficiency advantage
> as the ones in matlab for eg?

No. Iteration and dereference in C++ are pretty cheap. Matlab vector and matrix operations are actually implemented using C (or Fortran or C++) iteration over the underlying array. Fortran can give you a few advantages over C or C++ by assuming no aliasing. But there is no efficiency loss in Fortran by performing the loop yourself.

You can make marginal improvements to your loops by precalculating g_image.nj()-1 and g_image.ni()-1 outside the loops. If your calculation is independent of pixel position, you can also (in the case of compactly stored images) reduce it to a single loop:

    for (VOXEL_TYPE* it=g_image.begin(), end=g_image.end(); it != end; ++it)
      do_something_to(*it);

If your loop is taking an unreasonable length of time, it is likely due to an inefficient implementation of the loop contents, or a failure to turn the compiler's optimiser on.

If you desperately want to treat image contents as numeric vectors, you can wrap a raster in a vnl_vector_ref. I can't see why you would want to though. VXL's types follow C++ ideas on type safety. Bringing in Matlab's colloquialisms is not likely to help you in the medium term.

Ian.
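A minimal sketch of the two suggestions above (hoisting the loop bounds, and collapsing a position-independent operation into a single pass), assuming a compactly stored single-plane float image. To keep it self-contained, a plain std::vector stands in for the image buffer; the Image struct and both function names are hypothetical and are not VXL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compactly stored, single-plane image.
struct Image {
  unsigned ni, nj;          // width, height
  std::vector<float> data;  // row-major storage, ni*nj elements
  Image(unsigned w, unsigned h)
    : ni(w), nj(h), data(std::size_t(w) * h, 0.0f) {}
  float& operator()(unsigned i, unsigned j) {
    assert(i < ni && j < nj);
    return data[std::size_t(j) * ni + i];
  }
};

// Interior loop with the bounds computed once, outside the loops.
inline void add_one_interior(Image& im) {
  if (im.ni < 3 || im.nj < 3) return;  // no interior pixels
  const unsigned i_end = im.ni - 1;
  const unsigned j_end = im.nj - 1;
  for (unsigned j = 1; j < j_end; ++j)
    for (unsigned i = 1; i < i_end; ++i)
      im(i, j) += 1.0f;
}

// Position-independent operation: one pass over the whole buffer.
inline void scale_all(Image& im, float s) {
  for (float* it = im.data.data(), *end = it + im.data.size(); it != end; ++it)
    *it *= s;
}
```

With optimisation turned on, a compiler will often hoist the bounds by itself, so any gain from the first form is likely to be small, as the reply says.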
From: Ian Scott <scottim@im...> - 2012-05-16 12:29:27

On 16/05/2012 12:29, Friedmann Y. wrote:
> Thanks Ian, Thanks very much for your tips! To make my question clearer,
> here is a bit of my first and second attempts:
> in the first attempt i am accessing the pixels u(i,j) and in the second
> attempt I am using pointers to the pixels.
> one iteration in the first attempt takes 0.9 secs, and in second attempt
> - 0.7 secs. Is this reasonable? I would have thought that the second way
> would be much much faster?

The ratio of those numbers is not out of line with what I would expect. I can't comment on their absolute value since I don't know the size of your image, the complexity of your per-pixel calculation, or even the speed of your machine.

As I said previously, indexing is not that expensive if you use a good optimising compiler. Still, you can usually extract slightly higher performance by doing some pointer arithmetic yourself, which is what you found, and what vil_math tries to do.

I would have expected that the costs of your loop are dominated by the floating point calculations, not the loop variant, or the integer multiplication used during the indexing. These integer operations are cheap, and the compiler's optimiser might well be able to spot the similarity of all the pixel dereferences and avoid unnecessary indexing calculations.

If the 20% improvement in speed is an issue for you, then you can implement your code that way. But as Tony Hoare (allegedly) said, "Premature optimisation is the root of all evil." For example, in my world - large (>1Gb) 3D volume images and effectively random-access lookup to the image - the bottleneck often appears to be copying the data from main memory to and from the CPU's L3 cache. Fiddling around with the loop structure often has little effect other than at best obscuring the meaning of the code. Often it introduces hard-to-find errors.

Ian.

PS Please keep these discussions on vxl-users. See the vxl-users-policy for reasons.

> many thanks again for taking the time with this!
>
> Yasmin:
>
> first attempt:
>
> vil_image_view<float> func1(vil_image_view<float> g_image,int itot) {
>
>   float BS,CS(0.),DS(0.),ES(0.);
>   float FS(0.),GS(0.),HS(0.),IS(0.);
>   float JS(0.),KS(0.),LS(0.),MS(0.);
>
>   vil_image_view<float> u0,cog;
>   vil_copy_deep(g_image,u0);
>   vil_copy_deep(g_image,cog);
>
>   for (int it=1;it<itot+1;it++)
>   {
>     clock_t tStart = clock();
>     for (unsigned int j=1;j<g_image.nj()-1; j++)
>       for (unsigned int i=1; i<g_image.ni()-1; i++)
>       {
>         BS=(u0(i+1,j)+u0(i-1,j))/2.;
>         CS=(u0(i,j+1)+u0(i,j-1))/2.;
>         DS=(u0(i+1,j-1)+u0(i-1,j+1))/2.;
>         ES=(u0(i-1,j-1)+u0(i+1,j+1))/2.;
>         //etc
>       }
>     }
>     double timef = (double)(clock() - tStart)/CLOCKS_PER_SEC;
>     vcl_cout<< it << '\t'<< timef << " seconds" <<vcl_endl;
>   }
>
>   return u0;
> }
>
>
> second attempt (using the math.h way of going over an image)
>
> void func2(vil_image_view<float>& imB,vil_image_view<float>& im_sum,int
> itot) {
>
>   vil_image_view<float> imA;
>   vil_copy_deep(imB,imA);
>   unsigned ni = imA.ni();//width
>   unsigned nj = imA.nj(); //height
>   unsigned np = imA.nplanes();
>   im_sum.set_size(ni,nj,np);
>   vcl_ptrdiff_t istepA=imA.istep(),jstepA=imA.jstep(),pstepA =
>     imA.planestep();
>   vcl_ptrdiff_t istepS=im_sum.istep(),jstepS=im_sum.jstep(),pstepS =
>     im_sum.planestep();
>   vcl_ptrdiff_t istepB=imB.istep(),jstepB=imB.jstep(),pstepB =
>     imB.planestep();
>   const float* planeA = imA.top_left_ptr();
>   const float* u11 = planeA+1+ni;
>   const float* planeB = imB.top_left_ptr();
>   const float* b11 = planeB+1+ni;
>   float* planeS = im_sum.top_left_ptr();
>   float* cog11 = planeS+1+ni;
>
>   float*
>   BS=imA.top_left_ptr(),*CS=imA.top_left_ptr(),*DS=imA.top_left_ptr(),*ES=imA.top_left_ptr();
>   float*
>   FS=imA.top_left_ptr(),*GS=imA.top_left_ptr(),*HS=imA.top_left_ptr(),*IS=imA.top_left_ptr();
>   float*
>   JS=imA.top_left_ptr(),*KS=imA.top_left_ptr(),*LS=imA.top_left_ptr(),*MS=imA.top_left_ptr();
>   // vil_image_view<float> u0,cog;
>   const float two = 2.0;
>   //vil_copy_deep(imA,u0);
>   //vil_copy_deep(imA,cog);
>
>   for (int it=1;it<itot+1;it++)
>   {
>     clock_t tStart = clock();
>     for (unsigned p=0;p<np;++p,planeA += pstepA,planeB += pstepB,planeS +=
>          pstepS){
>       const float* rowA = u11;
>       const float* rowB = b11;
>       float* rowS = cog11;
>       for (unsigned j=0;j<nj-2;++j,rowA += jstepA,rowB += jstepB,rowS += jstepS){
>         const float* pixelA = rowA;
>         const float* pixelB = rowB;
>         float* pixelS = rowS;
>         for (unsigned i=0;i<ni-2;++i,pixelA+=istepA,pixelB+=istepB,pixelS+=istepS){
>           float aa = float(*(pixelA+1));
>           float bb = float(*(pixelA-1));
>
>           *BS=(aa+bb)/two;
>           *CS=(float(*(pixelA+ni))+float(*(pixelA-ni)))/two;
>           *DS=(float(*(pixelA+1-ni))+float(*(pixelA-1+ni)))/2.0f;
>           //DS=(u0(i+1,j-1)+u0(i-1,j+1))/2.;
>           *ES=(float(*(pixelA-1-ni))+float(*(pixelA+1+ni)))/2.0f;
>           //ES=(u0(i-1,j-1)+u0(i+1,j+1))/2.;
>
>           //etc...
>         }
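For comparison with the quoted second attempt, here is a small self-contained sketch of pointer-stepped traversal for one of the stencil terms, the horizontal neighbour average (u(i+1,j) + u(i-1,j)) / 2. It is an illustration only, not the poster's code and not vil_math: the function name is hypothetical, raw buffers stand in for image views, and source and destination are assumed to share the same strides. Neighbours are reached through istep and jstep rather than through ni, since the row stride equals the width only when rows are stored contiguously.

```cpp
#include <cstddef>
#include <vector>

// Horizontal neighbour average over the interior of a single-plane image,
// written with explicit pointer stepping. istep and jstep are strides in
// elements; src and dst are assumed to use the same strides.
inline void horiz_avg(const float* src, float* dst,
                      unsigned ni, unsigned nj,
                      std::ptrdiff_t istep, std::ptrdiff_t jstep) {
  if (ni < 3 || nj < 3) return;              // no interior pixels
  const float* row_s = src + istep + jstep;  // pixel (1,1)
  float* row_d = dst + istep + jstep;
  for (unsigned j = 1; j + 1 < nj; ++j, row_s += jstep, row_d += jstep) {
    const float* p = row_s;
    float* q = row_d;
    for (unsigned i = 1; i + 1 < ni; ++i, p += istep, q += istep)
      *q = (p[istep] + p[-istep]) * 0.5f;    // right and left neighbours
  }
}
```

Writing to a separate destination buffer keeps each result independent of pixels already updated in the same pass.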
From: Friedmann Y. <Y.Friedmann@sw...> - 2012-05-16 12:52:43

So is it right to assume that when using vectors in MATLAB to do the same calculations, their 10 times higher efficiency is due to compiler optimization?

Yasmin
From: Ian Scott <scottim@im...> - 2012-05-16 13:15:50

On 16/05/2012 13:49, Friedmann Y. wrote:
> So is it right to assume that when using vectors in MATLAB to do the
> same calculations, their 10 times higher efficiency is due to compiler
> optimization?
>
> Yasmin

It is a long time since I used Matlab proper, but at that time it was not a compiled language. Octave, the GPL Matlab clone, behaves that way now. Everything was looked up on demand: not just indexing, but even variable name dereferencing. Loop content was evaluated (and possibly even parsed) afresh every iteration. No compilation - therefore no opportunity for any optimisation.

Ian.
From: Friedmann Y. <Y.Friedmann@sw...> - 2012-05-16 13:57:01

So how is it that the vectorised calcs are so much faster in Matlab?
From: Ian Scott <scottim@im...> - 2012-05-16 14:26:41

On 16/05/2012 14:55, Friedmann Y. wrote:
> so how is it that the vectorised calcs are so much faster in matlab?

Taking the example of

    a = some_large_matrix .^ 2

This is cheap to parse and dispatch. The only interpreter-level operations are

1. Parse line.
2. Look up variable "some_large_matrix" and store in internal register V1.
3. Look up/create variable "a" once and store in internal register V2.
4. Call internal function per_element_power(V1, 2, V2).

The internal function per_element_power will have been written in C (or C++, FORTRAN, or assembler) and will have been optimised by whichever compiler they used to compile MATLAB.

If you wrote the full loop version

    for i=1:size(some_large_matrix,1)
      for j=1:size(some_large_matrix,2)
        a(i,j) = some_large_matrix(i,j) ^ 2;
      end
    end

then the interpreter would repeatedly be doing

1-9.   Loop stuff.
10.    Parse loop internals.
11.    Look up variable some_large_matrix and store in v1.
12.    Look up variable i and store in v2.
13.    Look up variable j and store in v3.
14.    Call dereference_matrix(v1, v2, v3, v4).
15-18. Same again for a, into v8.
19.    Call internal function full_power(v4, 2, v8).
20-25. Some more loop stuff - jump back to line 4ish.

several thousand times.

I'm afraid these questions are getting a little far from VXL. I'd suggest reading a book, or taking a course on compilers and interpreters, if you want to know more.

Ian.
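The difference between the two step lists can be mimicked with a toy model in C++. This is only an illustration of the dispatch pattern described above, not how MATLAB or Octave is actually implemented: the Workspace type and both function names are hypothetical, and a std::map lookup stands in for the interpreter resolving a variable name. Each function returns how many name lookups it performed.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy "interpreter workspace": variable name -> array of values.
using Workspace = std::map<std::string, std::vector<double>>;

// Vectorised path: resolve both names once, then run one compiled loop.
inline std::size_t square_vectorised(Workspace& ws, const std::string& in,
                                     const std::string& out) {
  std::size_t lookups = 0;
  std::vector<double>& dst = ws[out];            ++lookups;  // look up/create
  const std::vector<double>& src = ws.at(in);    ++lookups;
  dst.resize(src.size());
  for (std::size_t k = 0; k < src.size(); ++k)
    dst[k] = src[k] * src[k];
  return lookups;
}

// Loop path: every element access resolves the variable names again.
inline std::size_t square_interpreted(Workspace& ws, const std::string& in,
                                      const std::string& out) {
  std::size_t lookups = 0;
  const std::size_t n = ws.at(in).size();        ++lookups;
  ws[out].resize(n);                             ++lookups;
  for (std::size_t k = 0; k < n; ++k) {
    const double x = ws.at(in)[k];               ++lookups;
    ws.at(out)[k] = x * x;                       ++lookups;
  }
  return lookups;
}
```

For n elements the first function performs 2 lookups and the second 2 + 2n, which is the per-iteration overhead the reply describes; a real interpreter would add parsing and dispatch costs on top.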
From: Wheeler, Frederick W (GE Global Research) <wheeler@ge...> - 2012-05-16 14:33:21

Are you wondering why vectorized calcs in Matlab are faster than non-vectorized calculations in Matlab?

Or are you wondering why vectorized calculations in Matlab are faster than the same calculations in VXL?
From: Rasmus Reinhold Paulsen <rrp@im...> - 2012-05-16 14:37:50

I cannot say if that is true - it would be a big surprise for me, in particular if you are a little careful in your C++ calling. However, GPU-optimized code etc. can do a lot today. Perhaps certain underlying routines in Matlab are optimized for multicores/GPUs etc.

What I do know is that you have to be careful how you measure your time. In particular, clock() is not good to use in small loops (granularity of 10 ms, if I remember correctly). Either call your routine 1000 times and measure the time outside, or use a better timer. I think there is one called something like QueryPerformanceCounter().

However, I am not following the recent trends... so my information might be a little stale...

Cheers,
Rasmus
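A minimal sketch of the "call it many times and measure outside" advice, assuming a C++11 compiler so that std::chrono::steady_clock is available as a portable alternative to a platform-specific timer. The helper name mean_seconds is hypothetical.

```cpp
#include <chrono>

// Run f() `repeats` times and return the mean wall-clock seconds per call.
// Timing the whole batch keeps the timer's granularity small relative to
// the interval being measured.
template <class F>
double mean_seconds(F f, unsigned repeats) {
  using clock = std::chrono::steady_clock;
  const clock::time_point start = clock::now();
  for (unsigned r = 0; r < repeats; ++r)
    f();
  const std::chrono::duration<double> elapsed = clock::now() - start;
  return repeats ? elapsed.count() / repeats : 0.0;
}
```

It could be used as `double t = mean_seconds([&] { func2(imB, im_sum, 1); }, 100);`, with func2 being the routine from earlier in the thread. Make sure the result of the timed work is actually used afterwards, otherwise an optimising compiler may remove the work being measured.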
From: Friedmann Y. <Y.Friedmann@sw...> - 2012-05-16 14:53:11

I am wondering why vectorized calcs in Matlab are faster than the same calculations in VXL, even using the pointer-wise code?
From: Ian Scott <scottim@im...> - 2012-05-16 15:12:42

On 16/05/2012 15:50, Friedmann Y. wrote:
> I am wondering why vectorized calcs in Matlab are faster than the same
> calculations in VXL, even using the pointer-wise code?

Ahh. I understand your question now.

Probably due to non-aliasing assumptions and use of SSE SIMD extensions on x86. I also believe recent versions of Matlab can use the GPU for some work.

An intern and I tried to get vnl to use SSE2 intrinsics, but we gave up before it worked reliably across the variety of compilers & platforms in use on the dashboard. I've also had a go at adding non-alias directives - but without any detectable improvement.

If all your work can easily be coded in Matlab vectorised format then use Matlab. If you need to use C++/VXL for other reasons, and you want to have a go at finishing the vnl/SSE stuff - I'll happily point you in the right direction.

Ian.
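As an illustration of what a non-alias directive looks like, here is a small sketch. It uses the __restrict qualifier, which is a compiler extension accepted by GCC, Clang and MSVC rather than standard C++; the function name is hypothetical and this is not the vnl code mentioned above. The qualifier is a promise by the caller that the three arrays do not overlap, which leaves the compiler free to vectorise the loop with SIMD instructions if it chooses to.

```cpp
#include <cstddef>

// Element-wise sum of two arrays. __restrict (a compiler extension) tells
// the compiler that a, b and out never overlap, so loads from a and b need
// not be re-issued after each store to out.
inline void add_arrays(const float* __restrict a,
                       const float* __restrict b,
                       float* __restrict out,
                       std::size_t n) {
  for (std::size_t k = 0; k < n; ++k)
    out[k] = a[k] + b[k];
}
```

Whether this changes the generated code depends on the compiler and its optimisation flags; as noted in the reply, adding such directives does not always give a detectable improvement.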
From: Friedmann Y. <Y.Friedmann@sw...> - 2012-05-16 15:05:59

Thanks Rasmus, I will give it a try...
From: Friedmann Y. <Y.Friedmann@sw...> - 2012-05-17 08:52:08

In fact, my project is to transfer code from Matlab to C++ and incorporate it into a Windows application. I definitely haven't got the kind of knowledge to finish your work on the vnl/SSE stuff at the moment... Speed is not really the main issue; I was just a bit frustrated that I wasn't "beating" the Matlab code.

Cheers!