## vxl-users

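 [Vxl-users] vectorise image From: Friedmann Y. - 2012-05-14 13:24:19
```
Hello,

How can I convert the columns or rows of an image to vectors, so that instead of going through all the pixels in the image as in

for (unsigned int j=1; j<image.nj()-1; j++)
  for (unsigned int i=1; i<image.ni()-1; i++)
  { do something with pixel(i,j) }

I can work with the vectors (as in Matlab or Fortran)? Will working with such vectors give me as big an efficiency advantage as it does in Matlab, for example?
```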
 Re: [Vxl-users] vectorise image From: Ian Scott - 2012-05-14 13:48:31
```
On 14/05/2012 14:23, Friedmann Y. wrote:
> Hello,
>
> How can I convert the columns, or rows of an image to vectors so that
> instead of going through all
> the pixels in the image as in
>
> for (unsigned int j=1; j<image.nj()-1; j++)
>   for (unsigned int i=1; i<image.ni()-1; i++)
>   { do something with pixel(i,j) }
>
> I can work with the vectors? (as in matlab or fortran?)
> will working with such vectors give me such a big efficiency advantage
> as the ones in matlab for eg?

No. Iteration and dereference in C++ are pretty cheap. Matlab vector and matrix operations are actually implemented using C (or Fortran or C++) iteration over the underlying array. Fortran can give you a few advantages over C or C++ by assuming no aliasing, but there is no efficiency loss in Fortran from performing the loop yourself.

You can make marginal improvements to your loops by precalculating image.nj()-1 and image.ni()-1 outside the loops. If your calculation is independent of pixel position, you can also (in the case of compactly stored images) reduce it to a single loop:

for (VOXEL_TYPE* it=image.begin(), end=image.end(); it != end; ++it)
  do_something_to(*it);

If your loop is taking an unreasonable length of time, it is likely due to an inefficient implementation of the loop contents, or a failure to turn the compiler's optimiser on.

If you desperately want to treat image contents as numeric vectors, you can wrap a raster in a vnl_vector_ref. I can't see why you would want to, though. VXL's types follow C++ ideas on type safety. Bringing in Matlab's colloquialisms is not likely to help you in the medium term.

Ian.
```
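The two loop shapes Ian describes can be sketched without any VXL dependency. In the block below, `FlatImage`, `scale_interior` and `scale_all` are hypothetical names introduced only for illustration; `FlatImage` is a stand-in for a compactly stored single-plane image and is not a VXL type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compactly stored single-plane float image:
// pixel (i,j) lives at data[j*ni + i]. This is not a VXL class.
struct FlatImage {
  std::size_t ni, nj;
  std::vector<float> data;
  FlatImage(std::size_t w, std::size_t h, float v) : ni(w), nj(h), data(w * h, v) {}
  float& operator()(std::size_t i, std::size_t j) { return data[j * ni + i]; }
};

// Nested loops over the interior pixels, with the bounds computed once
// outside the loops rather than on every iteration.
inline void scale_interior(FlatImage& im, float s) {
  assert(im.ni >= 2 && im.nj >= 2);
  const std::size_t i_end = im.ni - 1, j_end = im.nj - 1;
  for (std::size_t j = 1; j < j_end; ++j)
    for (std::size_t i = 1; i < i_end; ++i)
      im(i, j) *= s;
}

// Single flat loop over every pixel. This is only valid because the
// operation does not depend on the pixel position.
inline void scale_all(FlatImage& im, float s) {
  for (float* it = im.data.data(), *end = it + im.data.size(); it != end; ++it)
    *it *= s;
}
```

The flat loop touches border pixels too, so it is a replacement for the nested form only when the border needs the same treatment as the interior.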
 Re: [Vxl-users] vectorise image From: Ian Scott - 2012-05-16 12:29:27
```
On 16/05/2012 12:29, Friedmann Y. wrote:
> Thanks Ian, Thanks very much for your tips! To make my question clearer,
> here is a bit of my first and second attempts:
> in the first attempt i am accessing the pixels u(i,j) and in the second
> attempt I am using pointers to the pixels.
> one iteration in the first attempt takes 0.9 secs, and in second attempt
> - 0.7 secs. Is this reasonable? I would have thought that the second way
> would be much much faster?

The ratio of those numbers is not out of line with what I would expect. I can't comment on their absolute values since I don't know the size of your image, the complexity of your per-pixel calculation, or even the speed of your machine.

As I said previously, indexing is not that expensive if you use a good optimising compiler. Still, you can usually extract slightly higher performance by doing some pointer arithmetic yourself, which is what you found, and what vil_math tries to do.

I would have expected the cost of your loop to be dominated by the floating point calculations, not the loop variable or the integer multiplication used during the indexing. These integer operations are cheap, and the compiler's optimiser might well be able to spot the similarity of all the pixel dereferences and avoid unnecessary indexing calculations.

If the 20% improvement in speed is an issue for you, then you can implement your code that way. But as Tony Hoare (allegedly) said, "Premature optimisation is the root of all evil."

For example, in my world - large (>1Gb) 3D volume images and effectively random-access lookup into the image - the bottleneck often appears to be copying the data between main memory and the CPU's L3 cache. Fiddling around with the loop structure often has little effect other than at best obscuring the meaning of the code. Often it introduces hard-to-find errors.

Ian.

PS Please keep these discussions on vxl-users. See the vxl-users-policy for reasons.

> many thanks again for taking the time with this!
>
> Yasmin:
>
> first attempt:
>
> vil_image_view<float> func1(vil_image_view<float> g_image,int itot) {
>
>   float BS,CS(0.),DS(0.),ES(0.);
>   float FS(0.),GS(0.),HS(0.),IS(0.);
>   float JS(0.),KS(0.),LS(0.),MS(0.);
>
>   vil_image_view<float> u0,cog;
>   vil_copy_deep(g_image,u0);
>   vil_copy_deep(g_image,cog);
>
>   for (int it=1;it<itot;it++)
>   {
>     clock_t tStart = clock();
>     for (unsigned int j=1;j<u0.nj()-1;j++) {
>       for (unsigned int i=1; i<u0.ni()-1;i++)
>       {
>         BS=(u0(i+1,j)+u0(i-1,j))/2.;
>         CS=(u0(i,j+1)+u0(i,j-1))/2.;
>         DS=(u0(i+1,j-1)+u0(i-1,j+1))/2.;
>         ES=(u0(i-1,j-1)+u0(i+1,j+1))/2.;
>         //etc
>       }
>     }
>     double timef = (double)(clock() - tStart)/CLOCKS_PER_SEC;
>     vcl_cout<< it << '\t'<< timef << " seconds" << vcl_endl;
>   }
>
>   return u0;
> }
>
>
> second attempt (using the math.h way of going over an image)
>
> void func2(vil_image_view<float>& imB,vil_image_view<float>& im_sum,int itot) {
>
>   vil_image_view<float> imA;
>   vil_copy_deep(imB,imA);
>   unsigned ni = imA.ni(); //width
>   unsigned nj = imA.nj(); //height
>   unsigned np = imA.nplanes();
>   im_sum.set_size(ni,nj,np);
>   vcl_ptrdiff_t istepA=imA.istep(),jstepA=imA.jstep(),pstepA = imA.planestep();
>   vcl_ptrdiff_t istepS=im_sum.istep(),jstepS=im_sum.jstep(),pstepS = im_sum.planestep();
>   vcl_ptrdiff_t istepB=imB.istep(),jstepB=imB.jstep(),pstepB = imB.planestep();
>   const float* planeA = imA.top_left_ptr();
>   const float* u11 = planeA+1+ni;
>   const float* planeB = imB.top_left_ptr();
>   const float* b11 = planeB+1+ni;
>   float* planeS = im_sum.top_left_ptr();
>   float* cog11 = planeS+1+ni;
>
>   float* BS=imA.top_left_ptr(),*CS=imA.top_left_ptr(),*DS=imA.top_left_ptr(),*ES=imA.top_left_ptr();
>   float* FS=imA.top_left_ptr(),*GS=imA.top_left_ptr(),*HS=imA.top_left_ptr(),*IS=imA.top_left_ptr();
>   float* JS=imA.top_left_ptr(),*KS=imA.top_left_ptr(),*LS=imA.top_left_ptr(),*MS=imA.top_left_ptr();
>   // vil_image_view<float> u0,cog;
>   const float two = 2.0;
>   //vil_copy_deep(imA,u0);
>   //vil_copy_deep(imA,cog);
>
>   for (int it=1;it<itot;it++) {
>     clock_t tStart = clock();
>     for (unsigned p=0;p [...] pstepS){
>       const float* rowA = u11;
>       const float* rowB = b11;
>       float* rowS = cog11;
>       for (unsigned j=0;j [...]
>         const float* pixelA = rowA;
>         const float* pixelB = rowB;
>         float* pixelS = rowS;
>         for (unsigned i=0;i [...]
>           float aa = float(*(pixelA+1));
>           float bb = float(*(pixelA-1));
>
>           *BS=(aa+bb)/two;
>           *CS=(float(*(pixelA+ni))+float(*(pixelA-ni)))/two;
>           *DS=(float(*(pixelA+1-ni))+float(*(pixelA-1+ni)))/2.0f;
>           //DS=(u0(i+1,j-1)+u0(i-1,j+1))/2.;
>           *ES=(float(*(pixelA-1-ni))+float(*(pixelA+1+ni)))/2.0f;
>           //ES=(u0(i-1,j-1)+u0(i+1,j+1))/2.;
>
>           //etc...
>         }
```
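For comparison, the neighbour averages from the two attempts can be written side by side on a plain buffer. This is a minimal sketch, not the poster's code: `Stencil`, `stencil_indexed` and `stencil_pointer` are hypothetical names, and the buffer is assumed to be a contiguous row-major array with `istep = 1` and `jstep = ni`. The results are kept in local floats; as quoted, the second attempt initialises `BS` through `MS` to `imA.top_left_ptr()`, so each `*BS = ...` store appears to land on the image's first pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages of opposite neighbour pairs around one pixel
// (field names follow the variables used in the thread).
struct Stencil { float BS, CS, DS, ES; };

// Indexed version: every access recomputes y*ni + x.
inline Stencil stencil_indexed(const std::vector<float>& u, std::size_t ni,
                               std::size_t i, std::size_t j) {
  assert(i >= 1 && j >= 1);  // the caller must stay off the border
  auto at = [&](std::size_t x, std::size_t y) { return u[y * ni + x]; };
  return { (at(i + 1, j) + at(i - 1, j)) / 2.0f,
           (at(i, j + 1) + at(i, j - 1)) / 2.0f,
           (at(i + 1, j - 1) + at(i - 1, j + 1)) / 2.0f,
           (at(i - 1, j - 1) + at(i + 1, j + 1)) / 2.0f };
}

// Pointer version: p points at pixel (i,j) and the neighbours are
// reached through fixed offsets built from the steps.
inline Stencil stencil_pointer(const float* p, std::ptrdiff_t istep,
                               std::ptrdiff_t jstep) {
  return { (p[istep] + p[-istep]) / 2.0f,
           (p[jstep] + p[-jstep]) / 2.0f,
           (p[istep - jstep] + p[-istep + jstep]) / 2.0f,
           (p[-istep - jstep] + p[istep + jstep]) / 2.0f };
}
```

Both functions do the same floating point work per pixel; only the address arithmetic differs, which is consistent with the modest gap between 0.9 s and 0.7 s reported above.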
 Re: [Vxl-users] vectorise image From: Friedmann Y. - 2012-05-16 12:52:43
```
So is it right to assume that when using vectors in MATLAB to do the same calculations, their 10 times higher efficiency is due to compiler optimization?

Yasmin
```
 Re: [Vxl-users] vectorise image From: Ian Scott - 2012-05-16 13:15:50
```
On 16/05/2012 13:49, Friedmann Y. wrote:
>
> So is it right to assume that when using vectors in MATLAB to do the
> same calculations, their 10 times higher efficiency is due to compiler
> optimization?
>
> Yasmin

It is a long time since I used Matlab proper, but at that time it was not a compiled language. Octave, the GPL Matlab clone, behaves that way now. Everything was looked up on demand: not just indexing, but even variable name dereferencing. Loop content was evaluated (and possibly even parsed) afresh every iteration. No compilation - therefore no opportunity for any optimisation.

Ian.
```
 Re: [Vxl-users] vectorise image From: Friedmann Y. - 2012-05-16 13:57:01
```
So how is it that the vectorised calcs are so much faster in Matlab?
```
 Re: [Vxl-users] vectorise image From: Ian Scott - 2012-05-16 14:26:41
```
On 16/05/2012 14:55, Friedmann Y. wrote:
> so how is it that the vectorised calcs are so much faster in matlab?

Taking the example of

a = some_large_matrix .^ 2

This is cheap to parse and dispatch. The only interpreter-level operations are:

1. Parse the line.
2. Look up variable "some_large_matrix" and store it in internal register V1.
3. Look up/create variable "a" once and store it in internal register V2.
4. Call internal function per_element_power(V1, 2, V2).

The internal function per_element_power will have been written in C (or C++, FORTRAN, or assembler) and will have been optimised by whichever compiler they used to compile MATLAB.

If you wrote the full loop version

for i=1:size(some_large_matrix,1)
  for j=1:size(some_large_matrix,2)
    a(i,j) = some_large_matrix(i,j) ^ 2;
  end
end

then the interpreter would repeatedly be doing:

1-9. Loop stuff.
10. Parse loop internals.
11. Look up variable some_large_matrix and store in v1.
12. Look up variable i and store in v2.
13. Look up variable j and store in v3.
14. Call dereference_matrix(v1, v2, v3, v4).
15-18. Same again for a, into v8.
19. Call internal function full_power(v4, 2, v8).
20-25. Some more loop stuff - jump back to step 4-ish several thousand times.

I'm afraid these questions are getting a little far from VXL. I'd suggest reading a book, or taking a course, on compilers and interpreters if you want to know more.

Ian.
```
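The difference in interpreter work can be made concrete with a toy model. The sketch below is purely illustrative and makes no claim about MATLAB's real internals: `Interp`, `run_vectorised` and `run_looped` are hypothetical names, the variable table is just a `std::map`, and `per_element_power` borrows its name from the message above. It counts how many name lookups each style costs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy interpreter state: a variable table plus a counter of name lookups.
// Hypothetical model for illustration only.
struct Interp {
  std::map<std::string, std::vector<double>> vars;
  std::size_t lookups = 0;
  std::vector<double>& lookup(const std::string& name) {
    ++lookups;
    return vars[name];  // creates the variable if it does not exist yet
  }
};

// The "compiled kernel": one tight loop that the C++ compiler can optimise.
inline void per_element_power(const std::vector<double>& in, double p,
                              std::vector<double>& out) {
  out.resize(in.size());
  for (std::size_t k = 0; k < in.size(); ++k) out[k] = std::pow(in[k], p);
}

// Models "a = m .^ 2": two name lookups, then a single kernel call.
inline void run_vectorised(Interp& t) {
  std::vector<double>& m = t.lookup("m");
  std::vector<double>& a = t.lookup("a");
  per_element_power(m, 2.0, a);
}

// Models the explicit loop: both names are resolved again on every iteration.
inline void run_looped(Interp& t) {
  const std::size_t n = t.lookup("m").size();
  t.lookup("a").resize(n);
  for (std::size_t k = 0; k < n; ++k)
    t.lookup("a")[k] = std::pow(t.lookup("m")[k], 2.0);
}
```

For a matrix of n elements the vectorised path performs 2 lookups and the looped path 2 + 2n, and a real interpreter would add parsing and index dereferencing on top of that for each iteration.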
 Re: [Vxl-users] vectorise image From: Wheeler, Frederick W (GE Global Research) - 2012-05-16 14:33:21
```
Are you wondering why vectorized calculations in Matlab are faster than non-vectorized calculations in Matlab? Or are you wondering why vectorized calculations in Matlab are faster than the same calculations in VXL?
```
 Re: [Vxl-users] vectorise image From: Rasmus Reinhold Paulsen - 2012-05-16 14:37:50
```
I cannot say if that is true - it would be a big surprise to me, in particular if you are a little careful in your C++ calling. However, GPU-optimized code etc. can do a lot today. Perhaps certain underlying routines in Matlab are optimized for multi-cores/GPUs etc.

What I do know is that you have to be careful how you measure your time. In particular, clock() is not good to use in small loops (granularity of 10 ms, if I remember correctly). Either call your routine 1000 times and measure the time outside, or use a better timer. I think there is one called something like QueryPerformanceCounter().

However, I am not following the recent trends... so my information might be a little stale...

Cheers,
Rasmus
```
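Rasmus's suggestion to time many repetitions from outside the loop can be sketched as follows. This assumes a C++11 compiler and uses `std::chrono::steady_clock` as the "better timer"; it is a substitute for the Windows-specific counter he mentions, not the same API. The helper name `mean_seconds` is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Runs f() `reps` times and returns the mean wall-clock time per call in
// seconds. Timing the whole batch keeps the timer's granularity from
// dominating the measurement of a short routine.
template <class F>
double mean_seconds(F f, int reps) {
  assert(reps > 0);
  const auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) f();
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

A call such as `mean_seconds([&]{ one_iteration(img); }, 1000)` would report the average cost of one pass, where `one_iteration` and `img` are placeholders for the routine and image under test. With optimisation on, the routine needs an observable side effect, or the compiler may remove the work being timed.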
 Re: [Vxl-users] vectorise image From: Friedmann Y. - 2012-05-16 14:53:11
```
I am wondering why vectorized calcs in Matlab are faster than the same calculations in VXL, even using the pointer-wise code?
```
 Re: [Vxl-users] vectorise image From: Ian Scott - 2012-05-16 15:12:42
```
On 16/05/2012 15:50, Friedmann Y. wrote:
> I am wondering why vectorized calcs in Matlab are faster than the same
> calculations
> in VXL, even using the pointer-wise code?

Ahh. I understand your question now.

Probably due to non-aliasing assumptions and use of SSE SIMD extensions on x86. I also believe recent versions of Matlab can use the GPU for some work.

An intern and I tried to get vnl to use SSE2 intrinsics, but we gave up before it worked reliably across the variety of compilers and platforms in use on the dashboard. I've also had a go at adding non-alias directives - but without any detectable improvement.

If all your work can easily be coded in Matlab vectorised format, then use Matlab. If you need to use C++/VXL for other reasons, and you want to have a go at finishing the vnl/SSE stuff - I'll happily point you in the right direction.

Ian.
```
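A non-aliasing promise of the kind Ian mentions can be expressed in C++ with the `__restrict` qualifier. This is a compiler extension accepted by GCC, Clang and MSVC, not standard C++, and the function name `average_rows` is hypothetical. Whether the loop is then compiled to SIMD instructions depends on the compiler, its flags and the target, so this is a sketch of the idea rather than a guaranteed speed-up; Ian reports no detectable gain from such directives in his own trials.

```cpp
#include <cassert>
#include <cstddef>

// __restrict (a compiler extension) promises that out, a and b never
// overlap, so the compiler need not reload a[k] or b[k] after each store
// to out[k] and is free to process several elements per instruction.
inline void average_rows(float* __restrict out, const float* __restrict a,
                         const float* __restrict b, std::size_t n) {
  assert(out != a && out != b);  // cheap sanity check of the promise
  for (std::size_t k = 0; k < n; ++k)
    out[k] = (a[k] + b[k]) / 2.0f;
}
```

Passing overlapping buffers to such a function is undefined behaviour, which is one reason a general-purpose library cannot simply apply the qualifier everywhere.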
 Re: [Vxl-users] vectorise image From: Friedmann Y. - 2012-05-16 15:05:59
```
Thanks Rasmus, I will give it a try...
```
 Re: [Vxl-users] vectorise image From: Friedmann Y. - 2012-05-17 08:52:08
```
In fact, my project is to transfer code from Matlab to C++ and incorporate it into a Windows application. I definitely haven't got the kind of knowledge to finish your work on the vnl/SSE stuff at the moment... Speed is not the main issue really, I was just a bit frustrated that I wasn't "beating" the Matlab code.

cheers!
```