From: Ian Scott <scottim@im...>  20120516 12:29:27

On 16/05/2012 12:29, Friedmann Y. wrote: > Thanks Ian, Thanks very much for your tips! To make my question clearer, > here is a bit of my first and second attempts: > in the first attempt i am accessing the pixels u(i,j) and in the second > attempt I am using pointers to the pixels. > one iteration in the first attempt takes 0.9 secs, and in second attempt >  0.7 secs. Is this reasonable? I would have thought that the second way > would be much much faster? The ratio of those numbers is not out of line with what I would expect. I can't comment on their absolute value since I don't know the size of your image, the complexity of your perpixel calculation, or even the speed of your machine. As I said previously, indexing is not that expensive if you use a good optimising compiler. Still, you can usually extract slightly higher performance by doing some pointer arithmetic yourself, which is what you found, and what vil_math tries to do. I would have expected that the costs of your loop are dominated by the floating point calculations, not the loop variant, or the integermultiplication used during the indexing. These integer operations are cheap, and the compiler's optimiser might well be able to spot the similarity of all the pixel dereferences and avoid unnecessary indexing calculations. If the 20% improvement in speed is an issue for you, then you can implement your code that way. But as Tony Hoare (allegedy) said "Premature optimisation is the root of all evil." For example, in my world  large (>1Gb) 3D volume images and effectively randomaccesslookup to the image  the bottleneck often appears to be copying the data from main memory to and from the CPU's L3 cache. Fiddling around with the loop structure often has little effect other than at best obscuring the meaning of the code. Often it introduces hardtofind errors. Ian. PS Please keep these discussions on vxlusers. See the vxluserspolicy for reasons. > > many thanks again for taking the time with this! > > Yasmin: > > first attempt: > > vil_image_view<float> func1(vil_image_view<float> g_image,int itot) { > > float BS,CS(0.),DS(0.),ES(0.); > float FS(0.),GS(0.),HS(0.),IS(0.); > float JS(0.),KS(0.),LS(0.),MS(0.); > > vil_image_view<float> u0,cog; > vil_copy_deep(g_image,u0); > vil_copy_deep(g_image,cog); > > > > for (int it=1;it<itot+1;it++) > { > clock_t tStart = clock(); > for (unsigned int j=1;j<g_image.nj()1; j++) > for (unsigned int i=1; i<g_image.ni()1; i++) > { > BS=(u0(i+1,j)+u0(i1,j))/2.; > CS=(u0(i,j+1)+u0(i,j1))/2.; > DS=(u0(i+1,j1)+u0(i1,j+1))/2.; > ES=(u0(i1,j1)+u0(i+1,j+1))/2.; > //etc > } > } > double timef = (double)(clock()  tStart)/CLOCKS_PER_SEC; > vcl_cout<< it << '\t'<< timef << " seconds" <<vcl_endl; > } > > return u0; > } > > > second attempt (using the math.h way of going over an image) > > void func2(vil_image_view<float>& imB,vil_image_view<float>& im_sum,int > itot) { > > vil_image_view<float> imA; > vil_copy_deep(imB,imA); > unsigned ni = imA.ni();//width > unsigned nj = imA.nj(); //height > unsigned np = imA.nplanes(); > im_sum.set_size(ni,nj,np); > vcl_ptrdiff_t istepA=imA.istep(),jstepA=imA.jstep(),pstepA = > imA.planestep(); > vcl_ptrdiff_t istepS=im_sum.istep(),jstepS=im_sum.jstep(),pstepS = > im_sum.planestep(); > vcl_ptrdiff_t istepB=imB.istep(),jstepB=imB.jstep(),pstepB = > imB.planestep(); > const float* planeA = imA.top_left_ptr(); > const float* u11 = planeA+1+ni; > const float* planeB = imB.top_left_ptr(); > const float* b11 = planeB+1+ni; > float* planeS = im_sum.top_left_ptr(); > float* cog11 = planeS+1+ni; > > float* > BS=imA.top_left_ptr(),*CS=imA.top_left_ptr(),*DS=imA.top_left_ptr(),*ES=imA.top_left_ptr(); > float* > FS=imA.top_left_ptr(),*GS=imA.top_left_ptr(),*HS=imA.top_left_ptr(),*IS=imA.top_left_ptr(); > float* > JS=imA.top_left_ptr(),*KS=imA.top_left_ptr(),*LS=imA.top_left_ptr(),*MS=imA.top_left_ptr(); > // vil_image_view<float> u0,cog; > const float two = 2.0; > //vil_copy_deep(imA,u0); > //vil_copy_deep(imA,cog); > > for (int it=1;it<itot+1;it++) > { > clock_t tStart = clock(); > for (unsigned p=0;p<np;++p,planeA += pstepA,planeB += pstepB,planeS += > pstepS){ > const float* rowA = u11; > const float* rowB = b11; > float* rowS = cog11; > for (unsigned j=0;j<nj2;++j,rowA += jstepA,rowB += jstepB,rowS += jstepS){ > const float* pixelA = rowA; > const float* pixelB = rowB; > float* pixelS = rowS; > for (unsigned i=0;i<ni2;++i,pixelA+=istepA,pixelB+=istepB,pixelS+=istepS){ > float aa = float(*(pixelA+1)); > float bb = float(*(pixelA1)); > > *BS=(aa+bb)/two; > *CS=(float(*(pixelA+ni))+float(*(pixelAni)))/two; > *DS=(float(*(pixelA+1ni))+float(*(pixelA1+ni)))/2.0f; > //DS=(u0(i+1,j1)+u0(i1,j+1))/2.; > *ES=(float(*(pixelA1ni))+float(*(pixelA+1+ni)))/2.0f; > //ES=(u0(i1,j1)+u0(i+1,j+1))/2.; > > //etc... > } 