From: Kay F. J. <_k...@ya...> - 2011-11-26 12:11:48
Hi group!

I'm doing some work on hugin's Python interface, in the course of which I have experimentally introduced some code into libpano which you might find interesting. I've described what I've done in http://groups.google.com/group/hugin-ptx/browse_thread/thread/19b059cf02bb3b6b#

I think there might be more code in libpano which would benefit from vectorization. The coding involved isn't a big deal, but the performance gains can be substantial.

My proposed new routine would live in math.c:

    // KFJ 2011-11-23
    // This function transforms many coordinates at once.
    // The coordinates to be transformed are passed via the pointers
    // x_dest and y_dest, and the result of the transformation is
    // stored in the memory pointed to by x_src and y_src.
    // All vectors must have space for count doubles.
    // Memory pointed to by x_dest and y_dest is only
    // read, never written to.
    // params points to a stack of fDesc structures, which
    // defines the order, type and additional parameters
    // of the partial transformations.

    int v_execute_stack_new ( double* x_dest,   // x coordinates destination
                              double* y_dest,   // y coordinates destination
                              double* x_src,    // x coordinates source
                              double* y_src,    // y coordinates source
                              int count,        // number of coordinates
                              void* params )    // pointer to vector of fDesc
    {
      register struct fDesc* stack = (struct fDesc *) params;
      register double * xd = x_dest ;
      register double * yd = y_dest ;
      register double * xs = x_src ;
      register double * ys = y_src ;
      register int loop_index = count ;

      if ( count <= 0 )
        return 1 ;

      // running the first function on the stack, the destination
      // coordinates are taken from the x_dest and y_dest vectors
      // and the result is written to x_src and y_src

      if ( stack->func != NULL )
      {
        while ( loop_index-- )
          if ( ! (stack->func) ( *xd++ , *yd++ , xs++ , ys++ , stack->param ) )
            return 0 ;
        stack++ ;
      }

      // from the second function on, the destination coordinates are
      // taken from x_src and y_src and written back to these vectors

      while ( (stack->func) != NULL )
      {
        loop_index = count ;
        xs = x_src ;
        ys = y_src ;
        while ( loop_index-- )
        {
          if ( ! (stack->func) ( *xs , *ys , xs , ys , stack->param ) )
            return 0 ;
          xs++ ;
          ys++ ;
        }
        stack++ ;
      }

      // if no call to a transformation function has caused a premature
      // return of zero, we arrive here and return 1 to indicate success.

      return 1 ;
    }

I've used the most efficient looping strategy I found, dispensing with the register doubles used in execute_stack (which, by the way, would benefit from adopting the same looping strategy, although the gain there is only about five percent).

This code is preliminary insofar as I intend to modify it to use strided data. The modification would merely mean introducing the strides as additional parameters and using them instead of the plain increments. I might stick with this version for simplicity's sake, though, and require strided data to be reshaped on the Python side. That wouldn't hurt performance too badly, since the actual transforms are the most expensive part.

If you would consider including such code in libpano, I'd happily provide a Mercurial patch to integrate it into the code base. What do you say?

Kay
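The strided variant mentioned above was never posted. What follows is only a minimal sketch of how it might look, assuming strides are counted in elements (not bytes) and assuming that `trfn` and `struct fDesc` match libpano's definitions (they are redeclared here so the sketch compiles on its own). The name `v_execute_stack_strided` and its parameter list are hypothetical, not part of libpano.

```c
#include <stddef.h>

/* Assumed to mirror libpano's transformation-function type and stack
   entry; redeclared here only so this sketch compiles standalone. */
typedef int (*trfn)(double x_dest, double y_dest,
                    double *x_src, double *y_src, void *params);

struct fDesc {
    trfn func;    /* transformation to call; NULL terminates the stack */
    void *param;  /* parameters passed to that transformation */
};

/* Hypothetical strided variant of v_execute_stack_new.
   Each stride is the distance, in doubles, between consecutive
   coordinates in the corresponding array. */
int v_execute_stack_strided(const double *x_dest, ptrdiff_t dx_stride,
                            const double *y_dest, ptrdiff_t dy_stride,
                            double *x_src, ptrdiff_t sx_stride,
                            double *y_src, ptrdiff_t sy_stride,
                            int count, void *params)
{
    struct fDesc *stack = (struct fDesc *)params;
    int i;

    if (count <= 0)
        return 1;

    /* First transformation: read the dest arrays, write the src arrays. */
    if (stack->func != NULL) {
        for (i = 0; i < count; i++) {
            double *xs = x_src + i * sx_stride;
            double *ys = y_src + i * sy_stride;
            if (!stack->func(x_dest[i * dx_stride], y_dest[i * dy_stride],
                             xs, ys, stack->param))
                return 0;
        }
        stack++;
    }

    /* Remaining transformations work in place on the src arrays. */
    for (; stack->func != NULL; stack++) {
        for (i = 0; i < count; i++) {
            double *xs = x_src + i * sx_stride;
            double *ys = y_src + i * sy_stride;
            if (!stack->func(*xs, *ys, xs, ys, stack->param))
                return 0;
        }
    }
    return 1;
}
```

With a stride of 2 and base pointers `buf` and `buf + 1`, an interleaved x,y buffer (such as a C-contiguous array of shape (n, 2) handed over from the Python side) can be processed without first being reshaped. With all strides set to 1, the sketch behaves like the unstrided routine.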
From: Kay F. J. <k_...@we...> - 2012-01-29 10:33:57
Hi group!

I have decided to abandon my proposal to introduce vectorized versions of execute_stack_new to libpano. The code performed as expected, but when I compared it to code which has the vectorization at a higher level and uses execute_stack_new unmodified, I found that there is no significant performance difference after all. Therefore I am now introducing the vectorization via hugin's Python interface and hope to have the additional code in hugin's code base in the near future.

In case any of you still feel you'd like to introduce a vectorized version of execute_stack_new to libpano, I still have the code and I'd be happy to share it.

Kay
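For comparison, vectorizing "at a higher level" while leaving execute_stack_new unmodified amounts to an ordinary loop that calls the per-point function once per coordinate pair. The sketch below is written in C only for illustration (in hugin this loop would sit behind the Python interface). It assumes a per-point function modelled on libpano's execute_stack_new, redeclared here with its supporting types so it compiles standalone, and it uses a hypothetical wrapper name, `transform_points`. If the transformation functions dominate the cost, the extra call per point is presumably minor, which fits the measurement reported above.

```c
#include <stddef.h>

/* Assumed to mirror libpano's types; redeclared so the sketch is standalone. */
typedef int (*trfn)(double x_dest, double y_dest,
                    double *x_src, double *y_src, void *params);

struct fDesc {
    trfn func;    /* transformation to call; NULL terminates the stack */
    void *param;  /* parameters passed to that transformation */
};

/* Per-point stack execution, modelled on libpano's execute_stack_new:
   the output of each transformation becomes the input of the next. */
int execute_stack_new(double x_dest, double y_dest,
                      double *x_src, double *y_src, void *params)
{
    double xd = x_dest, yd = y_dest;
    struct fDesc *stack = (struct fDesc *)params;

    while (stack->func != NULL) {
        if (!stack->func(xd, yd, x_src, y_src, stack->param))
            return 0;
        xd = *x_src;
        yd = *y_src;
        stack++;
    }
    return 1;
}

/* Hypothetical caller-side vectorization: one unmodified per-point
   call for each coordinate pair; stops at the first failure. */
int transform_points(const double *x_dest, const double *y_dest,
                     double *x_src, double *y_src,
                     int count, void *params)
{
    for (int i = 0; i < count; i++) {
        if (!execute_stack_new(x_dest[i], y_dest[i],
                               &x_src[i], &y_src[i], params))
            return 0;
    }
    return 1;
}
```

The design trade-off is that the stack is walked once per point instead of once per transformation, so each point passes through the whole chain before the next point starts. The results are identical to the vectorized routine; only the memory-access order and call overhead differ.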