From: Ian S. <ian...@gm...> - 2009-03-20 13:19:55
Marcus Brubaker wrote:
>>> Fair enough. In that case I would add methods to the base class
>>> which support evaluating the density at multiple points, i.e., they
>>> take as input a matrix where each column (or row) is a single
>>> element, and they return a vector with the PDF evaluated at each
>>> point. This would allow efficient PDF evaluation at multiple
>>> points. The nice thing is that a default implementation in terms of
>>> prob_density can be used to reduce overhead for people implementing
>>> distributions.

Matthew Leotta wrote:
>> I have to admit, my experience in implementing various distributions
>> is quite limited. It hadn't crossed my mind that it could be
>> significantly more efficient to evaluate a distribution at many
>> points in a single function call (rather than a function call for
>> each point). Is your reasoning for this based on virtual function
>> overhead and multiple evaluations of a normalization constant, or
>> are there other factors? Could you give an example of a distribution
>> that would benefit from this?

Marcus Brubaker wrote:
> All of the above really. Virtual function overhead and normalization
> constants can hurt. More generally, though, being able to use SIMD
> instructions and preserving cache locality can be a big win. Pretty
> much any distribution could be helped by this if you're evaluating at
> more than a few points. E.g., for a multivariate Gaussian, one
> matrix-matrix multiplication is typically cheaper (due to SIMD
> instructions, cache locality, etc.) than multiple matrix-vector
> multiplies.

For anything other than a trivial PDF (a 1D Epanechnikov quadratic
kernel, perhaps?), I would have thought that the virtual function call
cost would be negligible. Whilst the arguments for SIMD compatibility
and cache locality are valid, the problems with applying them are that:

1. They pretty much require vnl_matrix types for parameters and
results, destroying the type idioms we use in VXL and C++ in general.
In extremis you end up with Matlab's everything-is-a-matrix mess.

2. If you apply this argument everywhere, you end up massively
multiplying your API to deal with every case where SIMD speedups might
be needed - since the particular matrix folding you want will depend on
the cache-locality characteristics of your data configuration and
algorithms.

3. Reordering code for SIMD and cache locality really should be the
domain of the optimising compiler - although I should admit to the
endlessly-dashed optimism I have towards compiler performance.

I would suggest one of the following options.

1. Not providing multi-sample methods in the general base-class vpdl
API. If someone wants a specific case they can write their own private
code tailored to their requirements; at best it could be added to the
relevant vpdt PDF class, or a derived vpdl PDF class.

2. Providing a general multi-sample API that uses iterators. If the
iterators point to compact memory, then you get cache locality for
free, and SIMD too if you do some careful iterator-traits template
wizardry. This way you maintain the integrity of the type system and
limit the growth of the API.

I kind of like solution 2 because it also ties in with a good design
for the builders. The use of iterators could merge the API for batch
and online learning of PDF parameters. Currently vpdfl uses
mbl_data_wrapper, which is vaguely equivalent to a pair of STL
iterators.
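To make solution 2 concrete, here is a rough sketch of the kind of
signature I have in mind (all names here are invented for illustration
- this is not the actual vpdl/vpdt API):

  #include <vector>

  // Stand-in for a vpdl-style distribution with a virtual
  // single-sample method; the real interface differs.
  class pdf_base
  {
  public:
    virtual ~pdf_base() {}
    virtual double prob_density(const std::vector<double>& pt) const = 0;
  };

  // Generic multi-sample evaluation over an iterator pair.  If the
  // iterators point into compact storage (a std::vector, a C array, a
  // row of a matrix) the traversal is cache friendly without the API
  // ever mentioning a matrix type.  A particular distribution can
  // overload this for its own type to hoist normalisation constants
  // out of the loop, or to use SIMD internally.
  template <class Pdf, class InIter, class OutIter>
  void prob_densities(const Pdf& pdf, InIter first, InIter last,
                      OutIter out)
  {
    for (; first != last; ++first, ++out)
      *out = pdf.prob_density(*first);
  }

A caller would then write something like

  std::vector<double> densities(samples.size());
  prob_densities(gauss, samples.begin(), samples.end(),
                 densities.begin());

and the same iterator-pair signature would serve equally well for a
builder's batch-learning method.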
Replacing mbl_data_wrapper with proper iterator pairs could make the
code much more accessible to existing C++ programmers, and leverage the
existing iterator-based algorithms. The only trick is to avoid the huge
template instantiation sizes that come from using templated iterators
with complicated algorithms. One way around this is to provide a
polymorphic strategy tree of iterator types, and to use that wherever
there are no speed optimisations at stake.
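By a polymorphic strategy I mean something along these lines (again,
invented names, a sketch rather than real code):

  #include <vector>

  // A type-erased sample iterator: one abstract interface behind
  // which any concrete iterator type can hide, so a complicated
  // algorithm is compiled once rather than once per iterator type.
  class sample_iterator
  {
  public:
    virtual ~sample_iterator() {}
    virtual bool done() const = 0;
    virtual void next() = 0;
    virtual const std::vector<double>& current() const = 0;
  };

  // Thin templated adapter wrapping any STL-style iterator pair whose
  // value type is std::vector<double>.
  template <class InIter>
  class sample_iterator_adapter : public sample_iterator
  {
  public:
    sample_iterator_adapter(InIter first, InIter last)
      : it_(first), end_(last) {}
    virtual bool done() const { return it_ == end_; }
    virtual void next() { ++it_; }
    virtual const std::vector<double>& current() const { return *it_; }
  private:
    InIter it_, end_;
  };

Only the trivial adapter is templated; the heavyweight algorithm (a
builder's estimation step, say) becomes an ordinary non-template
function taking a sample_iterator&, and so is instantiated exactly
once. The cost is a virtual call per sample, which is why I would
reserve this path for the cases where there are no speed optimisations
to be had.

Ian.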