Re: [Vxl-maintainers] beginnings of vpdl added to core

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Mar 20, 2009, at 12:51 PM, Ian Scott wrote:

> Matthew Leotta wrote:
>> On Mar 20, 2009, at 9:19 AM, Ian Scott wrote:
>>> Marcus Brubaker wrote:
>>>>>> Fair enough.  In that case I would add methods to the base  
>>>>>> class  which support evaluating the density at multiple  
>>>>>> points.  I.E.,  they take as input a matrix where each column  
>>>>>> (or row) is a  single element and they return a vector with the  
>>>>>> pdf evaluated at  each point.  This would allow efficient PDF  
>>>>>> evaluation at  multiple points.  The nice thing is that a  
>>>>>> default implementation  in terms of prob_density can be used to  
>>>>>> reduce overhead for  people implementing distributions.
>>>>>>
>>>
>>> Matthew Leotta wrote:
>>>>> I have to admit, my experience in implementing various   
>>>>> distributions is quite limited.  It hadn't crossed my mind that  
>>>>> it  could be significantly more efficient to evaluate a  
>>>>> distribution  at many points in a single function call (rather  
>>>>> than a function  call for each point).  Is your reasoning for  
>>>>> this based on virtual  function overhead and multiple  
>>>>> evaluations of a normalization  constant, or are there other  
>>>>> factors?  Could you give an example  of a distribution that  
>>>>> would benefit from this?
>>> Marcus Brubaker wrote:
>>>> All of the above really.  Virtual function overhead and   
>>>> normalization constants can hurt.  More generally though being  
>>>> able  to use SIMD instructions and preserving cache locality can  
>>>> be a big  win.  Pretty much any distribution could be helped by  
>>>> this if  you're evaluating at more than a few points.  E.G., for  
>>>> a  multivariate gaussian, one matrix-matrix multiplication is   
>>>> typically cheaper (due to SIMD instructions, cache locality,  
>>>> etc)  than multiple matrix vector multiplies.
>>> For anything other than a trivial PDF (possibly a 1D Epanechnikov   
>>> quadratic kernel?), I would have thought that the virtual  
>>> function  call cost would be trivial.
>> The virtual functions do seem to be trivial with respect to  
>> calling  time.  However, the existence of any virtual functions  
>> adds to the  size of the class, and this may not be trivial for  
>> memory allocation.   Consider a scalar Gaussian of type float.  It  
>> requires 8 bytes (4 for  each of 2 floats for mean and variance).   
>> When you use virtual  functions you need 12 bytes (4 more for the  
>> virtual function table  pointer).  That's 50% more and is not  
>> trivial when you have an array  of a million distributions.
>> This is the main reason I added vpdt and went back to the idea of  
>> a  pure template library that is wrapped by the polymorphic  
>> strategy tree  in vpdl.  The lack of virtual functions may also  
>> allow more inlining  and optimizations by some compilers.
>
> Oh - absolutely. I was only considering the vpdl strategy tree case,  
> where you have already agreed to the VFPT pointer cost.
>
>>> Whilst the arguments for SIMD compatibility and cache-locality is   
>>> true, the problems with applying these arguments are that
>>>
>>> 1. They pretty-much require vnl_matrix types for parameters and   
>>> results, destroying the type idioms we use in VXL and C++ in   
>>> general. In extremis you have Matlab's everything-is-a-matrix mess.
>>>
>>> 2. If you apply this argument everywhere you end up massively   
>>> multiplying your API to deal with all cases where SIMD speedups   
>>> might be needed - since your particular matrix folding will  
>>> depend  on the particular cache-locality characteristics of your  
>>> data  configuration and algorithms.
>>>
>>> 3. Reordering code for SIMD and cache locality really should be  
>>> the  domain of the optimising compiler - although I should admit  
>>> to the  endlessly-dashed optimism I have towards compiler  
>>> performance.
>>>
>>> I would suggest one of the following options.
>>>
>>> 1. Not providing multi-sample methods in the general base-class  
>>> vpdl  API. If someone wants a specific case they can either write  
>>> their  own private code tailored to their requirements. At best it  
>>> could be  added to either the relevant vpdt PDF class, or derived  
>>> vpdl PDF class
>>>
>>> 2. Providing a general multi-sample API that uses iterators. If  
>>> the  iterators point to compact memory, then you will get cache  
>>> locality  for free and SIMD if you do some careful iterator-traits  
>>> template  wizardry. This way you maintain the integrity of the  
>>> type system and  limit the API size increase.
>>>
>>> I kind of like solution 2 because it also ties in with a good  
>>> design  for the builders. The use of iterators could merge the API  
>>> for batch  and online learning of PDF parameters. Currently vpdfl  
>>> uses  mbl_data_wrapper which vaguely equivalent to a pair of STL   
>>> iterators. Replacing this with proper iterator pairs could make  
>>> the  code much more accessible to existing C++ programmers, and  
>>> leverage  the existing iterator based algorithms.
>> I agree that 2 is better.  Each distribution should know how to  
>> handle  a single point and external functions can iterate over  
>> multiple  points.  This is why I've added virtual functions to the  
>> vpdl base  distribution for normalization constant and unnormalized  
>> density.  The  biggest savings will probably come from computing  
>> normalization once  and then applying it to each unnormalized  
>> density as they are computed.
>>> The only trick is to avoid the huge template instantiation sizes  
>>> of  using templated iterators with complicated algorithms. One  
>>> way  around this it to proved a polymorphic strategy tree of  
>>> iterator  types, and use this where there are no speed  
>>> optimisations.
>>>
>> I'm not convinced that huge template instantiation will be an  
>> issue.   I think most of these functions are simple enough to be  
>> inlined in a  header file and only instantiated by the user as  
>> needed (i.e. not in  libvpdl for several possible iterator type).
>
> Well I guess it depends on what you think is too big - Some of the  
> PDF building functions in vpdfl are well over 100 lines. It wouldn't  
> be hard to think of a MAP-estimating PDF learning algorithm that  
> used cross-validation, higher-order statistical biases and/or  
> complexity control and that had 1000 LoC.

I was actually only referring to the original concern that Marcus had  
about efficiently evaluating probabilities at a collections of  
points.  These types of functions should be generally be under 10  
LoC.  I don't think builder functions should be treated the same way.   
However, it might be possible to have light-weight templated functions  
that wrap an arbitrary iterator into a polymorphic iterator and then  
call a builder implementation that uses the polymorphic iterator base  
class.  This would allow the user to use any type of iterator and hide  
the fact that a special polymorphic iterator type is required.  I  
haven't thought about this enough to know exactly how it might work.

>
> I know it isn't part of your brief, but the pattern used in vpdl may  
> well set the standard for future libraries that need to train  
> something based on lots of examples. vpdfl certainly set the  
> standard for a classifier API (see mul/clsfy) where some of my  
> private support vector machine training algorithms run into the  
> thousands LoC. It also set the standard for Manchester's private AAM  
> libraries, which have building algorithms running easily into the  
> 10^5 LoC.
>
> The additional problem with these other training algorithms is that  
> they often have multiple inputs - e.g. weights (which maybe used by  
> vpdl), or classification targets. The number of possible template  
> types grow polynomially.
>
> Still - I guess they could all be inlined - at the risk of a large  
> compilation time cost. In any case, any such template variation  
> should be limited to the templated build or density functions - it  
> would be disastrous if the iterator types affected the type of the  
> vpdl_distribution or vpdl_builder classes. We would have to put  
> every possible type combination and derivative into the vsl's  
> polymorphic loader scheme.

I certainly agree.  These templates for iterators should have no  
effect on the types used in distributions.

Matt