From: Matthew Leotta <matt.leotta@gm...>  20090320 16:01:05

On Mar 20, 2009, at 9:19 AM, Ian Scott wrote: > > Marcus Brubaker wrote: >>>> Fair enough. In that case I would add methods to the base class >>>> which support evaluating the density at multiple points. I.E., >>>> they take as input a matrix where each column (or row) is a >>>> single element and they return a vector with the pdf evaluated at >>>> each point. This would allow efficient PDF evaluation at >>>> multiple points. The nice thing is that a default implementation >>>> in terms of prob_density can be used to reduce overhead for >>>> people implementing distributions. >>>> > > > Matthew Leotta wrote: >>> I have to admit, my experience in implementing various >>> distributions is quite limited. It hadn't crossed my mind that it >>> could be significantly more efficient to evaluate a distribution >>> at many points in a single function call (rather than a function >>> call for each point). Is your reasoning for this based on virtual >>> function overhead and multiple evaluations of a normalization >>> constant, or are there other factors? Could you give an example >>> of a distribution that would benefit from this? > > Marcus Brubaker wrote: >> All of the above really. Virtual function overhead and >> normalization constants can hurt. More generally though being able >> to use SIMD instructions and preserving cache locality can be a big >> win. Pretty much any distribution could be helped by this if >> you're evaluating at more than a few points. E.G., for a >> multivariate gaussian, one matrixmatrix multiplication is >> typically cheaper (due to SIMD instructions, cache locality, etc) >> than multiple matrix vector multiplies. > > For anything other than a trivial PDF (possibly a 1D Epanechnikov > quadratic kernel?), I would have thought that the virtual function > call cost would be trivial. The virtual functions do seem to be trivial with respect to calling time. However, the existence of any virtual functions adds to the size of the class, and this may not be trivial for memory allocation. Consider a scalar Gaussian of type float. It requires 8 bytes (4 for each of 2 floats for mean and variance). When you use virtual functions you need 12 bytes (4 more for the virtual function table pointer). That's 50% more and is not trivial when you have an array of a million distributions. This is the main reason I added vpdt and went back to the idea of a pure template library that is wrapped by the polymorphic strategy tree in vpdl. The lack of virtual functions may also allow more inlining and optimizations by some compilers. > Whilst the arguments for SIMD compatibility and cachelocality is > true, the problems with applying these arguments are that > > 1. They prettymuch require vnl_matrix types for parameters and > results, destroying the type idioms we use in VXL and C++ in > general. In extremis you have Matlab's everythingisamatrix mess. > > 2. If you apply this argument everywhere you end up massively > multiplying your API to deal with all cases where SIMD speedups > might be needed  since your particular matrix folding will depend > on the particular cachelocality characteristics of your data > configuration and algorithms. > > 3. Reordering code for SIMD and cache locality really should be the > domain of the optimising compiler  although I should admit to the > endlesslydashed optimism I have towards compiler performance. > > I would suggest one of the following options. > > 1. Not providing multisample methods in the general baseclass vpdl > API. If someone wants a specific case they can either write their > own private code tailored to their requirements. At best it could be > added to either the relevant vpdt PDF class, or derived vpdl PDF class > > 2. Providing a general multisample API that uses iterators. If the > iterators point to compact memory, then you will get cache locality > for free and SIMD if you do some careful iteratortraits template > wizardry. This way you maintain the integrity of the type system and > limit the API size increase. > > I kind of like solution 2 because it also ties in with a good design > for the builders. The use of iterators could merge the API for batch > and online learning of PDF parameters. Currently vpdfl uses > mbl_data_wrapper which vaguely equivalent to a pair of STL > iterators. Replacing this with proper iterator pairs could make the > code much more accessible to existing C++ programmers, and leverage > the existing iterator based algorithms. I agree that 2 is better. Each distribution should know how to handle a single point and external functions can iterate over multiple points. This is why I've added virtual functions to the vpdl base distribution for normalization constant and unnormalized density. The biggest savings will probably come from computing normalization once and then applying it to each unnormalized density as they are computed. > The only trick is to avoid the huge template instantiation sizes of > using templated iterators with complicated algorithms. One way > around this it to proved a polymorphic strategy tree of iterator > types, and use this where there are no speed optimisations. > I'm not convinced that huge template instantiation will be an issue. I think most of these functions are simple enough to be inlined in a header file and only instantiated by the user as needed (i.e. not in libvpdl for several possible iterator type). Matt 
From: Matthew Leotta <matt.leotta@gm...>  20090216 16:30:53

Dear All, I've just checked in the first part of vpdl (probability distribution library) into the vxl core. I have not modified any CMakeLists.txt outside of vpdl. If you want to build it you'll need to manually add it to core/CMakeLists.txt. At some point I can check in a CMake build option that defaults to off. If you have time, please take a look at the code and let me know if the design looks reasonable. It is quite incomplete at this point, but it should give you the idea. I'm only working on the distribution classes and I welcome help in contributing the design of builders, samplers, or any other essential parts. When I get community approval for this design I'll added more distributions and write the book chapter. The general design is like this: vpdl_distribution<T,n> is the templated base class for distributions. The template parameters are T, the floating point type (float or double) and n, the dimension. For n > 1 the distributions work with vnl_vector_fixed<T,n> and vnl_matrix_fixed<T,n,n> types. For n == 1 they work with T directly for scalar computations. The special case of n == 0 (the default) works with vnl_vector<T> and vnl_matrix<T> for dimension specified at run time. While vpdl_distribution<T,n> should be used as the base class, it is inherited from vpdl_base_traits<T,n>. vpdl_base_traits is partially specialized to create typedefs and functions that reduce the need for specialization in later derived classes. For example, vpdl_distribution<T,n>::index(v,i) is a static member function that provides access to the ith element of vector v even if v is really a scalar of type T (in which case it returns the scalar value). I currently have two working distributions with test cases: vpdl_gaussian_sphere and vpdl_gaussian_indep. Both are restricted version of Gaussian distributions (with hyperspherical and independent covariances). The general Gaussian is a work in progress, and I think it will used the eigenvector representation of mul/vpdfl. I've inlined everything in the Gaussians so that no .txx file is needed. The vpdl_distribution does use .txx with many instantiations in the Templates subdirectory. Does anyone have any preference on the use of .txx files here? It seems like a very large number of files would be needed in the Templates directory if I use .txx files. Many of these instantiations might rarely be used. However, if everything is inlined it could lead to more code bloat. This design integrates bsta more tightly than originally considered. If the virtual functions do not create too large of a performance hit, then I might not need to create separate classes with wrappers. I will need to do performance tests to know for sure. Matt 
From: Marcus Brubaker <mbrubake@cs...>  20090303 18:10:20

As a lurker but someone with a strong interest probability distribution code, I have a few thoughts. 1. Many useful distributions are not over vectors or scalars. E.G., distributions over quaternions and matrices. While practically anything can be shoehorned into a distribution over a vector, it's worth thinking about how you would implement a distribution over other types. 2. Not all distribution functions can be readily normalized and many uses of distributions don't require the normalization constant. It may be worth allowing both of these cases. 3. Computing the cdf, inverse cdf, etc can be hard or practically impossible (except for approximations) for many distributions. Handling this gracefully is important. At the very least, when documenting the base class, give guidance on how to handle this. E.G., when is it appropriate to use an approximation. How should unimplemented functions be handled? Maybe there should be "approximate" versions or an approximation parameters for some of these functions which allows the user to specify whether approximations are allowed or how accurate things need to be. Anyway, vpdl looks like a great start and I'm looking forward to seeing where it goes from here. Cheers, Marcus Matthew Leotta wrote: > Dear All, > > I've just checked in the first part of vpdl (probability distribution > library) into the vxl core. I have not modified any CMakeLists.txt > outside of vpdl. If you want to build it you'll need to manually add > it to core/CMakeLists.txt. At some point I can check in a CMake build > option that defaults to off. > > If you have time, please take a look at the code and let me know if > the design looks reasonable. It is quite incomplete at this point, > but it should give you the idea. I'm only working on the distribution > classes and I welcome help in contributing the design of builders, > samplers, or any other essential parts. When I get community approval > for this design I'll added more distributions and write the book > chapter. > > The general design is like this: vpdl_distribution<T,n> is the > templated base class for distributions. The template parameters are > T, the floating point type (float or double) and n, the dimension. > For n > 1 the distributions work with vnl_vector_fixed<T,n> and > vnl_matrix_fixed<T,n,n> types. For n == 1 they work with T directly > for scalar computations. The special case of n == 0 (the default) > works with vnl_vector<T> and vnl_matrix<T> for dimension specified at > run time. While vpdl_distribution<T,n> should be used as the base > class, it is inherited from vpdl_base_traits<T,n>. vpdl_base_traits > is partially specialized to create typedefs and functions that reduce > the need for specialization in later derived classes. For example, > vpdl_distribution<T,n>::index(v,i) is a static member function that > provides access to the ith element of vector v even if v is really a > scalar of type T (in which case it returns the scalar value). > > I currently have two working distributions with test cases: > vpdl_gaussian_sphere and vpdl_gaussian_indep. Both are restricted > version of Gaussian distributions (with hyperspherical and > independent covariances). The general Gaussian is a work in progress, > and I think it will used the eigenvector representation of mul/vpdfl. > > I've inlined everything in the Gaussians so that no .txx file is > needed. The vpdl_distribution does use .txx with many instantiations > in the Templates subdirectory. Does anyone have any preference on the > use of .txx files here? It seems like a very large number of files > would be needed in the Templates directory if I use .txx files. Many > of these instantiations might rarely be used. However, if everything > is inlined it could lead to more code bloat. > > This design integrates bsta more tightly than originally considered. > If the virtual functions do not create too large of a performance hit, > then I might not need to create separate classes with wrappers. I > will need to do performance tests to know for sure. > > Matt > >  > Open Source Business Conference (OSBC), March 2425, 2009, San Francisco, CA > OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise > Strategies to boost innovation and cut costs with open source participation > Receive a $600 discount off the registration fee with the source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > Vxlmaintainers mailing list > Vxlmaintainers@... > https://lists.sourceforge.net/lists/listinfo/vxlmaintainers > 
From: Matthew Leotta <matt.leotta@gm...>  20090303 21:03:07

On Mar 3, 2009, at 3:14 PM, Marcus Brubaker wrote: > Matthew Leotta wrote: >> Marcus, >> >> It's great to have some feedback. Thanks! It's helpful to know >> what other people are looking for in a distribution library so that >> I can make it broadly usable. >> >> On Mar 3, 2009, at 12:47 PM, Marcus Brubaker wrote: >>> 1. Many useful distributions are not over vectors or scalars. >>> E.G., distributions over quaternions and matrices. While >>> practically anything can be shoehorned into a distribution over a >>> vector, it's worth thinking about how you would implement a >>> distribution over other types. >> >> I've thought about this to some degree, but I think it might be too >> complicated to allow even more types. Each new type requires >> template specialization of all the commonly used distributions. >> I've avoided this to some degree by defining a simple set of >> operations (element access, addition, dot product, etc.) that work >> appropriately on vectors and scalars. This would be hard to >> generalize to arbitrary types. I think it's easier to shoehorn >> into a vector type as you say. > > Fair enough. In that case I would add methods to the base class > which support evaluating the density at multiple points. I.E., they > take as input a matrix where each column (or row) is a single > element and they return a vector with the pdf evaluated at each > point. This would allow efficient PDF evaluation at multiple > points. The nice thing is that a default implementation in terms of > prob_density can be used to reduce overhead for people implementing > distributions. > > This could be done with a list or something like that instead of a > matrix, but the copying required could kill any meaningful > performance benefits. I have to admit, my experience in implementing various distributions is quite limited. It hadn't crossed my mind that it could be significantly more efficient to evaluate a distribution at many points in a single function call (rather than a function call for each point). Is your reasoning for this based on virtual function overhead and multiple evaluations of a normalization constant, or are there other factors? Could you give an example of a distribution that would benefit from this? > >>> 2. Not all distribution functions can be readily normalized and >>> many uses of distributions don't require the normalization >>> constant. It may be worth allowing both of these cases. >> >> Good point. I've assumed that all simple distributions will be >> normalized. Maybe there should be a function call that indicates >> whether the densities are normalized? Maybe there should be a >> separate virtual function for unnormalized density? It could be >> faster to compute the unnormalized version even when normalization >> is possible. Thoughts? > > There are definitely functions which are normalizable but for which > the normalization is expensive (Gamma and Beta distributions are > moderate examples of cases like this, the Bingham distribution is a > more extreme case). In these cases it's important to compute the > normalization only when necessary and to precompute it and cache it > if possible. > > I think either approach is reasonable. Having an extra parameter > passed to (log_)prob_density with a sensible default value reduces > function clutter. Alternately, having, for instance, > (log_)prob_density_unnorm and (log_)prob_density_normconst functions > allows (log_)prob_density to be implemented in the base class. I'll think about this and try to work out a reasonable solution. One design challenge with respect to caching is balancing memory footprint with efficiency. For some applications, a distribution is create once and then evaluated at many points. Caching is a good idea for that. For other applications (like background modeling), many distributions are created and evaluated at only one point before changing the distribution parameters. In that case, caching buys you nothing and it wastes valuable memory. > >>> 3. Computing the cdf, inverse cdf, etc can be hard or practically >>> impossible (except for approximations) for many distributions. >>> Handling this gracefully is important. At the very least, when >>> documenting the base class, give guidance on how to handle this. >>> E.G., when is it appropriate to use an approximation. How should >>> unimplemented functions be handled? Maybe there should be >>> "approximate" versions or an approximation parameters for some of >>> these functions which allows the user to specify whether >>> approximations are allowed or how accurate things need to be. >> >> I'm going to leave this up to others to figure out what is best. I >> was hoping maybe someone could implement some numerical quadrature >> algorithms to handle some of the cases. I imagine there will still >> be cases that are difficult to compute. Maybe there should also be >> a function to indicate that the cdf is not available? > > Adding a function is probably overkill for the vast majority of use > cases. If this information needs to be available at runtime, then I > would add a bool (or an int with flags) to indicate it. I'm > somewhat agnostic about a specific solution, but rather I think its > important to document what a class should do in these cases before > people start implementing too many classes. I'm unsure what other people's needs are for this. For now why don't we say that unimplemented cdf functions return 1.0? I can update the documentation. > >>> Anyway, vpdl looks like a great start and I'm looking forward to >>> seeing where it goes from here. >> >> My goal in writing vpdl was to move the functionality of bsta into >> core in a way that is general enough to support the needs of other >> VXL users. Specifically I am aiming to support the functionality >> of VXL's other primary probability libraries: vpdfl and pdf1d. I >> will not have time to implement all of these features, but I hope >> to get the framework in place so that others can build upon it. If >> you have specific suggestions about how to improve the code, I'd >> love to hear them. It would be even better if you can help >> contribute to the code. > > I'm crunched for time right now (impending ICCV deadline and all) > but I might be able to lend a more direct hand in the future. I > have implemented a fair number of more "exotic" distributions that I > would be happy to contribute to the effort. In particular I have > samplers implemented for distributions like Gamma, Beta, BetaPrime, > Exponential, Truncated Normal, etc. Understood. I'm a little crunched myself. I'm happy to accept your contributions whenever you have time. > > Cheers, > Marcus >> >> Matt >> >>> >>> Cheers, >>> Marcus >>> >>> Matthew Leotta wrote: >>>> Dear All, >>>> >>>> I've just checked in the first part of vpdl (probability >>>> distribution library) into the vxl core. I have not modified >>>> any CMakeLists.txt outside of vpdl. If you want to build it >>>> you'll need to manually add it to core/CMakeLists.txt. At some >>>> point I can check in a CMake build option that defaults to off. >>>> >>>> If you have time, please take a look at the code and let me know >>>> if the design looks reasonable. It is quite incomplete at this >>>> point, but it should give you the idea. I'm only working on the >>>> distribution classes and I welcome help in contributing the >>>> design of builders, samplers, or any other essential parts. >>>> When I get community approval for this design I'll added more >>>> distributions and write the book chapter. >>>> >>>> The general design is like this: vpdl_distribution<T,n> is the >>>> templated base class for distributions. The template parameters >>>> are T, the floating point type (float or double) and n, the >>>> dimension. For n > 1 the distributions work with >>>> vnl_vector_fixed<T,n> and vnl_matrix_fixed<T,n,n> types. For n >>>> == 1 they work with T directly for scalar computations. The >>>> special case of n == 0 (the default) works with vnl_vector<T> >>>> and vnl_matrix<T> for dimension specified at run time. While >>>> vpdl_distribution<T,n> should be used as the base class, it is >>>> inherited from vpdl_base_traits<T,n>. vpdl_base_traits is >>>> partially specialized to create typedefs and functions that >>>> reduce the need for specialization in later derived classes. >>>> For example, vpdl_distribution<T,n>::index(v,i) is a static >>>> member function that provides access to the ith element of >>>> vector v even if v is really a scalar of type T (in which case >>>> it returns the scalar value). >>>> >>>> I currently have two working distributions with test cases: >>>> vpdl_gaussian_sphere and vpdl_gaussian_indep. Both are >>>> restricted version of Gaussian distributions (with hyper >>>> spherical and independent covariances). The general Gaussian is >>>> a work in progress, and I think it will used the eigenvector >>>> representation of mul/vpdfl. >>>> >>>> I've inlined everything in the Gaussians so that no .txx file is >>>> needed. The vpdl_distribution does use .txx with many >>>> instantiations in the Templates subdirectory. Does anyone have >>>> any preference on the use of .txx files here? It seems like a >>>> very large number of files would be needed in the Templates >>>> directory if I use .txx files. Many of these instantiations >>>> might rarely be used. However, if everything is inlined it >>>> could lead to more code bloat. >>>> >>>> This design integrates bsta more tightly than originally >>>> considered. If the virtual functions do not create too large of >>>> a performance hit, then I might not need to create separate >>>> classes with wrappers. I will need to do performance tests to >>>> know for sure. >>>> >>>> Matt >>>> >>>>  >>>> Open Source Business Conference (OSBC), March 2425, 2009, San >>>> Francisco, CA >>>> OSBC tackles the biggest issue in open source: Open Sourcing the >>>> Enterprise >>>> Strategies to boost innovation and cut costs with open source >>>> participation >>>> Receive a $600 discount off the registration fee with the source >>>> code: SFAD >>>> http://p.sf.net/sfu/XcvMzF8H >>>> _______________________________________________ >>>> Vxlmaintainers mailing list >>>> Vxlmaintainers@... >>>> https://lists.sourceforge.net/lists/listinfo/vxlmaintainers >>>> >>> >> > 
From: Marcus Brubaker <mbrubake@cs...>  20090303 21:27:56

Matthew Leotta wrote: > > On Mar 3, 2009, at 3:14 PM, Marcus Brubaker wrote: > >> Matthew Leotta wrote: >>> Marcus, >>> >>> It's great to have some feedback. Thanks! It's helpful to know >>> what other people are looking for in a distribution library so that >>> I can make it broadly usable. >>> >>> On Mar 3, 2009, at 12:47 PM, Marcus Brubaker wrote: >>>> 1. Many useful distributions are not over vectors or scalars. >>>> E.G., distributions over quaternions and matrices. While >>>> practically anything can be shoehorned into a distribution over a >>>> vector, it's worth thinking about how you would implement a >>>> distribution over other types. >>> >>> I've thought about this to some degree, but I think it might be too >>> complicated to allow even more types. Each new type requires >>> template specialization of all the commonly used distributions. >>> I've avoided this to some degree by defining a simple set of >>> operations (element access, addition, dot product, etc.) that work >>> appropriately on vectors and scalars. This would be hard to >>> generalize to arbitrary types. I think it's easier to shoehorn into >>> a vector type as you say. >> >> Fair enough. In that case I would add methods to the base class >> which support evaluating the density at multiple points. I.E., they >> take as input a matrix where each column (or row) is a single element >> and they return a vector with the pdf evaluated at each point. This >> would allow efficient PDF evaluation at multiple points. The nice >> thing is that a default implementation in terms of prob_density can >> be used to reduce overhead for people implementing distributions. >> >> This could be done with a list or something like that instead of a >> matrix, but the copying required could kill any meaningful >> performance benefits. > > I have to admit, my experience in implementing various distributions > is quite limited. It hadn't crossed my mind that it could be > significantly more efficient to evaluate a distribution at many points > in a single function call (rather than a function call for each > point). Is your reasoning for this based on virtual function overhead > and multiple evaluations of a normalization constant, or are there > other factors? Could you give an example of a distribution that would > benefit from this? All of the above really. Virtual function overhead and normalization constants can hurt. More generally though being able to use SIMD instructions and preserving cache locality can be a big win. Pretty much any distribution could be helped by this if you're evaluating at more than a few points. E.G., for a multivariate gaussian, one matrixmatrix multiplication is typically cheaper (due to SIMD instructions, cache locality, etc) than multiple matrix vector multiplies. >>>> 2. Not all distribution functions can be readily normalized and >>>> many uses of distributions don't require the normalization >>>> constant. It may be worth allowing both of these cases. >>> >>> Good point. I've assumed that all simple distributions will be >>> normalized. Maybe there should be a function call that indicates >>> whether the densities are normalized? Maybe there should be a >>> separate virtual function for unnormalized density? It could be >>> faster to compute the unnormalized version even when normalization >>> is possible. Thoughts? >> >> There are definitely functions which are normalizable but for which >> the normalization is expensive (Gamma and Beta distributions are >> moderate examples of cases like this, the Bingham distribution is a >> more extreme case). In these cases it's important to compute the >> normalization only when necessary and to precompute it and cache it >> if possible. >> >> I think either approach is reasonable. Having an extra parameter >> passed to (log_)prob_density with a sensible default value reduces >> function clutter. Alternately, having, for instance, >> (log_)prob_density_unnorm and (log_)prob_density_normconst functions >> allows (log_)prob_density to be implemented in the base class. > > I'll think about this and try to work out a reasonable solution. > > One design challenge with respect to caching is balancing memory > footprint with efficiency. For some applications, a distribution is > create once and then evaluated at many points. Caching is a good idea > for that. For other applications (like background modeling), many > distributions are created and evaluated at only one point before > changing the distribution parameters. In that case, caching buys you > nothing and it wastes valuable memory. Indeed, I'm inclined make it possible for a user do their own caching, although it is only a single scalar per distribution in most cases. In terms of memory is won't hurt more than one or two virtual functions would. >>>> 3. Computing the cdf, inverse cdf, etc can be hard or practically >>>> impossible (except for approximations) for many distributions. >>>> Handling this gracefully is important. At the very least, when >>>> documenting the base class, give guidance on how to handle this. >>>> E.G., when is it appropriate to use an approximation. How should >>>> unimplemented functions be handled? Maybe there should be >>>> "approximate" versions or an approximation parameters for some of >>>> these functions which allows the user to specify whether >>>> approximations are allowed or how accurate things need to be. >>> >>> I'm going to leave this up to others to figure out what is best. I >>> was hoping maybe someone could implement some numerical quadrature >>> algorithms to handle some of the cases. I imagine there will still >>> be cases that are difficult to compute. Maybe there should also be >>> a function to indicate that the cdf is not available? >> >> Adding a function is probably overkill for the vast majority of use >> cases. If this information needs to be available at runtime, then I >> would add a bool (or an int with flags) to indicate it. I'm somewhat >> agnostic about a specific solution, but rather I think its important >> to document what a class should do in these cases before people start >> implementing too many classes. > > I'm unsure what other people's needs are for this. For now why don't > we say that unimplemented cdf functions return 1.0? I can update the > documentation. Sounds fair for now. As distributions for which the CDF is difficult come up we can reevaluate things. Perhaps the generic numerical quadrature routine could later be implemented in the base class as an obvious approximation to the true CDF. Cheers, Marcus 
From: Matthew Leotta <matt.leotta@gm...>  20090304 15:07:58

On Mar 3, 2009, at 4:27 PM, Marcus Brubaker wrote: > Matthew Leotta wrote: >> >> On Mar 3, 2009, at 3:14 PM, Marcus Brubaker wrote: >> >>> Matthew Leotta wrote: >>>> Marcus, >>>> >>>> It's great to have some feedback. Thanks! It's helpful to know >>>> what other people are looking for in a distribution library so >>>> that I can make it broadly usable. >>>> >>>> On Mar 3, 2009, at 12:47 PM, Marcus Brubaker wrote: >>>>> 1. Many useful distributions are not over vectors or scalars. >>>>> E.G., distributions over quaternions and matrices. While >>>>> practically anything can be shoehorned into a distribution over >>>>> a vector, it's worth thinking about how you would implement a >>>>> distribution over other types. >>>> >>>> I've thought about this to some degree, but I think it might be >>>> too complicated to allow even more types. Each new type requires >>>> template specialization of all the commonly used distributions. >>>> I've avoided this to some degree by defining a simple set of >>>> operations (element access, addition, dot product, etc.) that >>>> work appropriately on vectors and scalars. This would be hard to >>>> generalize to arbitrary types. I think it's easier to shoehorn >>>> into a vector type as you say. >>> >>> Fair enough. In that case I would add methods to the base class >>> which support evaluating the density at multiple points. I.E., >>> they take as input a matrix where each column (or row) is a single >>> element and they return a vector with the pdf evaluated at each >>> point. This would allow efficient PDF evaluation at multiple >>> points. The nice thing is that a default implementation in terms >>> of prob_density can be used to reduce overhead for people >>> implementing distributions. >>> >>> This could be done with a list or something like that instead of a >>> matrix, but the copying required could kill any meaningful >>> performance benefits. >> >> I have to admit, my experience in implementing various >> distributions is quite limited. It hadn't crossed my mind that it >> could be significantly more efficient to evaluate a distribution at >> many points in a single function call (rather than a function call >> for each point). Is your reasoning for this based on virtual >> function overhead and multiple evaluations of a normalization >> constant, or are there other factors? Could you give an example of >> a distribution that would benefit from this? > > All of the above really. Virtual function overhead and > normalization constants can hurt. More generally though being able > to use SIMD instructions and preserving cache locality can be a big > win. Pretty much any distribution could be helped by this if you're > evaluating at more than a few points. E.G., for a multivariate > gaussian, one matrixmatrix multiplication is typically cheaper (due > to SIMD instructions, cache locality, etc) than multiple matrix > vector multiplies. Alright, I'll keep this in mind. I'm not sure if I like the idea of needing to form a matrix out of the points, but maybe something with iterators would work. One thing I am concerned about is cluttering up the interface with virtual functions to add more efficient implementations for functionality that already exists. what functions do you need to apply over multiple points? probability density? log density? cdf? Do we now need multipoint version of each function? I'm starting to lean back to favoring my original plan for vpdl in which the library has two distinct parts: 1) a pure generic library with templates (header files only, no compiled library) that focuses on maximum speed and minimal memory when types are known at compile time. 2) a polymorphic library (like the current code) that may wrap the generic library code in some cases and focuses on runtime flexibility and easy of use. These multipoint functions would fit better into (1), but I think they might be too much clutter for (2). > >>>>> 2. Not all distribution functions can be readily normalized and >>>>> many uses of distributions don't require the normalization >>>>> constant. It may be worth allowing both of these cases. >>>> >>>> Good point. I've assumed that all simple distributions will be >>>> normalized. Maybe there should be a function call that indicates >>>> whether the densities are normalized? Maybe there should be a >>>> separate virtual function for unnormalized density? It could be >>>> faster to compute the unnormalized version even when >>>> normalization is possible. Thoughts? >>> >>> There are definitely functions which are normalizable but for >>> which the normalization is expensive (Gamma and Beta distributions >>> are moderate examples of cases like this, the Bingham distribution >>> is a more extreme case). In these cases it's important to compute >>> the normalization only when necessary and to precompute it and >>> cache it if possible. >>> >>> I think either approach is reasonable. Having an extra parameter >>> passed to (log_)prob_density with a sensible default value reduces >>> function clutter. Alternately, having, for instance, >>> (log_)prob_density_unnorm and (log_)prob_density_normconst >>> functions allows (log_)prob_density to be implemented in the base >>> class. >> >> I'll think about this and try to work out a reasonable solution. >> >> One design challenge with respect to caching is balancing memory >> footprint with efficiency. For some applications, a distribution >> is create once and then evaluated at many points. Caching is a >> good idea for that. For other applications (like background >> modeling), many distributions are created and evaluated at only one >> point before changing the distribution parameters. In that case, >> caching buys you nothing and it wastes valuable memory. > > Indeed, I'm inclined make it possible for a user do their own > caching, although it is only a single scalar per distribution in > most cases. In terms of memory is won't hurt more than one or two > virtual functions would. In terms of the two part design (above), I think internal caching would be allowed in (2) to improve performance at the expense of memory and without complicating the API. In (1) we would rely on the multipoint evaluations or user caching. Incidentally, adding more virtual function should not increase the memory footprint of a class. The virtual function table is shared among all instances of a class. There is only the cost of a single pointer to the virtual function table if virtual functions are used at all. However, this small cost can be a big deal and is part of the motivation for wanting the two part design. Consider a scalar gaussian distribution that only stores two floats (mean and variance). Adding a virtual functions increases memory use by 50%. That could be critical if you have an array of millions of such distributions (e.g. in background modeling). > >>>>> 3. Computing the cdf, inverse cdf, etc can be hard or >>>>> practically impossible (except for approximations) for many >>>>> distributions. Handling this gracefully is important. At the >>>>> very least, when documenting the base class, give guidance on >>>>> how to handle this. E.G., when is it appropriate to use an >>>>> approximation. How should unimplemented functions be handled? >>>>> Maybe there should be "approximate" versions or an approximation >>>>> parameters for some of these functions which allows the user to >>>>> specify whether approximations are allowed or how accurate >>>>> things need to be. >>>> >>>> I'm going to leave this up to others to figure out what is best. >>>> I was hoping maybe someone could implement some numerical >>>> quadrature algorithms to handle some of the cases. I imagine >>>> there will still be cases that are difficult to compute. Maybe >>>> there should also be a function to indicate that the cdf is not >>>> available? >>> >>> Adding a function is probably overkill for the vast majority of >>> use cases. If this information needs to be available at runtime, >>> then I would add a bool (or an int with flags) to indicate it. >>> I'm somewhat agnostic about a specific solution, but rather I >>> think its important to document what a class should do in these >>> cases before people start implementing too many classes. >> >> I'm unsure what other people's needs are for this. For now why >> don't we say that unimplemented cdf functions return 1.0? I can >> update the documentation. > > Sounds fair for now. As distributions for which the CDF is > difficult come up we can reevaluate things. Perhaps the generic > numerical quadrature routine could later be implemented in the base > class as an obvious approximation to the true CDF. After some thought, I've changed my mind on the 1.0 value. I think it should be a quiet NAN (vcl_numeric_limits<T>::quiet_NaN()). Mixture distribution could contain a mix of distributions that do and do not support cdf. The cdf of such a mixture would be a linear combination of 1 and a value between 0 and 1. That could lead to some unexpected results. Using NaN should resolve that since the NaN will propagate. Matt 
From: Peter Vanroose <peter_vanroose@ya...>  20090305 08:19:46

> ... I think it should be a quiet NAN (vcl_numeric_limits<T>::quiet_NaN()). Agreed. Other argument against 1: due to rounding errors, a computed cdf value could come out as e.g. 1e8, so a test for "cdf < 0" would not be able to distinguish the two cases.  Peter. __________________________________________________________ Låna pengar utan säkerhet. Jämför vilkor online hos Kelkoo. http://www.kelkoo.se/c100390123lanutansakerhet.html?partnerId=96915014 
From: Matthew Leotta <matt.leotta@gm...>  20090320 16:01:05

On Mar 20, 2009, at 9:19 AM, Ian Scott wrote: > > Marcus Brubaker wrote: >>>> Fair enough. In that case I would add methods to the base class >>>> which support evaluating the density at multiple points. I.E., >>>> they take as input a matrix where each column (or row) is a >>>> single element and they return a vector with the pdf evaluated at >>>> each point. This would allow efficient PDF evaluation at >>>> multiple points. The nice thing is that a default implementation >>>> in terms of prob_density can be used to reduce overhead for >>>> people implementing distributions. >>>> > > > Matthew Leotta wrote: >>> I have to admit, my experience in implementing various >>> distributions is quite limited. It hadn't crossed my mind that it >>> could be significantly more efficient to evaluate a distribution >>> at many points in a single function call (rather than a function >>> call for each point). Is your reasoning for this based on virtual >>> function overhead and multiple evaluations of a normalization >>> constant, or are there other factors? Could you give an example >>> of a distribution that would benefit from this? > > Marcus Brubaker wrote: >> All of the above really. Virtual function overhead and >> normalization constants can hurt. More generally though being able >> to use SIMD instructions and preserving cache locality can be a big >> win. Pretty much any distribution could be helped by this if >> you're evaluating at more than a few points. E.G., for a >> multivariate gaussian, one matrixmatrix multiplication is >> typically cheaper (due to SIMD instructions, cache locality, etc) >> than multiple matrix vector multiplies. > > For anything other than a trivial PDF (possibly a 1D Epanechnikov > quadratic kernel?), I would have thought that the virtual function > call cost would be trivial. The virtual functions do seem to be trivial with respect to calling time. However, the existence of any virtual functions adds to the size of the class, and this may not be trivial for memory allocation. Consider a scalar Gaussian of type float. It requires 8 bytes (4 for each of 2 floats for mean and variance). When you use virtual functions you need 12 bytes (4 more for the virtual function table pointer). That's 50% more and is not trivial when you have an array of a million distributions. This is the main reason I added vpdt and went back to the idea of a pure template library that is wrapped by the polymorphic strategy tree in vpdl. The lack of virtual functions may also allow more inlining and optimizations by some compilers. > Whilst the arguments for SIMD compatibility and cachelocality is > true, the problems with applying these arguments are that > > 1. They prettymuch require vnl_matrix types for parameters and > results, destroying the type idioms we use in VXL and C++ in > general. In extremis you have Matlab's everythingisamatrix mess. > > 2. If you apply this argument everywhere you end up massively > multiplying your API to deal with all cases where SIMD speedups > might be needed  since your particular matrix folding will depend > on the particular cachelocality characteristics of your data > configuration and algorithms. > > 3. Reordering code for SIMD and cache locality really should be the > domain of the optimising compiler  although I should admit to the > endlesslydashed optimism I have towards compiler performance. > > I would suggest one of the following options. > > 1. Not providing multisample methods in the general baseclass vpdl > API. If someone wants a specific case they can either write their > own private code tailored to their requirements. At best it could be > added to either the relevant vpdt PDF class, or derived vpdl PDF class > > 2. Providing a general multisample API that uses iterators. If the > iterators point to compact memory, then you will get cache locality > for free and SIMD if you do some careful iteratortraits template > wizardry. This way you maintain the integrity of the type system and > limit the API size increase. > > I kind of like solution 2 because it also ties in with a good design > for the builders. The use of iterators could merge the API for batch > and online learning of PDF parameters. Currently vpdfl uses > mbl_data_wrapper which vaguely equivalent to a pair of STL > iterators. Replacing this with proper iterator pairs could make the > code much more accessible to existing C++ programmers, and leverage > the existing iterator based algorithms. I agree that 2 is better. Each distribution should know how to handle a single point and external functions can iterate over multiple points. This is why I've added virtual functions to the vpdl base distribution for normalization constant and unnormalized density. The biggest savings will probably come from computing normalization once and then applying it to each unnormalized density as they are computed. > The only trick is to avoid the huge template instantiation sizes of > using templated iterators with complicated algorithms. One way > around this it to proved a polymorphic strategy tree of iterator > types, and use this where there are no speed optimisations. > I'm not convinced that huge template instantiation will be an issue. I think most of these functions are simple enough to be inlined in a header file and only instantiated by the user as needed (i.e. not in libvpdl for several possible iterator type). Matt 
From: Ian Scott <ian.scott@st...>  20090320 16:51:14

Matthew Leotta wrote: > On Mar 20, 2009, at 9:19 AM, Ian Scott wrote: > >> Marcus Brubaker wrote: >>>>> Fair enough. In that case I would add methods to the base class >>>>> which support evaluating the density at multiple points. I.E., >>>>> they take as input a matrix where each column (or row) is a >>>>> single element and they return a vector with the pdf evaluated at >>>>> each point. This would allow efficient PDF evaluation at >>>>> multiple points. The nice thing is that a default implementation >>>>> in terms of prob_density can be used to reduce overhead for >>>>> people implementing distributions. >>>>> >> >> Matthew Leotta wrote: >>>> I have to admit, my experience in implementing various >>>> distributions is quite limited. It hadn't crossed my mind that it >>>> could be significantly more efficient to evaluate a distribution >>>> at many points in a single function call (rather than a function >>>> call for each point). Is your reasoning for this based on virtual >>>> function overhead and multiple evaluations of a normalization >>>> constant, or are there other factors? Could you give an example >>>> of a distribution that would benefit from this? >> Marcus Brubaker wrote: >>> All of the above really. Virtual function overhead and >>> normalization constants can hurt. More generally though being able >>> to use SIMD instructions and preserving cache locality can be a big >>> win. Pretty much any distribution could be helped by this if >>> you're evaluating at more than a few points. E.G., for a >>> multivariate gaussian, one matrixmatrix multiplication is >>> typically cheaper (due to SIMD instructions, cache locality, etc) >>> than multiple matrix vector multiplies. >> For anything other than a trivial PDF (possibly a 1D Epanechnikov >> quadratic kernel?), I would have thought that the virtual function >> call cost would be trivial. > > The virtual functions do seem to be trivial with respect to calling > time. However, the existence of any virtual functions adds to the > size of the class, and this may not be trivial for memory allocation. > Consider a scalar Gaussian of type float. It requires 8 bytes (4 for > each of 2 floats for mean and variance). When you use virtual > functions you need 12 bytes (4 more for the virtual function table > pointer). That's 50% more and is not trivial when you have an array > of a million distributions. > > This is the main reason I added vpdt and went back to the idea of a > pure template library that is wrapped by the polymorphic strategy tree > in vpdl. The lack of virtual functions may also allow more inlining > and optimizations by some compilers. Oh  absolutely. I was only considering the vpdl strategy tree case, where you have already agreed to the VFPT pointer cost. > > >> Whilst the arguments for SIMD compatibility and cachelocality is >> true, the problems with applying these arguments are that >> >> 1. They prettymuch require vnl_matrix types for parameters and >> results, destroying the type idioms we use in VXL and C++ in >> general. In extremis you have Matlab's everythingisamatrix mess. >> >> 2. If you apply this argument everywhere you end up massively >> multiplying your API to deal with all cases where SIMD speedups >> might be needed  since your particular matrix folding will depend >> on the particular cachelocality characteristics of your data >> configuration and algorithms. >> >> 3. Reordering code for SIMD and cache locality really should be the >> domain of the optimising compiler  although I should admit to the >> endlesslydashed optimism I have towards compiler performance. >> >> I would suggest one of the following options. >> >> 1. Not providing multisample methods in the general baseclass vpdl >> API. If someone wants a specific case they can either write their >> own private code tailored to their requirements. At best it could be >> added to either the relevant vpdt PDF class, or derived vpdl PDF class >> >> 2. Providing a general multisample API that uses iterators. If the >> iterators point to compact memory, then you will get cache locality >> for free and SIMD if you do some careful iteratortraits template >> wizardry. This way you maintain the integrity of the type system and >> limit the API size increase. >> >> I kind of like solution 2 because it also ties in with a good design >> for the builders. The use of iterators could merge the API for batch >> and online learning of PDF parameters. Currently vpdfl uses >> mbl_data_wrapper which vaguely equivalent to a pair of STL >> iterators. Replacing this with proper iterator pairs could make the >> code much more accessible to existing C++ programmers, and leverage >> the existing iterator based algorithms. > > I agree that 2 is better. Each distribution should know how to handle > a single point and external functions can iterate over multiple > points. This is why I've added virtual functions to the vpdl base > distribution for normalization constant and unnormalized density. The > biggest savings will probably come from computing normalization once > and then applying it to each unnormalized density as they are computed. > >> The only trick is to avoid the huge template instantiation sizes of >> using templated iterators with complicated algorithms. One way >> around this it to proved a polymorphic strategy tree of iterator >> types, and use this where there are no speed optimisations. >> > > I'm not convinced that huge template instantiation will be an issue. > I think most of these functions are simple enough to be inlined in a > header file and only instantiated by the user as needed (i.e. not in > libvpdl for several possible iterator type). Well I guess it depends on what you think is too big  Some of the PDF building functions in vpdfl are well over 100 lines. It wouldn't be hard to think of a MAPestimating PDF learning algorithm that used crossvalidation, higherorder statistical biases and/or complexity control and that had 1000 LoC. I know it isn't part of your brief, but the pattern used in vpdl may well set the standard for future libraries that need to train something based on lots of examples. vpdfl certainly set the standard for a classifier API (see mul/clsfy) where some of my private support vector machine training algorithms run into the thousands LoC. It also set the standard for Manchester's private AAM libraries, which have building algorithms running easily into the 10^5 LoC. The additional problem with these other training algorithms is that they often have multiple inputs  e.g. weights (which maybe used by vpdl), or classification targets. The number of possible template types grow polynomially. Still  I guess they could all be inlined  at the risk of a large compilation time cost. In any case, any such template variation should be limited to the templated build or density functions  it would be disastrous if the iterator types affected the type of the vpdl_distribution or vpdl_builder classes. We would have to put every possible type combination and derivative into the vsl's polymorphic loader scheme. Ian. 
From: Matthew Leotta <matt.leotta@gm...>  20090320 17:21:25

On Mar 20, 2009, at 12:51 PM, Ian Scott wrote: > Matthew Leotta wrote: >> On Mar 20, 2009, at 9:19 AM, Ian Scott wrote: >>> Marcus Brubaker wrote: >>>>>> Fair enough. In that case I would add methods to the base >>>>>> class which support evaluating the density at multiple >>>>>> points. I.E., they take as input a matrix where each column >>>>>> (or row) is a single element and they return a vector with the >>>>>> pdf evaluated at each point. This would allow efficient PDF >>>>>> evaluation at multiple points. The nice thing is that a >>>>>> default implementation in terms of prob_density can be used to >>>>>> reduce overhead for people implementing distributions. >>>>>> >>> >>> Matthew Leotta wrote: >>>>> I have to admit, my experience in implementing various >>>>> distributions is quite limited. It hadn't crossed my mind that >>>>> it could be significantly more efficient to evaluate a >>>>> distribution at many points in a single function call (rather >>>>> than a function call for each point). Is your reasoning for >>>>> this based on virtual function overhead and multiple >>>>> evaluations of a normalization constant, or are there other >>>>> factors? Could you give an example of a distribution that >>>>> would benefit from this? >>> Marcus Brubaker wrote: >>>> All of the above really. Virtual function overhead and >>>> normalization constants can hurt. More generally though being >>>> able to use SIMD instructions and preserving cache locality can >>>> be a big win. Pretty much any distribution could be helped by >>>> this if you're evaluating at more than a few points. E.G., for >>>> a multivariate gaussian, one matrixmatrix multiplication is >>>> typically cheaper (due to SIMD instructions, cache locality, >>>> etc) than multiple matrix vector multiplies. >>> For anything other than a trivial PDF (possibly a 1D Epanechnikov >>> quadratic kernel?), I would have thought that the virtual >>> function call cost would be trivial. >> The virtual functions do seem to be trivial with respect to >> calling time. However, the existence of any virtual functions >> adds to the size of the class, and this may not be trivial for >> memory allocation. Consider a scalar Gaussian of type float. It >> requires 8 bytes (4 for each of 2 floats for mean and variance). >> When you use virtual functions you need 12 bytes (4 more for the >> virtual function table pointer). That's 50% more and is not >> trivial when you have an array of a million distributions. >> This is the main reason I added vpdt and went back to the idea of >> a pure template library that is wrapped by the polymorphic >> strategy tree in vpdl. The lack of virtual functions may also >> allow more inlining and optimizations by some compilers. > > Oh  absolutely. I was only considering the vpdl strategy tree case, > where you have already agreed to the VFPT pointer cost. > >>> Whilst the arguments for SIMD compatibility and cachelocality is >>> true, the problems with applying these arguments are that >>> >>> 1. They prettymuch require vnl_matrix types for parameters and >>> results, destroying the type idioms we use in VXL and C++ in >>> general. In extremis you have Matlab's everythingisamatrix mess. >>> >>> 2. If you apply this argument everywhere you end up massively >>> multiplying your API to deal with all cases where SIMD speedups >>> might be needed  since your particular matrix folding will >>> depend on the particular cachelocality characteristics of your >>> data configuration and algorithms. >>> >>> 3. Reordering code for SIMD and cache locality really should be >>> the domain of the optimising compiler  although I should admit >>> to the endlesslydashed optimism I have towards compiler >>> performance. >>> >>> I would suggest one of the following options. >>> >>> 1. Not providing multisample methods in the general baseclass >>> vpdl API. If someone wants a specific case they can either write >>> their own private code tailored to their requirements. At best it >>> could be added to either the relevant vpdt PDF class, or derived >>> vpdl PDF class >>> >>> 2. Providing a general multisample API that uses iterators. If >>> the iterators point to compact memory, then you will get cache >>> locality for free and SIMD if you do some careful iteratortraits >>> template wizardry. This way you maintain the integrity of the >>> type system and limit the API size increase. >>> >>> I kind of like solution 2 because it also ties in with a good >>> design for the builders. The use of iterators could merge the API >>> for batch and online learning of PDF parameters. Currently vpdfl >>> uses mbl_data_wrapper which vaguely equivalent to a pair of STL >>> iterators. Replacing this with proper iterator pairs could make >>> the code much more accessible to existing C++ programmers, and >>> leverage the existing iterator based algorithms. >> I agree that 2 is better. Each distribution should know how to >> handle a single point and external functions can iterate over >> multiple points. This is why I've added virtual functions to the >> vpdl base distribution for normalization constant and unnormalized >> density. The biggest savings will probably come from computing >> normalization once and then applying it to each unnormalized >> density as they are computed. >>> The only trick is to avoid the huge template instantiation sizes >>> of using templated iterators with complicated algorithms. One >>> way around this it to proved a polymorphic strategy tree of >>> iterator types, and use this where there are no speed >>> optimisations. >>> >> I'm not convinced that huge template instantiation will be an >> issue. I think most of these functions are simple enough to be >> inlined in a header file and only instantiated by the user as >> needed (i.e. not in libvpdl for several possible iterator type). > > Well I guess it depends on what you think is too big  Some of the > PDF building functions in vpdfl are well over 100 lines. It wouldn't > be hard to think of a MAPestimating PDF learning algorithm that > used crossvalidation, higherorder statistical biases and/or > complexity control and that had 1000 LoC. I was actually only referring to the original concern that Marcus had about efficiently evaluating probabilities at a collections of points. These types of functions should be generally be under 10 LoC. I don't think builder functions should be treated the same way. However, it might be possible to have lightweight templated functions that wrap an arbitrary iterator into a polymorphic iterator and then call a builder implementation that uses the polymorphic iterator base class. This would allow the user to use any type of iterator and hide the fact that a special polymorphic iterator type is required. I haven't thought about this enough to know exactly how it might work. > > I know it isn't part of your brief, but the pattern used in vpdl may > well set the standard for future libraries that need to train > something based on lots of examples. vpdfl certainly set the > standard for a classifier API (see mul/clsfy) where some of my > private support vector machine training algorithms run into the > thousands LoC. It also set the standard for Manchester's private AAM > libraries, which have building algorithms running easily into the > 10^5 LoC. > > The additional problem with these other training algorithms is that > they often have multiple inputs  e.g. weights (which maybe used by > vpdl), or classification targets. The number of possible template > types grow polynomially. > > Still  I guess they could all be inlined  at the risk of a large > compilation time cost. In any case, any such template variation > should be limited to the templated build or density functions  it > would be disastrous if the iterator types affected the type of the > vpdl_distribution or vpdl_builder classes. We would have to put > every possible type combination and derivative into the vsl's > polymorphic loader scheme. I certainly agree. These templates for iterators should have no effect on the types used in distributions. Matt 
From: Ian Scott <ian.scott@gm...>  20090320 13:19:55

Marcus Brubaker wrote: >>> Fair enough. In that case I would add methods to the base class >>> which support evaluating the density at multiple points. I.E., they >>> take as input a matrix where each column (or row) is a single element >>> and they return a vector with the pdf evaluated at each point. This >>> would allow efficient PDF evaluation at multiple points. The nice >>> thing is that a default implementation in terms of prob_density can >>> be used to reduce overhead for people implementing distributions. >>> Matthew Leotta wrote: >> I have to admit, my experience in implementing various distributions >> is quite limited. It hadn't crossed my mind that it could be >> significantly more efficient to evaluate a distribution at many points >> in a single function call (rather than a function call for each >> point). Is your reasoning for this based on virtual function overhead >> and multiple evaluations of a normalization constant, or are there >> other factors? Could you give an example of a distribution that would >> benefit from this? Marcus Brubaker wrote: > All of the above really. Virtual function overhead and normalization > constants can hurt. More generally though being able to use SIMD > instructions and preserving cache locality can be a big win. Pretty > much any distribution could be helped by this if you're evaluating at > more than a few points. E.G., for a multivariate gaussian, one > matrixmatrix multiplication is typically cheaper (due to SIMD > instructions, cache locality, etc) than multiple matrix vector multiplies. > For anything other than a trivial PDF (possibly a 1D Epanechnikov quadratic kernel?), I would have thought that the virtual function call cost would be trivial. Whilst the arguments for SIMD compatibility and cachelocality is true, the problems with applying these arguments are that 1. They prettymuch require vnl_matrix types for parameters and results, destroying the type idioms we use in VXL and C++ in general. In extremis you have Matlab's everythingisamatrix mess. 2. If you apply this argument everywhere you end up massively multiplying your API to deal with all cases where SIMD speedups might be needed  since your particular matrix folding will depend on the particular cachelocality characteristics of your data configuration and algorithms. 3. Reordering code for SIMD and cache locality really should be the domain of the optimising compiler  although I should admit to the endlesslydashed optimism I have towards compiler performance. I would suggest one of the following options. 1. Not providing multisample methods in the general baseclass vpdl API. If someone wants a specific case they can either write their own private code tailored to their requirements. At best it could be added to either the relevant vpdt PDF class, or derived vpdl PDF class 2. Providing a general multisample API that uses iterators. If the iterators point to compact memory, then you will get cache locality for free and SIMD if you do some careful iteratortraits template wizardry. This way you maintain the integrity of the type system and limit the API size increase. I kind of like solution 2 because it also ties in with a good design for the builders. The use of iterators could merge the API for batch and online learning of PDF parameters. Currently vpdfl uses mbl_data_wrapper which vaguely equivalent to a pair of STL iterators. Replacing this with proper iterator pairs could make the code much more accessible to existing C++ programmers, and leverage the existing iterator based algorithms. The only trick is to avoid the huge template instantiation sizes of using templated iterators with complicated algorithms. One way around this it to proved a polymorphic strategy tree of iterator types, and use this where there are no speed optimisations. Ian. 