From: Matthew Leotta <matt.leotta@gm...>  2009-03-03 21:03:07

On Mar 3, 2009, at 3:14 PM, Marcus Brubaker wrote:
> Matthew Leotta wrote:
>> Marcus,
>>
>> It's great to have some feedback. Thanks! It's helpful to know
>> what other people are looking for in a distribution library so that
>> I can make it broadly usable.
>>
>> On Mar 3, 2009, at 12:47 PM, Marcus Brubaker wrote:
>>> 1. Many useful distributions are not over vectors or scalars.
>>> E.G., distributions over quaternions and matrices. While
>>> practically anything can be shoehorned into a distribution over a
>>> vector, it's worth thinking about how you would implement a
>>> distribution over other types.
>>
>> I've thought about this to some degree, but I think it might be too
>> complicated to allow even more types. Each new type requires
>> template specialization of all the commonly used distributions.
>> I've avoided this to some degree by defining a simple set of
>> operations (element access, addition, dot product, etc.) that work
>> appropriately on vectors and scalars. This would be hard to
>> generalize to arbitrary types. I think it's easier to shoehorn
>> into a vector type as you say.
>
> Fair enough. In that case I would add methods to the base class
> which support evaluating the density at multiple points. I.E., they
> take as input a matrix where each column (or row) is a single
> element and they return a vector with the pdf evaluated at each
> point. This would allow efficient PDF evaluation at multiple
> points. The nice thing is that a default implementation in terms of
> prob_density can be used to reduce overhead for people implementing
> distributions.
>
> This could be done with a list or something like that instead of a
> matrix, but the copying required could kill any meaningful
> performance benefits.

I have to admit, my experience in implementing various distributions is
quite limited.
It hadn't crossed my mind that it could be significantly more efficient
to evaluate a distribution at many points in a single function call
(rather than a function call for each point). Is your reasoning for
this based on virtual function overhead and multiple evaluations of a
normalization constant, or are there other factors? Could you give an
example of a distribution that would benefit from this?

>
>>> 2. Not all distribution functions can be readily normalized and
>>> many uses of distributions don't require the normalization
>>> constant. It may be worth allowing both of these cases.
>>
>> Good point. I've assumed that all simple distributions will be
>> normalized. Maybe there should be a function call that indicates
>> whether the densities are normalized? Maybe there should be a
>> separate virtual function for unnormalized density? It could be
>> faster to compute the unnormalized version even when normalization
>> is possible. Thoughts?
>
> There are definitely functions which are normalizable but for which
> the normalization is expensive (Gamma and Beta distributions are
> moderate examples of cases like this, the Bingham distribution is a
> more extreme case). In these cases it's important to compute the
> normalization only when necessary and to precompute it and cache it
> if possible.
>
> I think either approach is reasonable. Having an extra parameter
> passed to (log_)prob_density with a sensible default value reduces
> function clutter. Alternately, having, for instance,
> (log_)prob_density_unnorm and (log_)prob_density_normconst functions
> allows (log_)prob_density to be implemented in the base class.

I'll think about this and try to work out a reasonable solution. One
design challenge with respect to caching is balancing memory footprint
with efficiency. For some applications, a distribution is created once
and then evaluated at many points. Caching is a good idea for that.
For other applications (like background modeling), many distributions
are created and evaluated at only one point before changing the
distribution parameters. In that case, caching buys you nothing and it
wastes valuable memory.

>
>>> 3. Computing the cdf, inverse cdf, etc. can be hard or practically
>>> impossible (except for approximations) for many distributions.
>>> Handling this gracefully is important. At the very least, when
>>> documenting the base class, give guidance on how to handle this.
>>> E.G., when is it appropriate to use an approximation. How should
>>> unimplemented functions be handled? Maybe there should be
>>> "approximate" versions or an approximation parameter for some of
>>> these functions which allows the user to specify whether
>>> approximations are allowed or how accurate things need to be.
>>
>> I'm going to leave this up to others to figure out what is best. I
>> was hoping maybe someone could implement some numerical quadrature
>> algorithms to handle some of the cases. I imagine there will still
>> be cases that are difficult to compute. Maybe there should also be
>> a function to indicate that the cdf is not available?
>
> Adding a function is probably overkill for the vast majority of use
> cases. If this information needs to be available at runtime, then I
> would add a bool (or an int with flags) to indicate it. I'm
> somewhat agnostic about a specific solution, but rather I think it's
> important to document what a class should do in these cases before
> people start implementing too many classes.

I'm unsure what other people's needs are for this. For now, why don't
we say that unimplemented cdf functions return 1.0? I can update the
documentation.

>
>>> Anyway, vpdl looks like a great start and I'm looking forward to
>>> seeing where it goes from here.
>>
>> My goal in writing vpdl was to move the functionality of bsta into
>> core in a way that is general enough to support the needs of other
>> VXL users.
>> Specifically I am aiming to support the functionality
>> of VXL's other primary probability libraries: vpdfl and pdf1d. I
>> will not have time to implement all of these features, but I hope
>> to get the framework in place so that others can build upon it. If
>> you have specific suggestions about how to improve the code, I'd
>> love to hear them. It would be even better if you can help
>> contribute to the code.
>
> I'm crunched for time right now (impending ICCV deadline and all)
> but I might be able to lend a more direct hand in the future. I
> have implemented a fair number of more "exotic" distributions that I
> would be happy to contribute to the effort. In particular I have
> samplers implemented for distributions like Gamma, Beta, BetaPrime,
> Exponential, Truncated Normal, etc.

Understood. I'm a little crunched myself. I'm happy to accept your
contributions whenever you have time.

>
> Cheers,
> Marcus
>>
>> Matt
>>
>>>
>>> Cheers,
>>> Marcus
>>>
>>> Matthew Leotta wrote:
>>>> Dear All,
>>>>
>>>> I've just checked in the first part of vpdl (probability
>>>> distribution library) into the vxl core. I have not modified
>>>> any CMakeLists.txt outside of vpdl. If you want to build it
>>>> you'll need to manually add it to core/CMakeLists.txt. At some
>>>> point I can check in a CMake build option that defaults to off.
>>>>
>>>> If you have time, please take a look at the code and let me know
>>>> if the design looks reasonable. It is quite incomplete at this
>>>> point, but it should give you the idea. I'm only working on the
>>>> distribution classes and I welcome help in contributing the
>>>> design of builders, samplers, or any other essential parts.
>>>> When I get community approval for this design I'll add more
>>>> distributions and write the book chapter.
>>>>
>>>> The general design is like this: vpdl_distribution<T,n> is the
>>>> templated base class for distributions.
>>>> The template parameters
>>>> are T, the floating point type (float or double) and n, the
>>>> dimension. For n > 1 the distributions work with
>>>> vnl_vector_fixed<T,n> and vnl_matrix_fixed<T,n,n> types. For
>>>> n == 1 they work with T directly for scalar computations. The
>>>> special case of n == 0 (the default) works with vnl_vector<T>
>>>> and vnl_matrix<T> for dimension specified at run time. While
>>>> vpdl_distribution<T,n> should be used as the base class, it is
>>>> inherited from vpdl_base_traits<T,n>. vpdl_base_traits is
>>>> partially specialized to create typedefs and functions that
>>>> reduce the need for specialization in later derived classes.
>>>> For example, vpdl_distribution<T,n>::index(v,i) is a static
>>>> member function that provides access to the ith element of
>>>> vector v even if v is really a scalar of type T (in which case
>>>> it returns the scalar value).
>>>>
>>>> I currently have two working distributions with test cases:
>>>> vpdl_gaussian_sphere and vpdl_gaussian_indep. Both are
>>>> restricted versions of Gaussian distributions (with
>>>> hyperspherical and independent covariances). The general
>>>> Gaussian is a work in progress, and I think it will use the
>>>> eigenvector representation of mul/vpdfl.
>>>>
>>>> I've inlined everything in the Gaussians so that no .txx file is
>>>> needed. The vpdl_distribution does use .txx with many
>>>> instantiations in the Templates subdirectory. Does anyone have
>>>> any preference on the use of .txx files here? It seems like a
>>>> very large number of files would be needed in the Templates
>>>> directory if I use .txx files. Many of these instantiations
>>>> might rarely be used. However, if everything is inlined it
>>>> could lead to more code bloat.
>>>>
>>>> This design integrates bsta more tightly than originally
>>>> considered. If the virtual functions do not create too large of
>>>> a performance hit, then I might not need to create separate
>>>> classes with wrappers.
>>>> I will need to do performance tests to
>>>> know for sure.
>>>>
>>>> Matt
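
For reference, here is a minimal sketch of the two interface ideas raised in this thread: splitting the density into an unnormalized part plus a normalization constant, and providing a default multi-point evaluation in the base class. It is only an illustration under stated assumptions. The class and function names (density_base, log_density_unnorm, log_norm_const, prob_density_batch, gaussian_1d) are hypothetical and are not the vpdl API, and std::vector<double> stands in for the vnl vector and matrix types so that the example has no VXL dependency.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical base class (NOT the actual vpdl_distribution interface).
// Derived classes supply the unnormalized log density and the log of the
// normalization constant; the normalized density is built from them here.
class density_base
{
 public:
  virtual ~density_base() {}

  // Log of the unnormalized density at x.
  virtual double log_density_unnorm(double x) const = 0;

  // Log of the constant Z such that density = unnormalized density / Z.
  virtual double log_norm_const() const = 0;

  // Normalized log density, implemented once in the base class.
  double log_prob_density(double x) const
  {
    return log_density_unnorm(x) - log_norm_const();
  }

  // Default multi-point evaluation written in terms of the single-point
  // functions. The normalization constant is computed once per batch.
  // A derived class may override this with a faster implementation.
  virtual std::vector<double>
  prob_density_batch(const std::vector<double>& xs) const
  {
    const double log_z = log_norm_const();
    std::vector<double> out;
    out.reserve(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i)
      out.push_back(std::exp(log_density_unnorm(xs[i]) - log_z));
    return out;
  }
};

// Illustrative scalar Gaussian with mean mu and standard deviation sigma.
class gaussian_1d : public density_base
{
 public:
  gaussian_1d(double mu, double sigma) : mu_(mu), sigma_(sigma) {}

  double log_density_unnorm(double x) const
  {
    const double z = (x - mu_) / sigma_;
    return -0.5 * z * z;
  }

  double log_norm_const() const
  {
    // log(sigma * sqrt(2 * pi))
    return std::log(sigma_) + 0.5 * std::log(2.0 * 3.14159265358979323846);
  }

 private:
  double mu_, sigma_;
};
```

With this layout a caller that only needs relative densities can use log_density_unnorm directly and never pay for the constant, while a caller evaluating many points pays for it once per batch without the distribution object having to cache anything. Whether that trade-off suits cases like background modeling would still need to be measured.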