We may have some manpower now to start working on a probability
library that unifies bsta, pdf1d, and vpdfl. It looks like I might
have some funding to work on it, and Miguel has some students at the
University of Puerto Rico who might be able to help. Peter also
expressed interest at one point in getting involved. I'm not asking
for you to contribute to this effort, but I am looking for your
approval. The core of the library will be based on pdf1d and vpdfl.
I'm hoping that we can create a library that meets everyone's needs.
The biggest challenge I see is unifying pdf1d with vpdfl. We would
need to use vnl_vector of size 1 and vnl_matrix of size 1x1 for the
univariate case. My questions are:
1) Is this too much overhead for the univariate case?
2) Will this be too complicated for users who just want to work with
scalars in the univariate distributions?
We might be able to address number 2 by using proxy classes for the
vector and matrix that can implicitly cast to and from a scalar.
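As a rough illustration of that proxy idea, the following is a minimal sketch (all names here are hypothetical stand-ins, e.g. for a vnl_vector of size 1, not an actual proposed API):

```cpp
#include <cassert>

// Hypothetical sketch: a 1-element "vector" proxy that converts
// implicitly to and from a scalar, so univariate users never have
// to touch vector syntax.  Stand-in for a vnl_vector of size 1.
template <class T>
class scalar_vector_proxy
{
  T data_;  // the single stored element
public:
  scalar_vector_proxy(T v = T(0)) : data_(v) {}    // implicit from scalar
  operator T() const { return data_; }             // implicit to scalar
  unsigned size() const { return 1; }
  T&       operator[](unsigned) { return data_; }  // vector-style access
  T const& operator[](unsigned) const { return data_; }
};

// A univariate routine written against the "vector" interface...
template <class T>
T sum_of_squares(scalar_vector_proxy<T> const& v)
{
  T s = T(0);
  for (unsigned i = 0; i < v.size(); ++i) s += v[i] * v[i];
  return s;
}
```

A caller working with scalars could then pass a plain `double` (e.g. `sum_of_squares<double>(2.0)`) and read results back as scalars, with the conversions happening implicitly.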
If this proposed design is acceptable to you I'll move to the next
phase and start to create header files.
On Jan 15, 2009, at 2:48 PM, Matthew Leotta wrote:
> Peter, Ian, and anyone else interested,
> In an attempt to keep things moving on this combined probability
> library I'd like to make a proposal on how to integrate bsta, pdf1d,
> and vpdfl. I have a few general questions to be resolved first. If
> we can come to an agreement on the general way to proceed then I'll
> take some time to make a more detailed class specification.
> I've reviewed all three libraries in more detail, and I'm even more
> convinced that we can't completely merge them and still meet the
> requirements of both Brown and Manchester. However, I think we
> could adopt the strategy pattern of vpdfl as the primary design and
> then put the generic programming design of bsta in a subdirectory
> and make it a pure template library. The template distributions
> could mirror a subset of the strategy pattern distributions and have
> wrapper classes to use template distributions in the strategy
> pattern framework. For example (naming conventions subject to
> change):
>
> template <class T> class gaussian : public base_distribution<T>; // uses virtual functions
> template <class T> class gaussian_ref : public gaussian<T>; // uses virtual functions
>
> and in the template library:
>
> template <class T, unsigned N> class gaussian_fixed; // no virtual functions
> The gaussian_fixed class is unrelated to base_distribution. Its
> data is represented in terms of vnl_vector_fixed and
> vnl_matrix_fixed while the gaussian_ref class contains
> vnl_vector_ref and vnl_matrix_ref with the same data
> representation. So the fixed size data can be used in the strategy
> pattern framework for functions that are not speed critical. The
> generic template part of the library could be used on its own, but
> would only contain the basic data structures and algorithms that
> need to be optimized for speed or memory layout.
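The wrapper idea described above could be sketched roughly as follows (a univariate toy for brevity; `gaussian_fixed1`, `base_distribution`, and `gaussian_wrapper` are illustrative names, not the real libraries' classes):

```cpp
#include <cassert>
#include <cmath>

// -- template library part: no virtual functions, resolved at compile time
template <class T>
class gaussian_fixed1
{
  T mean_, var_;
public:
  gaussian_fixed1(T m, T v) : mean_(m), var_(v) {}
  T density(T x) const {
    const T pi = T(3.14159265358979323846);
    T d = x - mean_;
    return std::exp(-d * d / (2 * var_)) / std::sqrt(2 * pi * var_);
  }
};

// -- strategy-pattern framework: abstract base with virtual functions
template <class T>
class base_distribution
{
public:
  virtual ~base_distribution() {}
  virtual T density(T x) const = 0;
};

// -- wrapper: lets the non-virtual template class participate in the
//    virtual framework where speed is not critical
template <class T>
class gaussian_wrapper : public base_distribution<T>
{
  gaussian_fixed1<T> impl_;
public:
  gaussian_wrapper(gaussian_fixed1<T> const& g) : impl_(g) {}
  T density(T x) const { return impl_.density(x); }  // forwards to impl
};
```

Code on the strategy-pattern side sees only `base_distribution<T>*`, while speed-critical template code uses `gaussian_fixed1<T>` directly with no virtual dispatch.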
> What I think we would need:
> 1) A common set of naming conventions and data representations for
> distributions found in both designs.
> a) both bsta and vpdfl have axis aligned and full covariance
> Gaussians (with different names),
> vpdfl also has a truncated principal components Gaussian,
> while bsta has a spherically symmetric Gaussian.
> b) vpdfl currently stores a Gaussian covariance matrix in
> eigenvalue decomposition,
> while bsta stores the original matrix and caches its inverse
> 2) The strategy pattern code (vpdfl) should be templated over
> numeric type (float or double).
> 3) Acceptance that in the scalar case (if num_dimensions is 1 at run
> time) some scalar values will have to be represented as vnl_vector
> of size 1 and vnl_matrix of size 1x1.
> I am most worried about number 3. This is why pdf1d exists. Can
> pdf1d be a special case of vpdfl? If someone is interested in
> working only in 1-d will they be put off by the need to use 1-d
> vectors and matrices? bsta solves this with template specialization
> that substitutes vnl_vector_fixed<double,N> with double when N==1.
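A minimal sketch of that specialization trick: a traits class selects the storage type for an N-dimensional sample and collapses to a bare scalar when N == 1 (`tiny_vector` and `field_type` are illustrative stand-ins, not bsta's real names):

```cpp
#include <cassert>
#include <type_traits>

template <class T, unsigned N>
struct tiny_vector { T data[N]; };   // stand-in for vnl_vector_fixed<T,N>

// general case: samples are stored as fixed-size vectors
template <class T, unsigned N>
struct field_type { typedef tiny_vector<T, N> type; };

// partial specialization: in 1-d, a sample is just a scalar
template <class T>
struct field_type<T, 1> { typedef T type; };

// A distribution template would then declare, e.g.,
//   typename field_type<T,N>::type mean_;
// so univariate users never see vector syntax at all.
```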
> Other things I would like to see in the strategy pattern:
> 1) it would be nice if the distribution is more than just a density
> (pdf). I would also like to see cumulative calculations (cdf).
> Sometimes density is not enough and you want to know the actual
> probability integrated over some area. If we can evaluate the cdf
> then we can at least get the probability in an axis-aligned bounding
> box (by evaluating cdf at the box corners). some parts of bsta
> support this.
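To make the box-probability point concrete, here is a sketch for a 2-d axis-aligned Gaussian, where the cdf factors across dimensions and inclusion-exclusion over the four corners gives the mass in the box (function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// univariate normal cdf via the complementary error function
double normal_cdf(double x, double mean, double sd)
{
  return 0.5 * std::erfc(-(x - mean) / (sd * std::sqrt(2.0)));
}

// cdf of an axis-aligned 2-d Gaussian factors across dimensions
double cdf2d(double x, double y,
             double mx, double sx, double my, double sy)
{
  return normal_cdf(x, mx, sx) * normal_cdf(y, my, sy);
}

// P( x0 < X < x1, y0 < Y < y1 ) by inclusion-exclusion at the corners
double box_probability(double x0, double y0, double x1, double y1,
                       double mx, double sx, double my, double sy)
{
  return   cdf2d(x1, y1, mx, sx, my, sy)
         - cdf2d(x0, y1, mx, sx, my, sy)
         - cdf2d(x1, y0, mx, sx, my, sy)
         + cdf2d(x0, y0, mx, sx, my, sy);
}
```

For the standard bivariate case, the unit box [-1,1]x[-1,1] should carry about 0.466 of the mass (0.6827 squared), which is exactly the kind of query a density alone cannot answer.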
> 2) It would be nice if we could work out recursive estimation for
> the "builder" classes. It's nice that the mbl_data_wrapper doesn't
> require that you have all the data in memory, but it does require
> that you use all the data before you get the estimated
> distribution. If a builder supports recursive estimation it would
> be nice if you could feed the data points one at a time and stop at
> any point to get the distribution so far.
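The kind of builder described above could look like this (a univariate sketch using Welford's online mean/variance update; the class name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// A builder that accepts samples one at a time and can report the
// distribution estimated so far at any point, without holding the
// data in memory (Welford's recursive update).
class incremental_gaussian_builder
{
  unsigned n_;
  double mean_, m2_;  // running mean and sum of squared deviations
public:
  incremental_gaussian_builder() : n_(0), mean_(0.0), m2_(0.0) {}

  void add_sample(double x)            // feed one data point
  {
    ++n_;
    double delta = x - mean_;
    mean_ += delta / n_;
    m2_   += delta * (x - mean_);
  }

  unsigned n_samples() const { return n_; }
  double mean() const { return mean_; }
  double variance() const              // estimate so far (population form)
  { return n_ > 0 ? m2_ / n_ : 0.0; }
};
```

The caller can interleave `add_sample()` with reads of `mean()` and `variance()`, which is exactly the stop-at-any-point behavior the mbl_data_wrapper approach lacks.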
> 3) It might be good to clean up the vpdfl interface a bit. Some
> functions on the vpdfl_pdf base seem a little too application- or
> distribution-specific for the general case. n_peaks() and peak(int)
> seem to make sense for mixtures of Gaussians with well-separated
> components, but they might be misleading (and difficult to compute
> accurately) in the general case. Others like nearest_plausible seem
> a bit application specific and should maybe be non-member functions
> to reduce clutter.
> On Jan 12, 2009, at 2:07 PM, Peter Vanroose wrote:
>> OK, I see your point.
>> I don't know what your time frame is; maybe we'll have time to
>> first arrange some kind of "common brain storm" forum; I'm
>> definitely interested in joining in (although I've not used any of
>> the three libraries yet; but as a mathematician/statistician I'm
>> certainly interested in the topic).
>> Then someone could write out new class specs (any takers?), again
>> with a few days time for others to suggest adaptations/additions/...
>> Finally, we could divide work to do the actual "move", i.e.,
>> implement the new classes with the implementations from the three
>> existing libraries.
>> If we find (say) 3 or 4 people having time to do this, in a
>> reasonably short time span, this should be doable.
>> Your second path (keeping the libraries "as is") is easier, but
>> would maybe not fulfill the requirement of "make sure others are
>> using bsta".
>> -- Peter.
>>> I think there are a couple of different ways of moving
>>> forward with this statistical library.
>>> One is to bring both the template library (bsta) and the
>>> strategy patterned libraries (pdf1d and vpdfl) into one
>>> library. If we take this route then I think your strategy
>>> is a good one. However I'm not sure who is going to
>>> write out the clean design. I think we would need input
>>> from Brown and Manchester (and anyone else interested) to
>>> meet everyone's needs. We don't have all the
>>> interested parties sitting together. I suppose this could
>>> be done over e-mail, but someone would need to take the lead
>>> and propose an initial design. I'd be willing to help
>>> with fitting the bsta code into the design, but I don't
>>> think I could take the lead on this as I'm currently
>>> trying to write my PhD thesis.
>>> Another option is to consider keeping the libraries
>>> separate. bsta is really a generic template library in the
>>> style of STL. After cleaning out the old deprecated code
>>> you would find that the only compiled files are the .cxx
>>> files in the Templates subdirectory used to instantiate with
>>> some of the more common types. All of the data structures
>>> and algorithms are templated so that the code can be
>>> optimized at compile time for whichever data types and
>>> dimensionality are chosen. This is a bit more extreme than
>>> in vnl where the vnl_vector_fixed is almost always wrapped
>>> as a vnl_vector for use in algorithms. I think there may be
>>> some merit to keeping bsta as a pure template library. It
>>> might be useful to have a strategy patterned library wrap
>>> some of the templated distribution classes, but I don't
>>> see how the templated algorithms can make much use of
>>> strategy patterned distribution classes. Thoughts?
>>> My ulterior motive here is to document the design and use
>>> of some libraries I've written and used in my thesis.
>>> I'm already doing this with vidl2 and trying to see if I
>>> can do the same with bsta. The stipulations set forth by my
>>> advisor (Joe Mundy) are
>>> 1) The library is one I have designed (at least the initial design).
>>> 2) The library is used in my thesis work.
>>> 3) I am to write a VXL book chapter to document the code
>>> and include a copy as an appendix in my thesis.
>>> 4) The library must be used by others and promoted to core.
>>> The "promoted to core" part is the most difficult
>>> since there has been nothing added to core in a long long
>>> time. I am willing to clean up, rename, move, and
>>> thoroughly document bsta on my own if it is accepted into
>>> core. I don't think I have the time to redesign it to
>>> merge with pdf1d and vpdfl while finishing my thesis. In
>>> that case, I would probably have to cut it from my thesis
>>> and put off writing a book chapter until some unforeseen
>>> future date.
>>> On Jan 8, 2009, at 5:00 AM, Peter Vanroose wrote:
>>>> Interesting discussion!
>>>> I believe we should (1) write out a "clean" design, essentially
>>>> from scratch, indeed including several "points-of-view" (in the
>>>> style of vnl_vector vs. vnl_vector_fixed), but fairly complete
>>>> (i.e., including functionality which is currently either not used
>>>> or not fully implemented).
>>>> Then (2) gradually fill this framework with implementations from
>>>> bsta, pdf1d and vpdfl (and possibly other places).
>>>> Next (3) replace implementations in bsta, pdf1d and vpdfl by
>>>> (inline) calls to the new library.
>>>> And finally (4) gradually replace (in client code) all use of the
>>>> then "old" libraries by directly accessing the new core library.
>>>> We've had experience and good results with a similar approach,
>>>> when we converted TargetJr into vxl. It took us 6 days with 10
>>>> people (sitting together in Oxford) to have a complete and working
>>>> set of core libraries; the first 2 days were mainly spent just
>>>> writing out the design (and discussing the choices to be made).
>>>> For a new statistics library, I guess we'll need less than 50
>>>> man-hours to do a similar thing (which corresponds to steps 1 and
>>>> 2 above).
>>>> -- Peter.