From: Ian S. <ian...@st...> - 2009-01-06 13:12:01
Matthew Leotta wrote:
> Hi Everyone,
>
> As part of writing my thesis, Prof. Mundy has suggested that I write
> VXL book chapters for some of the supporting libraries I have written
> and try to promote them to core if there is community interest. The
> plan is to contribute book chapters both to VXL and as appendices in
> my thesis. One such library is vidl2, which is already in the works to
> move to core. I would like to gauge the interest in promoting a couple
> of others. Both of these would need to be renamed when moved; I'm open
> to suggestions on naming.
>
> 1) BSTA - Brown Statistical Library (brl/bbas/bsta)
> This is a template library for probability distributions which is
> templated over dimension and numerical type. The current code is
> focused on Gaussian distributions and mixtures of distributions, but
> the framework is designed to work with arbitrary distributions. There
> is also some old code in the library that does not fit in the
> framework and would need to be cleaned out.
> Currently implemented uses: video background modeling and
> probabilistic voxel world modeling.
> Dependencies: vnl and vnl_algo (deprecated code also uses vbl)

There is definitely room for a distributions library in core. All of our code currently depends on vpdfl and pdf1d, which have provided probability distributions over vectors and scalars respectively for a very long time. I think bsta's more general templated type is a better idea here.

A caveat: I know mul/vpdfl very well and bsta only from skimming the code for twenty minutes. A doxygen build of bsta would be helpful as a start, but isn't in our or Kitware's documentation.

The problems of comparing two libraries (or languages) are well known (see Stroustrup's FAQ) and generally solved by refusing to engage in a comparison. In the case of a promotion proposal, we really do need to compare the libraries and come to a conclusion about the features we would like to see in a core library.
I think someone from Brown should look at vpdfl and discuss its limitations from your needs. There are some other features of vpdfl/pdf1d that I haven't found in bsta. A couple of the more important features are:

1. Types that are independent of vector length. We work with vector spaces of dimensions from 1 to 100000. Most of the time we cannot know the dimensionality until runtime, and even if we did, the proliferation of types and object code would be prohibitive.

2. A base class (at least over the double scalars and vectors) set up as a strategy pattern, so that the actual distribution in use can be selected at runtime.

3. The base class should have an API to perform the following:
   a. mean and stddev (or throw for those types for which it is not meaningful).
   b. For double vectors and scalars we need p, grad_p, log_p (for accuracy and efficiency), threshold, nearest_plausible.
   c. Independent sampling (must store an RNG somewhere).
   d. Estimating parameters from a population. (Although there is no "best" algorithm for many distributions.)

> 2) IMESH - Indexed Mesh Library (brl/bbas/imesh)
> This is a library for representing 2-d surface meshes in N-d (usually
> 2-d or 3-d) space. The mesh is represented by indexing into vectors of
> data. The basic data structures provide an indexed face mesh. An
> indexed half-edge representation can optionally be computed for fast
> traversal of manifold meshes. The half-edge data structure includes
> iterators for mesh surface traversal. There is code for reading and
> writing some mesh file formats and algorithms for basic mesh
> operations like triangulation and subdivision. There is also an
> imesh_algo library with more complex algorithms like PCA on mesh
> vertex positions, kd-trees for mesh faces, implicit surfaces from
> meshes, and projecting 3-d meshes into images.
> Currently implemented uses: deformable vehicle mesh models and meshing
> of buildings from lidar.
> Dependencies: limited use of vgl and vul (these could be removed if
> necessary to make imesh level 1 core). imesh_algo depends on vil,
> vgl_algo, vnl_algo, and vpgl.
>
> I think VPGL is also a good candidate for promotion to core, but
> probably not as part of my thesis work.

Well, you would have to promote vpgl before imesh_algo, and I'm not sure of the point of promoting imesh without imesh_algo.

> I'd be interested to know if anyone is using these libraries and if
> there is any interest in promoting either of these to core.

IIRC, the last time we had this discussion, the conclusion was that a promotion should be supported by another group who is successfully using the author's code. This does appear to be the case for vpgl.

Regards,
Ian.
From: Matthew L. <mat...@gm...> - 2009-01-06 16:23:24
Hi Ian,

Thanks for your feedback. I'd like to address some of your concerns, but first I'd like to revise my ideas about promoting these libraries. I'm thinking that BSTA and IMESH might not be mature enough or used widely enough for immediate promotion to core. However, what my proposal is really about is writing documentation. I'd like to write the book chapters BEFORE these libraries are promoted to core. My thinking is that these libraries are general enough to go into core, and better documentation might foster more widespread use and contribution to the code. Eventually this could make these libraries suitable for core.

So the question becomes: where do I put VXL book chapters for non-core libraries?

On Jan 6, 2009, at 7:56 AM, Ian Scott wrote:
> A doxygen build of bsta would be helpful as a start, but isn't in our
> or Kitware's documentation.

BSTA is enabled in doxygen, but currently lacks a much-needed introduction page (I will add that soon). It is building in the Brown and Manchester documentation. Kitware's documentation is out of date and has not been rebuilt for a year.

REMINDER TO AMITHA: Kitware's nightly Doxygen and Book builds for VXL are still not running.

> There are some other features of vpdfl/pdf1d that I haven't found in
> bsta. A couple of the more important features are:
>
> 1. Types that are independent of vector length. We work with vector
> spaces of dimensions from 1 to 100000. Most of the time we cannot know
> the dimensionality until runtime, and even if we did, the proliferation
> of types and object code would be prohibitive.
>
> 2. A base class (at least over the double scalars and vectors) set up
> as a strategy pattern, so that the actual distribution in use can be
> selected at runtime.
>
> 3. The base class should have an API to perform the following:
> a. mean and stddev (or throw for those types for which it is not
> meaningful).
> b. For double vectors and scalars we need p, grad_p, log_p (for
> accuracy and efficiency), threshold, nearest_plausible.
> c. Independent sampling (must store an RNG somewhere).
> d. Estimating parameters from a population. (Although there is no
> "best" algorithm for many distributions.)

BSTA takes a template/generic programming approach instead of the strategy pattern approach in vpdfl/pdf1d. I think both of these approaches are useful depending on the application. Let me explain our needs. BSTA was built around the need for very efficient processing of large arrays of distributions with fixed type and dimension. The original application was real-time video background modeling with mixtures of Gaussians. We wanted to reuse the same code for both scalar and multivariate distributions, and our dimension is typically small (usually 1, 2, 3, or 4). We might need to choose between a few types at run time, but once the type is chosen, the same type is used in every element of the array. To make this efficient we wanted to avoid run-time type checking at each array element and heap allocation as much as possible.

In many ways the comparison between vpdfl/pdf1d and bsta is similar to that between vnl_vector and vnl_vector_fixed. Both approaches are useful, but it depends on whether your application needs flexibility or efficiency. I think features 3b, 3c, and 3d should probably be added to bsta, but for the rest you need a design like vpdfl/pdf1d. Maybe, following the lead of vnl, there should be a single library with conversions between the dynamic and templated variants?

> Well, you would have to promote vpgl before imesh_algo, and I'm not
> sure of the point of promoting imesh without imesh_algo.

Only a small portion of imesh_algo uses vpgl, for mesh projection into images. Most of imesh_algo relies on vnl and vnl_algo. Even imesh by itself is quite useful without imesh_algo. There are several "algorithms" in imesh itself that do not depend on vnl (very much like vgl and vgl_algo). However, I agree that it would be best to promote vpgl first and then promote imesh and imesh_algo afterwards.

> IIRC, the last time we had this discussion, the conclusion was that a
> promotion should be supported by another group who is successfully
> using the author's code. This does appear to be the case for vpgl.

I don't have a good sense of which groups use which libraries, especially since there is a lot of private code not contributed back to VXL. I think GE might be using vpgl -- at least in projects we've worked on with them (I don't know if that counts). I'm still interested in knowing which contrib libraries are actually being used. Would it be worth putting together a survey, maybe using something like surveymonkey.com, and putting a link out to the vxl-users list?

--Matt
From: Amitha P. <ami...@us...> - 2009-01-06 23:56:56
Matthew Leotta wrote:
> REMINDER TO AMITHA: Kitware's nightly Doxygen and Book builds for VXL
> are still not running.

Got it. :-)
From: Ian S. <ian...@st...> - 2009-01-06 18:14:51
Matthew Leotta wrote:
> Hi Ian,
>
> Thanks for your feedback. I'd like to address some of your concerns,
> but first I'd like to revise my ideas about promoting these libraries.
> I'm thinking that BSTA and IMESH might not be mature enough or used
> widely enough for immediate promotion to core. However, what my
> proposal is really about is writing documentation. I'd like to write
> the book chapters BEFORE these libraries are promoted to core. My
> thinking is that these libraries are general enough to go into core,
> and better documentation might foster more widespread use and
> contribution to the code. Eventually this could make these libraries
> suitable for core.
>
> So the question becomes: where do I put VXL book chapters for non-core
> libraries?

We put chapters for some MUL libraries in contrib/mul/book. That seems to work fine for me.

> On Jan 6, 2009, at 7:56 AM, Ian Scott wrote:
>
>> A doxygen build of bsta would be helpful as a start, but isn't in our
>> or Kitware's documentation.
>
> BSTA is enabled in doxygen, but currently lacks a much-needed
> introduction page (I will add that soon). It is building in the Brown
> and Manchester documentation. Kitware's documentation is out of date
> and has not been rebuilt for a year.

Thanks, I'll take a look.

> REMINDER TO AMITHA: Kitware's nightly Doxygen and Book builds for VXL
> are still not running.
>
>> There are some other features of vpdfl/pdf1d that I haven't found in
>> bsta. A couple of the more important features are:
>>
>> 1. Types that are independent of vector length. We work with vector
>> spaces of dimensions from 1 to 100000. Most of the time we cannot
>> know the dimensionality until runtime, and even if we did, the
>> proliferation of types and object code would be prohibitive.
>>
>> 2. A base class (at least over the double scalars and vectors) set up
>> as a strategy pattern, so that the actual distribution in use can be
>> selected at runtime.
>>
>> 3. The base class should have an API to perform the following:
>> a. mean and stddev (or throw for those types for which it is not
>> meaningful).
>> b. For double vectors and scalars we need p, grad_p, log_p (for
>> accuracy and efficiency), threshold, nearest_plausible.
>> c. Independent sampling (must store an RNG somewhere).
>> d. Estimating parameters from a population. (Although there is no
>> "best" algorithm for many distributions.)
>
> BSTA takes a template/generic programming approach instead of the
> strategy pattern approach in vpdfl/pdf1d. I think both of these
> approaches are useful depending on the application. Let me explain our
> needs. BSTA was built around the need for very efficient processing of
> large arrays of distributions with fixed type and dimension. The
> original application was real-time video background modeling with
> mixtures of Gaussians. We wanted to reuse the same code for both
> scalar and multivariate distributions, and our dimension is typically
> small (usually 1, 2, 3, or 4). We might need to choose between a few
> types at run time, but once the type is chosen, the same type is used
> in every element of the array. To make this efficient we wanted to
> avoid run-time type checking at each array element and heap allocation
> as much as possible.
>
> In many ways the comparison between vpdfl/pdf1d and bsta is similar to
> that between vnl_vector and vnl_vector_fixed. Both approaches are
> useful, but it depends on whether your application needs flexibility
> or efficiency. I think features 3b, 3c, and 3d should probably be
> added to bsta, but for the rest you need a design like vpdfl/pdf1d.
> Maybe, following the lead of vnl, there should be a single library
> with conversions between the dynamic and templated variants?

If compile-time unrolling of low-count loops is important to you, then I believe you are right.

If you can tolerate non-unrolled loops and polymorphic virtual function calls, then we could have something like:

template <class T> class base_pdf
{
  virtual unsigned ndim() = 0;
  virtual const T & mean() = 0;
};

template <class T, unsigned I> class gaussian_fixed : public base_pdf<T>
{
  vnl_vector_fixed<typename T::type, I> mean_;
  virtual unsigned ndim() { return I; }
  virtual const T & mean() { return mean_; }
};

template <class T> class gaussian_var : public base_pdf<T>
{
  unsigned n_dims_;
  vnl_vector<typename T::type> mean_;
  virtual unsigned ndim() { return n_dims_; }
  virtual const T & mean() { return mean_; }
};

It looks like you are already paying the virtual function cost in bsta, but I guess the loop unrolling does make a difference for you.

Ian.
From: Matthew L. <mat...@gm...> - 2009-01-06 19:28:31
On Jan 6, 2009, at 1:14 PM, Ian Scott wrote:
> We put chapters for some MUL libraries in contrib/mul/book. That seems
> to work fine for me.

Ahh... I had forgotten about the MUL book. Thanks! I could probably make a BRL book without too much trouble. I just need to get texi2html running on one of our servers.

> If compile-time unrolling of low-count loops is important to you, then
> I believe you are right.
>
> If you can tolerate non-unrolled loops and polymorphic virtual
> function calls, then we could have something like:
>
> template <class T> class base_pdf
> {
>   virtual unsigned ndim() = 0;
>   virtual const T & mean() = 0;
> };
>
> template <class T, unsigned I> class gaussian_fixed : public base_pdf<T>
> {
>   vnl_vector_fixed<typename T::type, I> mean_;
>   virtual unsigned ndim() { return I; }
>   virtual const T & mean() { return mean_; }
> };
>
> template <class T> class gaussian_var : public base_pdf<T>
> {
>   unsigned n_dims_;
>   vnl_vector<typename T::type> mean_;
>   virtual unsigned ndim() { return n_dims_; }
>   virtual const T & mean() { return mean_; }
> };
>
> It looks like you are already paying the virtual function cost in
> bsta, but I guess the loop unrolling does make a difference for you.

What's important to me is speed, so both virtual function cost and loop unrolling factor in. Any virtual functions you see in bsta are in older deprecated code or were mistakenly added. Polymorphism is not currently used. I had a previous implementation that was similar to what you are proposing. The abstract base class was templated over math type (double, float) and the derived classes were further templated over dimension. The current bsta implementation gave me about a 3 times improvement in processing speed over the old one. I'm not sure whether it was the loop unrolling, the absence of virtual functions, or other optimizations that had the biggest effect; I changed them all at the same time, so I don't know.

Another problem that arises in your proposed approach is the choice of data types in the API. Your mean() function should return a vector of length ndim(). The return type should be vnl_vector<T> for it to work in the general case. This means that gaussian_fixed<T, I> must convert or wrap its vnl_vector_fixed<T, I> every time mean() is called. Furthermore, when the dimension is 1, the function must wrap the scalar in a one-dimensional vector. I imagine this is why you have both univariate and multivariate libraries. This is yet another trade-off between run-time flexibility and efficiency. Maybe an architecture like the one you propose could be used in parallel with the current bsta ideas, and gaussian_fixed<T, I> could be a wrapper around a Gaussian with my current design (much like vnl_vector_ref presents a vnl_vector_fixed's memory as a vnl_vector)?

--Matt