From: Matthew L. <mat...@gm...> - 2008-12-22 16:45:32
|
Hi Everyone,

As part of writing my thesis, Prof. Mundy has suggested that I write VXL book chapters for some of the supporting libraries I have written and try to promote them to core if there is community interest. The plan is to contribute book chapters both to VXL and as appendices in my thesis. One such library is vidl2, which is already in the works to move to core. I would like to gauge the interest in promoting a couple of others. Both of these would need to be renamed when moved; I'm open to suggestions on naming.

1) BSTA - Brown Statistical Library (brl/bbas/bsta)
This is a template library for probability distributions, templated over dimension and numerical type. The current code is focused on Gaussian distributions and mixtures of distributions, but the framework is designed to work with arbitrary distributions. There is also some old code in the library that does not fit the framework and would need to be cleaned out.
Currently implemented uses: video background modeling and probabilistic voxel world modeling.
Dependencies: vnl and vnl_algo (deprecated code also uses vbl)

2) IMESH - Indexed Mesh Library (brl/bbas/imesh)
This is a library for representing 2-d surface meshes in N-d (usually 2-d or 3-d) space. The mesh is represented by indexing into vectors of data. The basic data structures provide an indexed face mesh. An indexed half-edge representation can optionally be computed for fast traversal of manifold meshes. The half-edge data structure includes iterators for mesh surface traversal. There is code for reading and writing some mesh file formats, and algorithms for basic mesh operations like triangulation and subdivision. There is also an imesh_algo library with more complex algorithms like PCA on mesh vertex positions, kd-trees for mesh faces, implicit surfaces from meshes, and projecting 3-d meshes into images.
Currently implemented uses: deformable vehicle mesh models and meshing of buildings from lidar.
Dependencies: limited use of vgl and vul (these could be removed if necessary to make imesh level 1 core). imesh_algo depends on vil, vgl_algo, vnl_algo, and vpgl.

I think VPGL is also a good candidate for promotion to core, but probably not as part of my thesis work.

I'd be interested to know if anyone is using these libraries and if there is any interest in promoting either of these to core.

Thanks,
Matt
|
From: Antonio G. C. <A.G...@de...> - 2008-12-23 11:01:09
|
Hi,

I have not used the libraries yet, but I will probably take a look. I do not know whether other statistics-related libraries in contrib/mul need to be taken into account; perhaps somebody from Manchester could say something about this promotion. In any case, writing book chapters is a good idea, and these chapters would add interest in the libraries.

Antonio.
|
From: Peter V. <pet...@ya...> - 2009-01-08 10:01:00
|
Interesting discussion!

I believe we should (1) write out a "clean" design, essentially from scratch, indeed including several "points of view" (in the style of vnl_vector vs. vnl_vector_fixed), but fairly complete (i.e., including functionality which is currently either not used or not fully implemented). Then (2) gradually fill this framework with implementations from bsta, pdf1d and vpdfl (and possibly other places). Next (3) replace the implementations in bsta, pdf1d and vpdfl by (inline) calls to the new library. And finally (4) gradually replace (in client code) all use of the then-"old" libraries by direct access to the new core library.

We had good results with a similar approach when we converted TargetJr into vxl. It then took us 6 days with 10 people (sitting together in Oxford) to have a complete and working set of core libraries; the first 2 days were mainly spent just writing out the design (and discussing the choices to be made). For a new statistics library, I guess we'll need less than 50 man-hours to do a similar thing (which corresponds to steps 1 and 2 above).

Thoughts?

-- Peter.
|
From: Joseph M. <mu...@le...> - 2009-01-08 13:21:09
|
I agree we should do something rather than nothing. It has been a long time since anything new has been promoted to core.

The bsta libraries are in regular use by a number of our research funding organizations. Admittedly they don't actually program using the libraries, but they do carry out experiments on operational datasets using processes that are based on bsta. These applications require optimum speed and focus on processing of images and video with up to 4 color bands.

I agree that we are in a similar situation to vnl, where there should be the option of fixed or variable dimension for the sample space.

Joe

-----Original Message-----
From: Peter Vanroose
Sent: Thursday, January 08, 2009 5:01 AM
To: vxl maintainers
Subject: [Vxl-maintainers] bsta, pdf1d, vpdfl [was: Proposing other libraries to promote to VXL core]
|
From: Matthew L. <mat...@gm...> - 2009-01-12 14:47:19
|
Peter,

I think there are a couple of different ways of moving forward with this statistical library.

One is to bring both the template library (bsta) and the strategy-patterned libraries (pdf1d and vpdfl) into one library. If we take this route then I think your strategy is a good one. However, I'm not sure who is going to write out the clean design. I think we would need input from Brown and Manchester (and anyone else interested) to meet everyone's needs. We don't have all the interested parties sitting together. I suppose this could be done over e-mail, but someone would need to take the lead and propose an initial design. I'd be willing to help with fitting the bsta code into the design, but I don't think I could take the lead on this, as I'm currently trying to write my PhD thesis.

Another option is to keep the libraries separate. bsta is really a generic template library in the style of the STL. After cleaning out the old deprecated code, you would find that the only compiled files are the .cxx files in the Templates subdirectory, used to instantiate some of the more common types. All of the data structures and algorithms are templated so that the code can be optimized at compile time for whichever data types and dimensionality are chosen. This is a bit more extreme than in vnl, where vnl_vector_fixed is almost always wrapped as a vnl_vector for use in algorithms. I think there may be some merit to keeping bsta as a pure template library. It might be useful to have a strategy-patterned library wrap some of the templated distribution classes, but I don't see how the templated algorithms can make much use of strategy-patterned distribution classes. Thoughts?

My ulterior motive here is to document the design and use of some libraries I've written and used in my thesis. I'm already doing this with vidl2 and trying to see if I can do the same with bsta. The stipulations set forth by my advisor (Joe Mundy) are:
1) The library is one I have designed (at least the initial framework).
2) The library is used in my thesis work.
3) I am to write a VXL book chapter to document the code and include a copy as an appendix in my thesis.
4) The library must be used by others and promoted to core.

The "promoted to core" part is the most difficult, since nothing has been added to core in a long, long time. I am willing to clean up, rename, move, and thoroughly document bsta on my own if it is accepted into core. I don't think I have the time to redesign it to merge with pdf1d and vpdfl while finishing my thesis. In that case, I would probably have to cut it from my thesis and put off writing a book chapter until some unforeseen future date.

Matt
|
From: Peter V. <pet...@ya...> - 2009-01-12 19:07:51
|
OK, I see your point.

I don't know what your time frame is; maybe we'll have time to first arrange some kind of "common brainstorm" forum. I'm definitely interested in joining in (although I've not used any of the three libraries yet, as a mathematician/statistician I'm certainly interested in the topic). Then someone could write out new class specs (any takers?), again with a few days' time for others to suggest adaptations/additions/... Finally, we could divide the work to do the actual "move", i.e., implement the new classes with the implementations from the three existing libraries.

If we find (say) 3 or 4 people with time to do this, in a reasonably short time span, this should be doable.

Your second path (keeping the libraries "as is") is easier, but would maybe not fulfill the requirement of "make sure others are using bsta".

-- Peter.
|
From: Matthew L. <mat...@gm...> - 2009-01-12 20:13:57
|
Peter,

On Jan 12, 2009, at 2:07 PM, Peter Vanroose wrote:
> I don't know what your time frame is; maybe we'll have time to first arrange some kind of "common brainstorm" forum ...
> Then someone could write out new class specs (any takers?), again with a few days' time for others to suggest adaptations/additions/...

My time frame is relatively short. I'd like to do something over the next few months. I don't have time to put a lot of effort into a redesign right now. If someone wants to do this, I'll contribute to the discussion to make sure the design meets the requirements of bsta.

> Your second path (keeping the libraries "as is") is easier, but would maybe not fulfill the requirement of "make sure others are using bsta".

The requirement of "make sure others are using bsta" is already met for my purposes. I'm using it for video background modeling, and another group is independently using the code for probabilistic voxel world modeling. However, both groups are at Brown, so I might not be meeting this requirement for "moving into core" purposes.

Matt
|
From: Matthew L. <mat...@gm...> - 2009-01-15 19:48:16
|
Peter, Ian, and anyone else interested,

In an attempt to keep things moving on this combined probability library, I'd like to make a proposal on how to integrate bsta, pdf1d, and vpdfl. I have a few general questions to be resolved first. If we can come to an agreement on the general way to proceed, then I'll take some time to make a more detailed class specification.

I've reviewed all three libraries in more detail, and I'm even more convinced that we can't completely merge them and still meet the requirements of both Brown and Manchester. However, I think we could adopt the strategy pattern of vpdfl as the primary design, and then put the generic programming design of bsta in a subdirectory and make it a pure template library. The template distributions could mirror a subset of the strategy-pattern distributions and have wrapper classes to use template distributions in the strategy-pattern framework. For example (naming conventions subject to change):

template <class T>
class gaussian : public base_distribution<T>;  // uses virtual functions

template <class T>
class gaussian_ref : public gaussian<T>;       // uses virtual functions

and in the template library

template <class T, unsigned N>
class gaussian_fixed;                          // no virtual functions

The gaussian_fixed class is unrelated to base_distribution. Its data is represented in terms of vnl_vector_fixed and vnl_matrix_fixed, while the gaussian_ref class contains vnl_vector_ref and vnl_matrix_ref with the same data representation. So the fixed-size data can be used in the strategy-pattern framework for functions that are not speed critical. The generic template part of the library could be used on its own, but would only contain the basic data structures and algorithms that need to be optimized for speed or memory layout.

What I think we would need:

1) A common set of naming conventions and data representations for distributions found in both designs. Examples:
   a) Both bsta and vpdfl have axis-aligned and full-covariance Gaussians (with different names); vpdfl also has a truncated principal components Gaussian, while bsta also has a spherically symmetric Gaussian.
   b) vpdfl currently stores a Gaussian covariance matrix in eigenvalue decomposition, while bsta stores the original matrix and caches its inverse.
2) The strategy-pattern code (vpdfl) should be templated over numeric type (float or double).
3) Acceptance that in the scalar case (if num_dimensions is 1 at run time) some scalar values will have to be represented as a vnl_vector of size 1 and a vnl_matrix of size 1x1.

I am most worried about number 3. This is why pdf1d exists. Can pdf1d be a special case of vpdfl? If someone is interested in working only in 1-d, will they be put off by the need to use 1-d vectors and matrices? bsta solves this with a template specialization that substitutes double for vnl_vector_fixed<double,N> when N==1.

Other things I would like to see in the strategy pattern:

1) It would be nice if the distribution were more than just a density (pdf). I would also like to see cumulative calculations (cdf). Sometimes density is not enough and you want to know the actual probability integrated over some area. If we can evaluate the cdf, then we can at least get the probability in an axis-aligned bounding box (by evaluating the cdf at the box corners). Some parts of bsta support this.
2) It would be nice if we could work out recursive estimation for the "builder" classes. It's nice that the mbl_data_wrapper doesn't require that you have all the data in memory, but it does require that you use all the data before you get the estimated distribution. If a builder supports recursive estimation, you could feed it the data points one at a time and stop at any point to get the distribution estimated so far.
3) It might be good to clean up the vpdfl interface a bit. Some functions on the vpdfl_pdf base seem a little too application- or distribution-specific for the general case. n_peaks() and peak(int) make sense for mixtures of Gaussians with well-separated components, but might be misleading (and difficult to compute accurately) in the general case. Others, like nearest_plausible, seem a bit application-specific and should maybe become non-member functions to reduce clutter.

Thoughts?

Matt
I suppose this could >> be done over e-mail, but someone would need to take the lead >> and propose an initial design. I'd be willing to help >> with fitting the bsta code into the design, but I don't >> think I could take the lead on this as I'm currently >> trying to write my Phd thesis. >> >> Another option is to consider keeping the libraries >> separate. bsta is really a generic template library in the >> style of STL. After cleaning out the old deprecated code >> you would find that the only compiled files are the .cxx >> files in the Templates subdirectory used to instantiate with >> some of the more common types. All of the data structures >> and algorithms are templated so that the code can be >> optimized at compile time for whichever data types and >> dimensionality are chosen. This is a bit more extreme than >> in vnl where the vnl_vector_fixed is almost always wrapped >> as a vnl_vector for use in algorithms. I think there may be >> some merit to keeping bsta as a pure template library. It >> might be useful to have a strategy patterned library wrap >> some of the templated distribution classes, but I don't >> see how the templated algorithms can make much use of >> strategy patterned distribution classes. Thoughts? >> >> My ulterior motive here is to document the design and use >> of some libraries I've written and used in my thesis. >> I'm already doing this with vidl2 and trying to see if I >> can do the same with bsta. The stipulations set forth by my >> advisor (Joe Mundy) are >> 1) The library is one I have designed (at least the initial >> framework) >> 2) The library is used in my thesis work. >> 3) I am to write a VXL book chapter to document the code >> and include a copy as an appendix in my thesis. >> 4) The library must be used by others and promoted to core. >> >> The "promoted to core" part is the most difficult >> since there has been nothing added to core in a long long >> time. 
I am willing to clean up, rename, move, and >> thoroughly document bsta on my own if it is accepted into >> core. I don't think I have the time to redesign it to >> merge with pdf1d and vpdfl while finishing my thesis. In >> that case, I would probably have to cut it from my thesis >> and put off writing a book chapter until some unforeseen >> future date. >> >> >> Matt >> >> >> On Jan 8, 2009, at 5:00 AM, Peter Vanroose wrote: >> >>> Interesting discussion! >>> >>> I believe we should (1) write out a "clean" >> design, essentially from scratch, indeed including several >> "points-of-view" (in the style of vnl_vector vs. >> vnl_vector_fixed), but fairly complete (i.e., including >> functionality which is currently either not used or not >> fully implemented). >>> Then (2) gradually fill this framework with >> implementations from bsta, pdf1d and vpdfl (and possibly >> other places). >>> Next (3) replace implementations in bsta, pdf1d and >> vpdfl by (inline) calls to the new library. >>> And finally (4) gradually replace (in client code) all >> use of the then "old" libraries by directly >> accessing the new core library. >>> >>> We've had experience and good results with a >> similar approach, when we converted TargetJr into vxl. It >> took us then 6 days with 10 people (sitting together in >> Oxford) to have a complete and working set of core >> libraries; the first 2 days were mainly spent to just write >> out the design (and discuss choices to be made). >>> For a new statistics library, I guess we'll need >> less than 50 man-hours to do a similar thing (which >> corresponds to steps 1 and 2 above). >>> >>> Thoughts? >>> >>> -- Peter. |
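[Editor's note: Matt's suggestion above, that functions like nearest_plausible could become non-member functions to reduce clutter, might look like the following sketch. All class and function names here are hypothetical, not vpdfl's actual interface.]

```cpp
#include <cmath>

// Hypothetical slimmed-down base class: only operations that every
// distribution supports cleanly remain virtual members.
class pdf_base
{
public:
  virtual ~pdf_base() {}
  virtual double density(double x) const = 0;
  virtual double mean() const = 0;
  virtual double sd() const = 0;
};

// Minimal concrete distribution for demonstration.
class unit_gaussian : public pdf_base
{
public:
  double density(double x) const
  { return std::exp(-0.5 * x * x) / std::sqrt(2.0 * 3.14159265358979323846); }
  double mean() const { return 0.0; }
  double sd() const { return 1.0; }
};

// nearest_plausible moved out of the class: clamp a sample to within
// k standard deviations of the mean.  It uses only the public virtual
// interface, so it does not need to be a member of every distribution.
double nearest_plausible(const pdf_base& pdf, double x, double k = 3.0)
{
  const double lo = pdf.mean() - k * pdf.sd();
  const double hi = pdf.mean() + k * pdf.sd();
  return x < lo ? lo : (x > hi ? hi : x);
}
```

Because the free function works against the abstract interface, application-specific operations like this can live in a utility header without cluttering the base class.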
From: Matthew L. <mat...@gm...> - 2009-02-03 13:41:33
|
Ian, We may have some manpower now to start working on a probability library that unifies bsta, pdf1d, and vpdfl. It looks like I might have some funding to work on it, and Miguel has some students at the University of Puerto Rico who might be able to help. Peter also expressed interest at one point in getting involved. I'm not asking for you to contribute to this effort, but I am looking for your approval. The core of the library will be based on pdf1d and vpdfl. I'm hoping that we can create a library that meets everyone's needs. The biggest challenge I see is unifying pdf1d with vpdfl. We would need to use vnl_vector of size 1 and vnl_matrix of size 1x1 for the univariate case. My questions are: 1) Is this too much overhead for the univariate case? 2) Will this be too complicated for users who just want to work with scalars in the univariate distributions? We might be able to address number 2 by using proxy classes for the vector and matrix that can implicitly cast to and from a scalar. If this proposed design is acceptable to you I'll move to the next phase and start to create header files. Thanks, Matt On Jan 15, 2009, at 2:48 PM, Matthew Leotta wrote: > Peter, Ian, and anyone else interested, > > In an attempt to keep things moving on this combined probability > library I'd like to make a proposal on how to integrate bsta, pdf1d, > and vpdfl. I have a few general questions to be resolved first. If > we can come to an agreement on the general way to proceed then I'll > take some time to make a more detailed class specification. > > I've reviewed all three libraries in more detail, and I'm even more > convinced that we can't completely merge them and still meet the > requirements of both Brown and Manchester. However, I think we > could adopt the strategy pattern of vpdfl as the primary design and > then put the generic programming design of bsta in a subdirectory > and make it a pure template library. 
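[Editor's note: the proxy-class idea in question 2 above could look roughly like this. Everything here is a hypothetical sketch: std::vector stands in for vnl_vector, and a real design would need a similar proxy for the 1x1 matrix.]

```cpp
#include <cstddef>
#include <vector>

// Hypothetical proxy: behaves like a vector at the interface level,
// but converts implicitly to and from a scalar so that univariate
// users never have to build size-1 vectors by hand.
class scalar_vector_proxy
{
  std::vector<double> data_;
public:
  scalar_vector_proxy(double s) : data_(1, s) {}               // implicit from scalar
  explicit scalar_vector_proxy(const std::vector<double>& v) : data_(v) {}
  operator double() const { return data_[0]; }                 // implicit to scalar
  std::size_t size() const { return data_.size(); }
  double  operator[](std::size_t i) const { return data_[i]; }
  double& operator[](std::size_t i)       { return data_[i]; }
};

// A function written against the "vector" interface still accepts
// a bare double, thanks to the implicit converting constructor.
double first_component(const scalar_vector_proxy& v) { return v[0]; }
```

The trade-off is the usual one with implicit conversions: convenience in the 1-d case against the risk of silent conversions in N-d code, which is presumably part of what question 2 is asking.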
The template distributions > could mirror a subset of the strategy pattern distributions and have > wrapper classes to use template distributions in the strategy > pattern framework. For example (naming conventions subject to > change): > > template <class T> class gaussian : public > base_distribution<T>; // uses virtual functions > template <class T> class gaussian_ref : public gaussian<T>; // > uses virtual functions > > and in the template library > > template <class T, unsigned N> class gaussian_fixed; // no > virtual functions > > The gaussian_fixed class is unrelated to base_distribution. Its > data is represented in terms of vnl_vector_fixed and > vnl_matrix_fixed while the gaussian_ref class contains > vnl_vector_ref and vnl_matrix_ref with the same data > representation. So the fixed size data can be used in the strategy > pattern framework for functions that are not speed critical. The > generic template part of the library could be used on its own, but > would only contain the basic data structures and algorithms that > need to be optimized for speed or memory layout. > > What I think we would need: > > 1) A common set of naming conventions and data representations for > distributions found in both designs. > Examples: > a) both bsta and vpdfl have axis aligned and full covariance > Gaussians (with different names), > vpdfl also has a truncated principal components Gaussian while > bsta also has a spherically symmetric Gaussian. > b) vpdfl currently stores a Gaussian covariance matrix in > eigenvalue decomposition, > while bsta stores the original matrix and caches its inverse > > 2) The strategy pattern code (vpdfl) should be templated over > numeric type (float or double). > > 3) Acceptance that in the scalar case (if num_dimensions is 1 at run > time) some scalar values will have to be represented as vnl_vector > of size 1 and vnl_matrix of size 1x1. > > I am most worried about number 3. This is why pdf1d exists. 
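[Editor's note: the proposed split might be sketched as below. All names are hypothetical and the sketch is 1-d only for brevity; the real classes would use the vnl types and be templated over dimension as well.]

```cpp
#include <cmath>

// Strategy-pattern side: run-time polymorphism via virtual functions.
template <class T>
class base_distribution
{
public:
  virtual ~base_distribution() {}
  virtual T density(const T* pt) const = 0;
};

// Pure-template side: a 1-d Gaussian with no virtual functions, so
// every call can be inlined and optimized at compile time.
template <class T>
class gaussian_fixed1
{
  T mean_, var_;
public:
  gaussian_fixed1(T m, T v) : mean_(m), var_(v) {}
  T density(T x) const
  {
    const T d = x - mean_;
    return std::exp(-d * d / (2 * var_))
         / std::sqrt(2 * T(3.14159265358979323846) * var_);
  }
};

// Thin adapter exposing the template class through the virtual
// interface, for code paths where speed is not critical.
template <class T>
class gaussian_ref1 : public base_distribution<T>
{
  gaussian_fixed1<T> impl_;
public:
  gaussian_ref1(T m, T v) : impl_(m, v) {}
  T density(const T* pt) const { return impl_.density(pt[0]); }
};
```

The key property is that the adapter adds the virtual dispatch as a layer on top, rather than forcing it into the speed-critical template code.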
Can > pdf1d be a special case of vpdfl? If someone is interested in > working only in 1-d will they be put off by the need to use 1-d > vectors and matrices? bsta solves this with template specialization > that substitutes vnl_vector_fixed<double,N> with double when N==1. > > > Other things I would like to see in the strategy pattern: > > 1) It would be nice if the distribution is more than just a density > (pdf). I would also like to see cumulative calculations (cdf). > Sometimes density is not enough and you want to know the actual > probability integrated over some area. If we can evaluate the cdf > then we can at least get the probability in an axis-aligned bounding > box (by evaluating cdf at the box corners). Some parts of bsta > support this. > > 2) It would be nice if we could work out recursive estimation for > the "builder" classes. It's nice that the mbl_data_wrapper doesn't > require that you have all the data in memory, but it does require > that you use all the data before you get the estimated > distribution. If a builder supports recursive estimation it would > be nice if you could feed the data points one at a time and stop at > any point to get the distribution so far. > > 3) It might be good to clean up the vpdfl interface a bit. Some > functions on vpdfl_pdf base seem a little too application or > distribution specific for the general case. n_peaks() and peak(int) > seem to make sense for mixtures of Gaussians with well separated > components, but it might be misleading (and difficult to compute > accurately) in the general case. Others like nearest_plausible seem > a bit application specific and maybe should be non-member functions to > reduce clutter. > > Thoughts? > > Matt > > > On Jan 12, 2009, at 2:07 PM, Peter Vanroose wrote: > >> OK, I see your point. 
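[Editor's note: bsta's specialization trick mentioned above can be illustrated with a small traits template. The names here are illustrative only, with std::array standing in for vnl_vector_fixed.]

```cpp
#include <array>

// Primary template: an N-dimensional sample is a fixed-size vector.
template <class T, unsigned N>
struct field_traits
{
  typedef std::array<T, N> vector_type;
};

// Specialization: in 1-d the "vector" collapses to a bare scalar,
// so univariate users never see size-1 vectors or 1x1 matrices.
template <class T>
struct field_traits<T, 1>
{
  typedef T vector_type;
};

// A distribution templated this way stores whatever is natural for
// its dimension: a plain T in 1-d, a fixed-size vector otherwise.
template <class T, unsigned N>
class gaussian_nd
{
public:
  typename field_traits<T, N>::vector_type mean;
};
```

This keeps a single code base for all dimensions while giving the univariate case a scalar interface, which is essentially the question being posed about whether pdf1d could be absorbed into a unified design.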
>> I don't know what your time frame is; maybe we'll have time to >> first arrange some kind of "common brain storm" forum; I'm >> definitely interested in joining in (although I've not used any of >> the three libraries yet; but as a mathematician/statistician I'm >> certainly interested in the topic). >> Then someone could write out new class specs (any takers?), again >> with a few days time for others to suggest adaptations/additions/... >> Finally, we could divide work to do the actual "move", i.e., >> implement the new classes with the implementations from the three >> existing libraries. >> >> If we find (say) 3 or 4 people having time to do this, in a >> reasonably short time span, this should be doable. >> Your second path (keeping the libraries "as is") is easier, but >> would maybe not fulfill the requirement of "make sure others are >> using bsta". >> >> -- Peter. >> >> >>> I think there are a couple of different ways of moving >>> forward with this statistical library. >>> >>> One is to bring both the template library (bsta) and the >>> strategy patterned libraries (pdf1d and vpdfl) into one >>> library. If we take this route then I think your strategy >>> is a good one. However I'm not sure who is going to >>> write out the clean design. I think we would need input >>> from Brown and Manchester (and anyone else interested) to >>> meet everyone's needs. We don't have all the >>> interested parties sitting together. I suppose this could >>> be done over e-mail, but someone would need to take the lead >>> and propose an initial design. I'd be willing to help >>> with fitting the bsta code into the design, but I don't >>> think I could take the lead on this as I'm currently >>> trying to write my PhD thesis. >>> >>> Another option is to consider keeping the libraries >>> separate. bsta is really a generic template library in the >>> style of STL. 
After cleaning out the old deprecated code >>> you would find that the only compiled files are the .cxx >>> files in the Templates subdirectory used to instantiate with >>> some of the more common types. All of the data structures >>> and algorithms are templated so that the code can be >>> optimized at compile time for whichever data types and >>> dimensionality are chosen. This is a bit more extreme than >>> in vnl where the vnl_vector_fixed is almost always wrapped >>> as a vnl_vector for use in algorithms. I think there may be >>> some merit to keeping bsta as a pure template library. It >>> might be useful to have a strategy patterned library wrap >>> some of the templated distribution classes, but I don't >>> see how the templated algorithms can make much use of >>> strategy patterned distribution classes. Thoughts? >>> >>> My ulterior motive here is to document the design and use >>> of some libraries I've written and used in my thesis. >>> I'm already doing this with vidl2 and trying to see if I >>> can do the same with bsta. The stipulations set forth by my >>> advisor (Joe Mundy) are >>> 1) The library is one I have designed (at least the initial >>> framework) >>> 2) The library is used in my thesis work. >>> 3) I am to write a VXL book chapter to document the code >>> and include a copy as an appendix in my thesis. >>> 4) The library must be used by others and promoted to core. >>> >>> The "promoted to core" part is the most difficult >>> since there has been nothing added to core in a long long >>> time. I am willing to clean up, rename, move, and >>> thoroughly document bsta on my own if it is accepted into >>> core. I don't think I have the time to redesign it to >>> merge with pdf1d and vpdfl while finishing my thesis. In >>> that case, I would probably have to cut it from my thesis >>> and put off writing a book chapter until some unforeseen >>> future date. 
>>> >>> >>> Matt >>> >>> On Jan 8, 2009, at 5:00 AM, Peter Vanroose wrote: >>> >>>> Interesting discussion! >>>> >>>> I believe we should (1) write out a "clean" >>> design, essentially from scratch, indeed including several >>> "points-of-view" (in the style of vnl_vector vs. >>> vnl_vector_fixed), but fairly complete (i.e., including >>> functionality which is currently either not used or not >>> fully implemented). >>>> Then (2) gradually fill this framework with >>> implementations from bsta, pdf1d and vpdfl (and possibly >>> other places). >>>> Next (3) replace implementations in bsta, pdf1d and >>> vpdfl by (inline) calls to the new library. >>>> And finally (4) gradually replace (in client code) all >>> use of the then "old" libraries by directly >>> accessing the new core library. >>>> >>>> We've had experience and good results with a >>> similar approach, when we converted TargetJr into vxl. It >>> took us then 6 days with 10 people (sitting together in >>> Oxford) to have a complete and working set of core >>> libraries; the first 2 days were mainly spent to just write >>> out the design (and discuss choices to be made). >>>> For a new statistics library, I guess we'll need >>> less than 50 man-hours to do a similar thing (which >>> corresponds to steps 1 and 2 above). >>>> >>>> Thoughts? >>>> >>>> -- Peter. > |