From: Andy S. <and...@gm...> - 2014-10-22 15:16:48
Not sure what you mean by C API; I was talking about methods on the various spatial C++ classes that copy their data into a buffer. If you want to add a Something::getData(std::vector<double>&) in addition to the buffer method, i.e. Something::getData(int len, double* data), that would be fine, but I'm just not sure how useful it would be. I don't think I've ever worked with a physics engine or real-time visualization library that used std::vector to hold vertex data. Take a look at DirectX or OpenGL, for example: in both, all vertex data is handled in plain buffers. Even higher-level libraries like VTK provide their own classes to manage data buffers. In real-time physics or graphics systems you're going to be performing a lot of matrix operations, and std::vector is just not well suited to those.

Also, consider the intent of libSBML: a library to read and write data stored in SBML format. You want to use it to read a model and store it in your own internal data structures, which, chances are, are not going to be std::vector. I'm not saying std::vector is bad (in fact I use it *a lot*), just that it's not really well suited for things such as vertex or connectivity buffers.

On Oct 22, 2014, at 8:38 AM, Weatherby,Gerard wrote:

> Better != only. The C API could remain. In fact the libsbml implementation could be as simple as:
>
> void getData(std::vector<double> & data) {
>     data.resize(getArrayLen());   // resize(), not reserve(): reserve() leaves size() == 0
>     getArrayData(data.data());    // data(), unlike &front(), is safe on an empty vector
> }
>
> (I think, haven't tested it)
>
>
> From: Andy Somogyi [mailto:and...@gm...]
> Sent: Wednesday, October 22, 2014 8:21 AM
> To: The SBML L3 Spatial Processes and Geometries package discussion list
> Subject: Re: [sbml-spatial] API (was Compression)
>
> What if you're not using std vector to store your data? Say your data is stored in an Eigen or Boost matrix? Or even your own data structure.
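Here is a minimal C++ sketch of how the two API styles debated here can coexist. It assumes a hypothetical class `ArrayHolder` that borrows the `getArrayLen()`/`getArrayData()` names from the snippets quoted in this thread. It is not the actual libSBML spatial API. The primary method copies into any caller-owned buffer, which is Andy's point. The `std::vector` overload is Gerard's convenience wrapper layered on top of it. The wrapper uses `resize()` so that the vector's size matches the array length before its storage is written.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a spatial class with the buffer-style API discussed
// in this thread. The method names mirror the quoted snippets; this is NOT the
// actual libSBML class.
class ArrayHolder {
public:
    explicit ArrayHolder(std::vector<double> values) : values_(std::move(values)) {}

    int getArrayLen() const { return static_cast<int>(values_.size()); }

    // Buffer API: copies the array into caller-owned storage holding at least
    // getArrayLen() doubles (a raw array, another container's data(), ...).
    void getArrayData(double* out) const {
        assert(out != nullptr || values_.empty());
        for (std::size_t i = 0; i < values_.size(); ++i) {
            out[i] = values_[i];
        }
    }

    // Optional std::vector convenience overload built on the buffer API.
    // resize(), not reserve(): reserve() leaves size() at 0, so writing
    // through the vector's storage afterwards would be undefined behaviour.
    void getArrayData(std::vector<double>& out) const {
        out.resize(static_cast<std::size_t>(getArrayLen()));
        getArrayData(out.data());
    }

private:
    std::vector<double> values_;
};
```

Vector users would write `std::vector<double> v; holder.getArrayData(v);`. Everyone else passes a `double*`, such as a raw array or the `data()` pointer of their own container. Only one copy is made in either case.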
>
> Every numeric data structure (including std::vector) has a way of getting a pointer to its data; you then just pass this pointer to whatever function you want to read or write that data.
>
> On Wednesday, October 22, 2014, Weatherby,Gerard <gwe...@uc...> wrote:
> From a C++ perspective, the better API would be
>
> void getData(std::vector<double> & data)
>
> where the implementation could size the vector (std::vector<>::resize( )) and then fill the data.
>
> From: Frank T. Bergmann [mailto:fbe...@ca...]
> Sent: Wednesday, October 22, 2014 1:51 AM
> To: 'The SBML L3 Spatial Processes and Geometries package discussion list'
> Subject: Re: [sbml-spatial] Compression
>
> Hello Andy,
>
> The API you suggest:
>
> int len = obj->getArrayLen();
> double* myData = new double[len];
> obj->getArrayData(myData);
>
> is indeed what is currently implemented in libSBML.
>
> Frank
>
> From: Andy Somogyi [mailto:and...@gm...]
> Sent: Tuesday, October 21, 2014 8:00 PM
> To: The SBML L3 Spatial Processes and Geometries package discussion list
> Subject: Re: [sbml-spatial] Compression
>
> On the API side, I'm asking, please, please, please do not introduce a matrix or array class, and especially please don't return array data by value.
>
> What I think would work best is having simple methods to access the array data and have it copied into a user-provided buffer, something like
>
> int len = obj->getArrayLen();
> double* myData = new double[len];   // caller owns the buffer; release it with delete[] myData
> obj->getArrayData(myData);
>
> If the array were returned by value, something like
>
> vector<double> data = obj->getArrayData();
>
> this would result in a huge number of memory allocations and data copies that could easily be avoided if the data were just copied once into a user-provided buffer.
>
>
> On Oct 21, 2014, at 1:46 PM, Devin Sullivan wrote:
>
>
> I will also voice a vote for option #2.
>
> On Fri, Oct 17, 2014 at 7:14 PM, Samuel Friedman <sam...@ca...> wrote:
> I agree with what Paul has said.
> If you're going to do compression, you want to do it once and not multiple times, so I would vote for path #2. There are three reasons why you really don't want to go down route #3:
>
> 1) Floating-point numbers generally don't compress well: successive values usually differ slightly, so there is little exact repetition for a compressor to exploit.
> 2) Compression algorithms tend to work better on larger chunks of data, because they have more data in which to find patterns worth compressing.
> 3) If you compress your SBML file after you've inserted your compressed floating-point numbers, you have done a double compression, which is almost never worth your while.
>
>
> Sam
>
> On Fri, Oct 17, 2014 at 10:13 AM, Paul Macklin <pau...@us...> wrote:
> Parsing and postprocessing should be a lot easier and faster if the compression is within the XML (so the tags are still uncompressed and easy to parse), rather than enclosing the XML (so you have to decompress the whole thing prior to parsing and postprocessing / analysis). When the files are big and you have a lot of them to process, this becomes significant.
>
> Not that these are any of your 1-3 per se, but you do talk about sticking the whole thing into a zip file. We're shying away from that and looking towards HDF and/or XML + base64 because for 3D and multicell work, the files become pretty big and the wait for the zip/unzip process can be a pretty significant bottleneck to analyzing simulation outputs.
>
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Paul Macklin, Ph.D.
>
> Assistant Professor of Research Medicine
> Center for Applied Molecular Medicine
> Keck School of Medicine
> University of Southern California
> Los Angeles, CA
>
> Founder and Co-Lead of the MultiCellDS Project
> MultiCellDS: http://MultiCellDS.org / @MultiCellDS
>
> email: Pau...@us... / Paul.Macklin@MathCancer.org
> web: http://MathCancer.org
> Twitter: @MathCancer
>
> mobile: +1 310-701-5785
> FAX: +1 323-442-2764
>
>
> On Fri, Oct 17, 2014 at 9:58 AM, Lucian Smith <luc...@gm...> wrote:
> OK, so one of the options can obviously remain 'write the numbers as a string, store that in the XML' for readability. For compression, we have:
>
> 1) binary --> string (ftoa) --> compressed string (this is the existing scheme)
> 2) binary --> base64
> 3) binary --> base64 --> compressed string
>
> Andy reports that base64 encoding of binary data is about 30% more efficient than string encoding of binary data (ftoa), and also has the advantage of being faster to process when decoding. Since ftoa results in a smaller character set (0-9, '.', '-', 'e', and spaces), you'd recover some of that inefficiency if you compared 1) to 3), but probably not all of it. You'd also still have the slower decoding step.
>
> The disadvantage of 3) over 2) is that the resulting .zip file of the entire document would be slightly larger for 3) than for 2), so the question becomes: what is the main purpose of encoding the data in the file this way? If it's 'smaller file size', you'd go with 2), but if it's 'less of the file I have to scroll through when reading it by hand', you'd want 3).
>
> Anyone have strong opinions either way? Is this worth an actual poll of the community?
>
> -Lucian
>
> On Wed, Oct 15, 2014 at 6:43 PM, Paul Macklin <pau...@us...> wrote:
> Interesting!
>
> Perhaps it would be a big improvement to use IEEE 754 and base64 for all numerical fields and get rid of atof?
>
> On Oct 15, 2014 6:29 PM, "Andy Somogyi" <and...@gm...> wrote:
> A big part of the slowness comes from parsing strings to floats, i.e. atof.
>
> Plus, atof does not even work the same on different platforms, and different locales throw in another complication.
>
> All modern processors use the IEEE 754 double format, so it's actually a much more standard format than textually formatted numbers.
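Here is a minimal C++ sketch of option 2 above (binary --> base64) for an array of doubles. It assumes the host `double` is IEEE 754 binary64, which Andy notes is the norm. It also fixes the byte order to little-endian. That byte order is an assumption a spec would have to state; the thread did not settle it. `encodeDoubles` and `decodeDoubles` are hypothetical names. The bit pattern is copied rather than printed and re-parsed, so the round trip is exact and needs no `atof` or locale handling. The encoded text is 4 * ceil(8n / 3) characters for n doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

namespace {
// RFC 4648 standard base64 alphabet.
const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

int decodeChar(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    throw std::invalid_argument("invalid base64 character");
}
}  // namespace

// Hypothetical encoder: each double -> 8 little-endian bytes -> base64 text.
std::string encodeDoubles(const std::vector<double>& values) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");
    std::vector<unsigned char> bytes;
    bytes.reserve(values.size() * 8);
    for (double v : values) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // copy the IEEE 754 bit pattern
        for (int i = 0; i < 8; ++i) {
            bytes.push_back(static_cast<unsigned char>(bits >> (8 * i)));  // little-endian
        }
    }
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        // Pack up to 3 bytes into 24 bits, emit 4 characters, pad with '='.
        std::uint32_t chunk = std::uint32_t(bytes[i]) << 16;
        if (i + 1 < bytes.size()) chunk |= std::uint32_t(bytes[i + 1]) << 8;
        if (i + 2 < bytes.size()) chunk |= std::uint32_t(bytes[i + 2]);
        out += kAlphabet[(chunk >> 18) & 63];
        out += kAlphabet[(chunk >> 12) & 63];
        out += (i + 1 < bytes.size()) ? kAlphabet[(chunk >> 6) & 63] : '=';
        out += (i + 2 < bytes.size()) ? kAlphabet[chunk & 63] : '=';
    }
    assert(out.size() == 4 * ((bytes.size() + 2) / 3));  // base64 length formula
    return out;
}

// Hypothetical decoder: the exact inverse of encodeDoubles.
std::vector<double> decodeDoubles(const std::string& text) {
    if (text.size() % 4 != 0) throw std::invalid_argument("base64 length must be a multiple of 4");
    std::vector<unsigned char> bytes;
    for (std::size_t i = 0; i < text.size(); i += 4) {
        std::uint32_t chunk = 0;
        int pad = 0;
        for (int j = 0; j < 4; ++j) {
            const char c = text[i + j];
            chunk <<= 6;
            if (c == '=') {
                ++pad;
            } else {
                chunk |= static_cast<std::uint32_t>(decodeChar(c));
            }
        }
        bytes.push_back(static_cast<unsigned char>(chunk >> 16));
        if (pad < 2) bytes.push_back(static_cast<unsigned char>(chunk >> 8));
        if (pad < 1) bytes.push_back(static_cast<unsigned char>(chunk));
    }
    if (bytes.size() % 8 != 0) throw std::invalid_argument("not a whole number of doubles");
    std::vector<double> values(bytes.size() / 8);
    for (std::size_t k = 0; k < values.size(); ++k) {
        std::uint64_t bits = 0;
        for (int i = 0; i < 8; ++i) {
            bits |= std::uint64_t(bytes[8 * k + i]) << (8 * i);
        }
        std::memcpy(&values[k], &bits, sizeof bits);
    }
    return values;
}
```

For example, `encodeDoubles({1.0})` gives the 12-character string `AAAAAAAA8D8=`. Passing that string to `decodeDoubles` returns `{1.0}` bit for bit.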
>
> On Wednesday, October 15, 2014, Paul Macklin <pau...@us...> wrote:
> Thanks, Andy.
>
> Out of curiosity, is that slowness from parsing complexity or from the disk read/write itself? Is it still the same bottleneck if reading/writing files on a solid state disk or RAM disk?
>
>
> On Wed, Oct 15, 2014 at 6:08 PM, Andy Somogyi <and...@gm...> wrote:
> Just store the binary array as a base64-encoded blob.
>
> Not only will the file size be about 30% the size of converting to strings, but it is an order of magnitude faster in terms of parsing and reading the data.
>
> In profiling our simulations, currently the slowest part is reading the SBML, so anything that would improve performance in this area would be very useful.
>
>
> On Wednesday, October 15, 2014, Paul Macklin <pau...@us...> wrote:
> It sounds like #1 converts the numbers to strings in a sprintf-like fashion, and then compresses this string (to another string).
>
> It sounds like #2 would directly compress the numbers (in their native binary format), then encode the compressed output as text (e.g., via base64).
>
> I was wondering what you thought of a (#1/#2)': encode the doubles/floats/whatever to text via base64 first, compress this, then store the resulting text in the data field.
>
> Thanks -- Paul
>
>
> On Wed, Oct 15, 2014 at 4:23 PM, Lucian Smith <luc...@gm...> wrote:
> OK, let me see if I can summarize the issues about compression, and ask people's opinions moving forward:
>
> As things stand right now, the spec itself is a little vague on how compression works. This obviously needs to be updated, but we should make sure we know what we want first.
>
> The libsbml implementation of compression (which Frank and Jim use) works by compressing a *string* of numbers into a format that can be written into an XML file safely (I still don't know which one, but let's assume that this, at least, doesn't need to be changed). This is why Frank is concerned about the delimiter or lack thereof: all spaces, delimiters, etc. are getting compressed along with everything else.
>
> The big advantage of this system is that it's implemented.
>
> The disadvantage of this system is that it's fairly inefficient, mostly because encoding a number as a string is inefficient to start with.
>
> So that's option #1: keep things as they are implemented now, with possible tweaks for delimiters, etc.
>
>
> For option #2, we could compress the arrays of numbers directly, and encode that compression in the same way in the XML. This would have the advantage of being more compressed, but has the disadvantage of not being implemented yet.
>
>
> For option #3, we could ditch compression entirely, and instead rely on our ability to compress the entire SBML document (libsbml has built-in features that let it read and write compressed documents).
> Compressing the whole document with all the numbers written out would actually give smaller files than compressing those number strings first, à la option #1. The disadvantage of this system is that it makes the files really big, and therefore makes the parts that *aren't* huge arrays of numbers harder to read and debug.
>
> As far as delimiters go, it seemed to me that the simplest option would be to allow a ';' delimiter wherever people wanted it, and to remove it for compression. The order of the numbers and their meaning would be precisely defined in the spec, so special delimiters (besides the space between the numbers themselves) would not be strictly needed, but could be provided for readability.
>
> Also, keep in mind that if the size of the file itself is an issue, the entire file can be compressed, not just these strings of numbers. The point of compressing the numbers inside the XML file is (I believe) so that the *rest* of the file is easier to view manually.
>
> -Lucian
>
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> sbml-spatial mailing list
> sbm...@li...
> https://lists.sourceforge.net/lists/listinfo/sbml-spatial
>
> --
> Dr. Samuel H. Friedman
> University of Southern California Postdoctoral Scholar - Research Associate
> Center for Applied Molecular Medicine Keck School of Medicine
> Email: sam...@ca... Phone: 323-442-2531
> 2250 Alcazar St Rm 259 Los Angeles, CA 90033
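Lucian's delimiter suggestion in the October 15 summary was to allow ';' anywhere for readability and drop it before compression. On the reading side, that could look roughly like the sketch below. `stripDelimiters` and `parseNumbers` are hypothetical helpers, not libSBML functions. Two choices here are assumptions. First, ';' is turned into a space instead of being deleted, so that `1;2` is not read as `12`. Second, the stream is imbued with the classic "C" locale so the decimal point is always '.', which sidesteps the locale problem Andy raised about `atof`.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: treat every ';' as a plain separator. Replacing it with
// a space (rather than erasing it) keeps "1;2" from turning into "12".
std::string stripDelimiters(std::string text) {
    std::replace(text.begin(), text.end(), ';', ' ');
    assert(text.find(';') == std::string::npos);
    return text;
}

// Hypothetical reader for a whitespace-separated array that may contain
// optional ';' delimiters for readability.
std::vector<double> parseNumbers(const std::string& text) {
    std::istringstream in(stripDelimiters(text));
    in.imbue(std::locale::classic());  // always '.' as decimal point, whatever the user's locale
    std::vector<double> values;
    double v = 0.0;
    while (in >> v) {
        values.push_back(v);
    }
    // Stopping before end of input means a token was not a number.
    if (!in.eof()) {
        throw std::invalid_argument("non-numeric token in array text");
    }
    return values;
}
```

With this helper, `"0 0 0; 1 0 0; 0 1 0"` and `"0 0 0 1 0 0 0 1 0"` parse to the same nine values. The spec-defined ordering carries the meaning, and the delimiters are purely cosmetic.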