How to reduce file size

Anonymous
2012-04-20
2013-04-13
  • Anonymous

    Anonymous - 2012-04-20

    Hello!
    Very cool library. Thanks.
    I have noticed something strange. If I create a file with 13000 rows, its size is 10 MB. But after I re-save it in MS Excel, its size drops to 4 MB. How does Excel do this, and how can I do it with this library?

     
  • David Hoerl

    David Hoerl - 2012-04-21

    Did you create 13000 labels? The biggest hogs are strings. We do support the Shared String Table (SST), which Microsoft uses aggressively. So, for instance, if you write the string "Hello World" to each cell in Excel, MS figures out it's the same string and only puts it into the shared string table once. Later, if someone changes one cell, it will create a second string in the table, and so on.

    We don't do that. I don't know offhand whether we let you explicitly write one string to the SST and then link to it using Shared String objects (instead of Labels).
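
    To make the deduplication described above concrete, here is a minimal sketch of a shared string table in Python. It is a toy model only: the class name `SharedStringTable` and the method `intern` are hypothetical, and it does not reflect xlslib's API or Excel's actual implementation.

```python
class SharedStringTable:
    """Toy model of a shared string table: each distinct string is stored once."""

    def __init__(self):
        self.strings = []   # distinct strings, in first-seen order
        self._index = {}    # string -> position in self.strings

    def intern(self, text):
        """Return the table index for text, adding it only if it is new."""
        idx = self._index.get(text)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(text)
            self._index[text] = idx
        return idx


sst = SharedStringTable()
# 13000 cells that all hold the same text store just an index each.
cells = [sst.intern("Hello World") for _ in range(13000)]
# Changing one cell adds a second entry to the table.
cells[0] = sst.intern("Goodbye")
print(len(cells), len(sst.strings))
```

    Each cell then holds a small integer rather than a full copy of its text, which is where the savings for repeated strings come from.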

    That said, do you really need to have 13000 rows of strings?

    If you are curious what the differences are, you can use the libxls project (also on SourceForge): enable debugging, dump the original xlslib .xls file, open and re-save it with Excel, then dump it again and look for differences.

    If you really do need 13000 rows of the same text, well, I could probably expose ways to create such things (if they're not there now; I don't really know). The reason I added SST support was that a normal text label can only hold 255 characters (glyphs, really). One user needed several thousand, so if a Label string is really long, the label automatically switches to the SST.

     
  • Andrey Gorbatov

    Andrey Gorbatov - 2012-04-23

    I'm the topic starter. Thanks for your answer. I really do need to save more than 30K different string rows.

     
  • David Hoerl

    David Hoerl - 2012-04-23

    But you didn't answer my question: are the strings all the same? If the strings are all different, there should not be much difference in the file sizes. If you have, say, 5 strings and you write them in random order, the Excel file will be much smaller, as Excel will save only those 5 strings and then use references to them, as opposed to copying them over and over as the library does.
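
    A rough back-of-the-envelope model of the two cases above, as a sketch. The per-record overheads are illustrative assumptions, not measurements from xlslib or Excel, and the helper names `inline_size` and `shared_size` are hypothetical.

```python
import random

# Assumed per-item overheads in bytes; illustrative, not measured.
INLINE_OVERHEAD = 13   # cell record that carries its own text
REF_SIZE = 14          # cell record that carries only a table index
ENTRY_OVERHEAD = 3     # per-string header inside the shared table


def inline_size(cells):
    """Bytes used when every cell stores a full copy of its text (1 byte per char)."""
    return sum(INLINE_OVERHEAD + len(text) for text in cells)


def shared_size(cells):
    """Bytes used when each distinct text is stored once and cells hold references."""
    table = sum(ENTRY_OVERHEAD + len(text) for text in set(cells))
    return table + REF_SIZE * len(cells)


random.seed(0)
five = ["string number %d, padded out to a realistic length" % i for i in range(5)]
repeated = [random.choice(five) for _ in range(13000)]
distinct = ["row %05d, padded out to a realistic length...." % i for i in range(13000)]

print("5 repeated strings:", inline_size(repeated), "vs", shared_size(repeated))
print("all distinct:      ", inline_size(distinct), "vs", shared_size(distinct))
```

    Under these assumptions the repeated-string sheet shrinks to well under half its size with sharing, while the all-distinct sheet comes out slightly larger, because every cell pays for a reference on top of its one table entry.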

     
  • Andrey Gorbatov

    Andrey Gorbatov - 2012-04-23

    Some cells are different, some are the same. As I understand it, Excel optimizes equal strings using the SST.

     
  • David Hoerl

    David Hoerl - 2012-04-23

    Make sure you are re-using formats etc. - you can share those. Other than that, if you want to dig deeper with libxls to find out exactly where the additional file size is coming from, and give me some direction, I'd be glad to address any real bugs. But as it stands, xlslib compresses strings to 8-bit characters whenever possible, and other than the SST savings I know of no other deficiency in the library.
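
    The 8-bit compression mentioned above can be pictured roughly like this. This is a simplified sketch, not xlslib's actual code: the function name `encode_cell_text` is hypothetical, and the exact rule the library applies to decide when compression is possible is an assumption here.

```python
def encode_cell_text(text):
    """Return (flag, payload): flag 0 = one byte per character, flag 1 = two bytes."""
    if all(ord(ch) <= 0xFF for ch in text):
        # Every character fits in 8 bits, so store the compact form.
        return 0, text.encode("latin-1")
    # At least one character needs more than 8 bits: fall back to UTF-16.
    return 1, text.encode("utf-16-le")


for sample in ["Hello World", "caf\u00e9", "\u03a9mega"]:
    flag, payload = encode_cell_text(sample)
    print(ascii(sample), flag, len(payload))
```

    A single character outside the 8-bit range doubles the storage for the whole string, so sheets with mostly non-Latin text see little benefit from this step.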

    Replicating what MS does with the SST - to dynamically add/delete strings, then update every reference, is a bit more than I want to do right now - this would be a very specialized usage. If you want to do it and give me a patch request, I'm OK with that.
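
    For anyone considering such a patch, here is a sketch of why deleting strings is the costly part: once an unused entry is removed, later entries shift down and every cell reference has to be renumbered. The class and method names (`CountedStringTable`, `acquire`, `release`, `compact`) are hypothetical and not part of xlslib; this models the bookkeeping only, not the file format.

```python
class CountedStringTable:
    """Toy shared string table that tracks how many cells use each entry."""

    def __init__(self):
        self.strings = []   # distinct strings
        self.counts = []    # counts[i] = number of cells referencing strings[i]
        self._index = {}    # string -> position in self.strings

    def acquire(self, text):
        """Add a reference to text and return its index."""
        idx = self._index.get(text)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(text)
            self.counts.append(0)
            self._index[text] = idx
        self.counts[idx] += 1
        return idx

    def release(self, idx):
        """Drop one reference; the entry stays until compact() is called."""
        self.counts[idx] -= 1

    def compact(self, cells):
        """Remove unused entries and renumber every cell reference in place."""
        remap = {}
        kept_strings, kept_counts = [], []
        for old, (text, count) in enumerate(zip(self.strings, self.counts)):
            if count > 0:
                remap[old] = len(kept_strings)
                kept_strings.append(text)
                kept_counts.append(count)
        self.strings, self.counts = kept_strings, kept_counts
        self._index = {text: i for i, text in enumerate(kept_strings)}
        for key, old in cells.items():
            cells[key] = remap[old]


table = CountedStringTable()
cells = {
    (0, 0): table.acquire("red"),
    (1, 0): table.acquire("green"),
    (2, 0): table.acquire("green"),
}

# Overwrite cell (0, 0): release the old string, acquire the new one.
table.release(cells[(0, 0)])
cells[(0, 0)] = table.acquire("blue")

table.compact(cells)
print(table.strings, cells)
```

    A library that only writes files could skip `release` and `compact` entirely and just deduplicate at write time; the reference counting is needed only when cells can be edited after the table is built.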

     
