Re: [Algorithms] sparse bitset compression From: Colt McAnlis - 2014-03-08 15:13:23 Attachments: Message as HTML ```Why not just use industry standard Arithmetic Compression? If you calculate the estimated entropy for that data set ( http://planetcalc.com/2476/) you end up with about H=0.06 for 7k set bits in a 1024*1024 stream, and H=0.01 for 1k set bits. That being entropy (or minimum bits per symbol) should yield 70kb and 10k after compression, respectively. Or, with AC, and those sparse values, you can get around 94% compression and 99% compression. (if my early morning math is right) AC is a known algorithm, easy examples to find on the web. And as far a serialization is concerned, it's pretty straight forward to just encode your bits, then decode; no extra special data structures needed. ~Main On Fri, Jan 31, 2014 at 3:27 PM, Richard Fabian wrote: > there was a lot of similarity, but with no processng at the destination, > we decided that the Golomb-rice algorithm was good enough. Dropped our data > by 70%. That was enough of a saving for this one area. > > > On 31 January 2014 21:32, Jon Watte wrote: > >> Encode a single starting value at full bandwidth then send a stream of >>> differences: >> >> >> Which is equivalent to a wavelet "lift" transform :-) >> >> Back to the original question: If you have a previous save game, is there >> lots of similarity in the next save? If so, can you save a delta instead? >> If not, what solution did you end up choosing? >> >> Sincerely, >> >> jw >> >> >> >> >> >> >> Sincerely, >> >> Jon Watte >> >> >> -- >> "I find that the harder I work, the more luck I seem to have." -- Thomas >> Jefferson >> >> >> On Tue, Jan 21, 2014 at 12:08 AM, Robin Green wrote: >> >>> >>> Here's a modern Run Length version of Golomb-Rice encoding designed for >>> compressing general data with a Generalized Gaussian distribution. Works >>> best if you can prove there are, in most cases, small differences between >>> adjacent data (but handily you get to define what "adjacent" means to your >>> stream). Encode a single starting value at full bandwidth then send a >>> stream of differences: >>> >>> https://research.microsoft.com/pubs/102069/malvar_dcc06.pdf >>> >>> As a bonus, if you're encoding depth images, you can remap the values to >>> leave zero as an out-of-band value: >>> >>> http://www.charlesneedham.com/pubs/153971/depthcode-final.pdf >>> >>> Yes, I know that wasn't the question you asked but at least it's an >>> algorithm. :-) >>> >>> - Robin Green >>> >>> >>> >>> On Fri, Jan 17, 2014 at 10:04 AM, Alex Walters wrote: >>> >>>> Its on the fringe of being useful, but one thing you could look at to >>>> reduce your data size is Exponential Golomb coding ( >>>> http://en.wikipedia.org/wiki/Exponential-Golomb_coding), its used in >>>> H.264 entropy encoding among other things. >>>> >>>> Instead of writing a full n-bit values for every number you store, it >>>> produces a bit stream of codes, using far less bits for small numbers - >>>> take a look at the wiki page, its pretty simple to see whats going on when >>>> you look at the example. It can reduce the amount of data you have to move, >>>> and from what I remember from when I worked with video, it produces much >>>> more predictable (and compressible) stream of bits for the next compression >>>> scheme along (variable length encoding, or arithmetic encoding in the case >>>> of H.264) - I've not looked into the details but could probably improve the >>>> compression of the stream you get through gzip. >>>> >>>> Regards, >>>> >>>> Alex >>>> >>>> >>>> >>> >>> ------------------------------------------------------------------------------ >>> CenturyLink Cloud: The Leader in Enterprise Cloud Services. >>> Learn Why More Businesses Are Choosing CenturyLink Cloud For >>> Critical Workloads, Development Environments & Everything In Between. >>> Get a Quote or Start a Free Trial Today. >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk >>> _______________________________________________ >>> GDAlgorithms-list mailing list >>> GDAlgorithms-list@... >>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >>> Archives: >>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >>> >> >> >> >> ------------------------------------------------------------------------------ >> WatchGuard Dimension instantly turns raw network data into actionable >> security intelligence. It gives you real-time visual feedback on key >> security issues and trends. Skip the complicated setup - simply import >> a virtual appliance and go from zero to informed in seconds. >> >> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk >> >> _______________________________________________ >> GDAlgorithms-list mailing list >> GDAlgorithms-list@... >> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >> Archives: >> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >> > > > > -- > fabs(); > "The fact that an opinion has been widely held is no evidence whatever > that it is not utterly absurd." - Bertrand Russell > > > ------------------------------------------------------------------------------ > WatchGuard Dimension instantly turns raw network data into actionable > security intelligence. It gives you real-time visual feedback on key > security issues and trends. Skip the complicated setup - simply import > a virtual appliance and go from zero to informed in seconds. > > http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk > _______________________________________________ > GDAlgorithms-list mailing list > GDAlgorithms-list@... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- == Colt "MainRoach" McAnlis Graphics Programmer ```