Re: [Algorithms] sparse bitset compression
Brought to you by:
vexxed72
From: Richard F. <ra...@gm...> - 2014-03-09 16:40:57
|
AC (and now Finite State Entropy https://github.com/Cyan4973/FiniteStateEntropy) are really good, but carry the weight of spending more time than we had to do the task. Goulomb coding the distance between high bits was a great, simple to implement solution for us. Another issue was the compression time. It needed to be quick, and AC is not known to be fast. In future, I hope to use FSE for almost all my lossless coders. On 8 March 2014 15:13, Colt McAnlis <mai...@gm...> wrote: > Why not just use industry standard Arithmetic Compression? > > If you calculate the estimated entropy for that data set ( > http://planetcalc.com/2476/) you end up with about H=0.06 for 7k set bits > in a 1024*1024 stream, and H=0.01 for 1k set bits. > That being entropy (or minimum bits per symbol) should yield 70kb and 10k > after compression, respectively. Or, with AC, and those sparse values, you > can get around 94% compression and 99% compression. (if my early morning > math is right) > > AC is a known algorithm, easy examples to find on the web. And as far a > serialization is concerned, it's pretty straight forward to just encode > your bits, then decode; no extra special data structures needed. > > ~Main > > > On Fri, Jan 31, 2014 at 3:27 PM, Richard Fabian <ra...@gm...> wrote: > >> there was a lot of similarity, but with no processng at the destination, >> we decided that the Golomb-rice algorithm was good enough. Dropped our data >> by 70%. That was enough of a saving for this one area. >> >> >> On 31 January 2014 21:32, Jon Watte <jw...@gm...> wrote: >> >>> Encode a single starting value at full bandwidth then send a stream of >>>> differences: >>> >>> >>> Which is equivalent to a wavelet "lift" transform :-) >>> >>> Back to the original question: If you have a previous save game, is >>> there lots of similarity in the next save? If so, can you save a delta >>> instead? >>> If not, what solution did you end up choosing? >>> >>> Sincerely, >>> >>> jw >>> >>> >>> >>> >>> >>> >>> Sincerely, >>> >>> Jon Watte >>> >>> >>> -- >>> "I find that the harder I work, the more luck I seem to have." -- >>> Thomas Jefferson >>> >>> >>> On Tue, Jan 21, 2014 at 12:08 AM, Robin Green <rob...@gm...>wrote: >>> >>>> >>>> Here's a modern Run Length version of Golomb-Rice encoding designed for >>>> compressing general data with a Generalized Gaussian distribution. Works >>>> best if you can prove there are, in most cases, small differences between >>>> adjacent data (but handily you get to define what "adjacent" means to your >>>> stream). Encode a single starting value at full bandwidth then send a >>>> stream of differences: >>>> >>>> https://research.microsoft.com/pubs/102069/malvar_dcc06.pdf >>>> >>>> As a bonus, if you're encoding depth images, you can remap the values >>>> to leave zero as an out-of-band value: >>>> >>>> http://www.charlesneedham.com/pubs/153971/depthcode-final.pdf >>>> >>>> Yes, I know that wasn't the question you asked but at least it's an >>>> algorithm. :-) >>>> >>>> - Robin Green >>>> >>>> >>>> >>>> On Fri, Jan 17, 2014 at 10:04 AM, Alex Walters <ajw...@gm...>wrote: >>>> >>>>> Its on the fringe of being useful, but one thing you could look at to >>>>> reduce your data size is Exponential Golomb coding ( >>>>> http://en.wikipedia.org/wiki/Exponential-Golomb_coding), its used in >>>>> H.264 entropy encoding among other things. >>>>> >>>>> Instead of writing a full n-bit values for every number you store, it >>>>> produces a bit stream of codes, using far less bits for small numbers - >>>>> take a look at the wiki page, its pretty simple to see whats going on when >>>>> you look at the example. It can reduce the amount of data you have to move, >>>>> and from what I remember from when I worked with video, it produces much >>>>> more predictable (and compressible) stream of bits for the next compression >>>>> scheme along (variable length encoding, or arithmetic encoding in the case >>>>> of H.264) - I've not looked into the details but could probably improve the >>>>> compression of the stream you get through gzip. >>>>> >>>>> Regards, >>>>> >>>>> Alex >>>>> >>>>> >>>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> CenturyLink Cloud: The Leader in Enterprise Cloud Services. >>>> Learn Why More Businesses Are Choosing CenturyLink Cloud For >>>> Critical Workloads, Development Environments & Everything In Between. >>>> Get a Quote or Start a Free Trial Today. >>>> >>>> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk >>>> _______________________________________________ >>>> GDAlgorithms-list mailing list >>>> GDA...@li... >>>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >>>> Archives: >>>> >>>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >>>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> WatchGuard Dimension instantly turns raw network data into actionable >>> security intelligence. It gives you real-time visual feedback on key >>> security issues and trends. Skip the complicated setup - simply import >>> a virtual appliance and go from zero to informed in seconds. >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk >>> >>> _______________________________________________ >>> GDAlgorithms-list mailing list >>> GDA...@li... >>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >>> Archives: >>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >>> >> >> >> >> -- >> fabs(); >> "The fact that an opinion has been widely held is no evidence whatever >> that it is not utterly absurd." - Bertrand Russell >> >> >> ------------------------------------------------------------------------------ >> WatchGuard Dimension instantly turns raw network data into actionable >> security intelligence. It gives you real-time visual feedback on key >> security issues and trends. Skip the complicated setup - simply import >> a virtual appliance and go from zero to informed in seconds. >> >> http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk >> _______________________________________________ >> GDAlgorithms-list mailing list >> GDA...@li... >> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >> Archives: >> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >> > > > > -- > == > Colt "MainRoach" McAnlis > Graphics Programmer > > > ------------------------------------------------------------------------------ > Subversion Kills Productivity. Get off Subversion & Make the Move to > Perforce. > With Perforce, you get hassle-free workflows. Merge that actually works. > Faster operations. Version large binaries. Built-in WAN optimization and > the > freedom to use Git, Perforce or both. Make the move to Perforce. > > http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- fabs(); "The fact that an opinion has been widely held is no evidence whatever that it is not utterly absurd." - Bertrand Russell |