Re: [Algorithms] sparse bitset compression
From: Samuel M. <sam...@go...> - 2014-01-17 19:11:46
Another idea that might be useful is to think of your positions dataset
as a black-and-white 1024*1024 bitmap (this gets harder if two entities
are allowed to occupy the same position) and compress it with some sort
of run-length encoding, i.e. you go through each pixel line from left to
right and, instead of saving 0s (for empty space) and 1s (for occupied
pixels), you save how many 0s there are until the next 1. There's a
rough sketch of this at the bottom of the mail.

On 1/17/14, Alex Walters <ajw...@gm...> wrote:
> It's on the fringe of being useful, but one thing you could look at to
> reduce your data size is Exponential-Golomb coding
> (http://en.wikipedia.org/wiki/Exponential-Golomb_coding); it's used in
> H.264 entropy encoding among other things.
>
> Instead of writing a full n-bit value for every number you store, it
> produces a bit stream of codes, using far fewer bits for small numbers -
> take a look at the wiki page, it's pretty simple to see what's going on
> when you look at the example. It can reduce the amount of data you have
> to move, and from what I remember from when I worked with video, it
> produces a much more predictable (and compressible) stream of bits for
> the next compression scheme along (variable-length encoding, or
> arithmetic encoding in the case of H.264). I've not looked into the
> details, but it could probably improve the compression of the stream
> you get through gzip.
>
> Regards,
>
> Alex
>
>
> On Fri, Jan 17, 2014 at 10:14 AM, Richard Fabian <ra...@gm...> wrote:
>
>> I would question this assumption:
>>>
>>> This is too much for sending over the network regularly as part of a
>>> save.
>>>
>>
>> Well, we've got our play-through logs showing how much data was being
>> sent per day per user, and this part of the save was the bit being
>> updated the most. Over a day of play it was adding up to about 5 MB I
>> think (can't remember precisely now), and this made it into the top
>> three we needed to compress better to reduce the client bandwidth cost
>> (mobile carrier data limits and all that).
>>
>>> If I had to compress the data you talk about, I might want to look
>>> into some implicit representation, like a quad tree with filled/not
>>> nodes. Something like:
>>> "Does the current sub-area have entities? If not, store 0, and
>>> terminate. Else store 1. If the size of the sub-area is greater than
>>> one, subdivide, and recurse for each sub-quadrant."
>>> Depending on how clustered the entities are, this may compress well
>>> or poorly (but with a max 7% fill rate, it ought to at least compress
>>> somewhat.)
>>
>> I had thought about a quad tree, but I think I chose not to go that
>> way because my previous experience with them was that they're not
>> quite as efficient in practice as they should be. I can't remember why
>> I think this, but I remember working through some code and thinking
>> something along the lines of "Oh, yeah. Of course", though that might
>> have been because the data involved wasn't quite as clumpy.
>>
>>> Apply gzip on top of any encoding you come up with for possible bonus
>>> gains.
>>
>> That's part of the network layer, so no point in doing it explicitly.
>>
>>> Sincerely,
>>>
>>> jw
>>
>> Thanks, I might try the quad tree idea again.
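Here's roughly what I mean by the run-length idea, as an untested sketch
(one byte per cell and made-up names, just to show the shape of it):

// Rough sketch of the run-length idea: scan the 1024x1024 occupancy
// bitmap row by row and emit, for each set cell, the number of empty
// cells since the previous set cell. Untested, names are made up.
#include <cstddef>
#include <cstdint>
#include <vector>

static const int kGridSize = 1024;

// 'occupied' is row-major; occupied[y * kGridSize + x] != 0 means an
// entity sits at (x, y). Returns the gaps (runs of zeros) between
// consecutive occupied cells, plus one trailing gap to the end.
std::vector<uint32_t> EncodeGaps(const std::vector<uint8_t>& occupied)
{
    std::vector<uint32_t> gaps;
    uint32_t run = 0;
    for (std::size_t i = 0; i < occupied.size(); ++i) {
        if (occupied[i]) {
            gaps.push_back(run);  // zeros seen since the last 1
            run = 0;
        } else {
            ++run;
        }
    }
    gaps.push_back(run);  // final run so the decoder knows where to stop
    return gaps;
}

std::vector<uint8_t> DecodeGaps(const std::vector<uint32_t>& gaps)
{
    std::vector<uint8_t> occupied;
    occupied.reserve(kGridSize * kGridSize);
    for (std::size_t i = 0; i < gaps.size(); ++i) {
        occupied.insert(occupied.end(), gaps[i], 0);
        if (i + 1 < gaps.size())
            occupied.push_back(1);  // every gap but the last is followed by a 1
    }
    return occupied;
}

With your ~7% fill rate most of those gaps should be small, which is
exactly where Alex's exponential-Golomb codes pay off.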
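For reference, the order-0 exp-Golomb code from the wiki page Alex
linked is only a few lines once you have some bit I/O. Again just a
sketch; the BitStream helper here is made up, not from any particular
library:

// Order-0 exponential-Golomb coding: write floor(log2(value+1)) zero
// bits, then value+1 in binary, most significant bit first.
#include <cstddef>
#include <cstdint>
#include <vector>

struct BitStream {
    std::vector<uint8_t> bytes;
    std::size_t bit_count = 0;  // bits written
    std::size_t read_pos = 0;   // bits read

    void PutBit(int b) {
        if (bit_count % 8 == 0) bytes.push_back(0);
        if (b) bytes.back() |= 1u << (7 - bit_count % 8);
        ++bit_count;
    }
    int GetBit() {
        int b = (bytes[read_pos / 8] >> (7 - read_pos % 8)) & 1;
        ++read_pos;
        return b;
    }
};

void PutExpGolomb(BitStream& s, uint32_t value)
{
    uint64_t x = static_cast<uint64_t>(value) + 1;
    int extra = 0;
    for (uint64_t t = x; t > 1; t >>= 1) ++extra;   // floor(log2(x))
    for (int i = 0; i < extra; ++i) s.PutBit(0);    // leading zeros give the length
    for (int i = extra; i >= 0; --i) s.PutBit((x >> i) & 1);  // x itself, MSB first
}

uint32_t GetExpGolomb(BitStream& s)
{
    int extra = 0;
    while (s.GetBit() == 0) ++extra;                // count leading zeros
    uint64_t x = 1;                                 // the terminating 1 bit
    for (int i = 0; i < extra; ++i) x = (x << 1) | s.GetBit();
    return static_cast<uint32_t>(x - 1);
}

Feeding the gap values above through PutExpGolomb instead of writing
fixed-width integers is the whole trick.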
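And if you do try the quad tree again, jw's implicit scheme is only a
handful of lines too. Sketch only, with made-up names - it naively
rescans each sub-area, which a real version would replace with a
bottom-up occupancy pass:

// Sketch of the implicit quad tree jw describes: "does this sub-area
// contain anything? 0 = no (stop), 1 = yes (subdivide)". Output is one
// flag per visited node; flags are kept one-per-byte here for clarity.
#include <cstdint>
#include <vector>

// Returns true if any cell in the size x size square at (x0, y0) is set.
static bool AnyOccupied(const std::vector<uint8_t>& grid, int grid_size,
                        int x0, int y0, int size)
{
    for (int y = y0; y < y0 + size; ++y)
        for (int x = x0; x < x0 + size; ++x)
            if (grid[y * grid_size + x]) return true;
    return false;
}

static void EncodeQuad(const std::vector<uint8_t>& grid, int grid_size,
                       int x0, int y0, int size, std::vector<uint8_t>& out)
{
    if (!AnyOccupied(grid, grid_size, x0, y0, size)) {
        out.push_back(0);          // empty sub-area: one flag and stop
        return;
    }
    out.push_back(1);              // something in here
    if (size == 1) return;         // a single occupied cell, nothing more to say
    int half = size / 2;
    EncodeQuad(grid, grid_size, x0,        y0,        half, out);
    EncodeQuad(grid, grid_size, x0 + half, y0,        half, out);
    EncodeQuad(grid, grid_size, x0,        y0 + half, half, out);
    EncodeQuad(grid, grid_size, x0 + half, y0 + half, half, out);
}

// Usage: EncodeQuad(grid, 1024, 0, 0, 1024, flags);
// where 'grid' is 1024*1024 bytes of occupancy.

Decoding just mirrors the recursion, reading one flag per node, and
gzip at the network layer still sits on top of whichever stream you end
up with.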