#2 make gzip compression level configurable

open
nobody
None
5
2002-09-19
2002-09-19
Michael Schröpl
No

(originally posted on the mod_gzip mailing list,
http://lists.over.net/pipermail/mod_gzip/2001-August/005492.html)

[Mod_gzip] mod_gzip: "Level" of gzip algorithm /
trade-off between cpu load and bandwidth savings

This is about the effect of the numerical compression level of the gzip algorithm, which can range from 0 (no
compression) to 9 (slowest, best compression). mod_gzip uses level 6, as I read in the mailing list archive.

My simple question would be: "Why?"

I took a random but typical HTML document x (a piece of the online
manual for our product, with lots of redundant CSS ...) of 10097
bytes and ran the following on the command line:
gzip -1 x -> compressed to 1698 bytes (83.18%)
gzip -d x; gzip -2 x -> compressed to 1693 bytes (83.23%)
gzip -d x; gzip -3 x -> compressed to 1684 bytes (83.32%)
gzip -d x; gzip -4 x -> compressed to 1684 bytes (83.32%)
gzip -d x; gzip -5 x -> compressed to 1685 bytes (83.31%)
gzip -d x; gzip -6 x -> compressed to 1685 bytes (83.31%)
gzip -d x; gzip -7 x -> compressed to 1676 bytes (83.40%)
gzip -d x; gzip -8 x -> compressed to 1676 bytes (83.40%)
gzip -d x; gzip -9 x -> compressed to 1676 bytes (83.40%)
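
The same measurement can be repeated for any file with a short script. This is only a minimal sketch, assuming Python's standard gzip module as a stand-in for the gzip command line tool; the function name savings_by_level and the sample input are made up for illustration, and the byte counts will differ slightly from the command line because the gzip header here carries no file name.

```python
import gzip


def savings_by_level(data: bytes) -> dict:
    """Return {level: (compressed_size, percent_saved)} for gzip levels 1-9."""
    results = {}
    for level in range(1, 10):
        # mtime=0 keeps the output reproducible between runs
        size = len(gzip.compress(data, compresslevel=level, mtime=0))
        results[level] = (size, 100.0 * (1 - size / len(data)))
    return results


if __name__ == "__main__":
    # Illustrative input: repetitive HTML, standing in for the manual page "x"
    sample = b'<p class="note" style="margin:0">redundant markup</p>\n' * 200
    for level, (size, saved) in savings_by_level(sample).items():
        print(f"gzip -{level}: {size} bytes ({saved:.2f}% saved)")
```

To test a real page, read it with open(path, "rb").read() and pass the bytes to savings_by_level.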

Just to be sure that I do understand what I'm doing, a look into
my mod_gzip log:
153.46.90.209 - - [23/Aug/2001:17:26:34 +0200] "finxs-dhtml GET
/hilfe/translationuserstylemanagement.html HTTP/1.0" 200 1685
mod_gzip: OK In:10097 -> Out:1685 = 84 Prozent text/html
^^^^^^^^^^ ^^^^^^^^
So at least I am in the right universe.
(Oops - mod_gzip is rounding the percentage up: the exact saving is 83.31%, not 84%. To be accurate, it should not do this.)
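
The arithmetic behind this remark, as a small sketch. It only shows that the logged value is consistent with rounding up; how mod_gzip actually computes the number is an assumption here, not something taken from its source.

```python
import math

original, compressed = 10097, 1685
saved = 100.0 * (1 - compressed / original)

print(f"exact:      {saved:.2f}")         # 83.31
print(f"rounded up: {math.ceil(saved)}")  # 84, as in the log line
print(f"nearest:    {round(saved)}")      # 83
```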

In this special case I would have been better off with level 3, which should also consume less CPU time
(or so I would assume). Even the results of level 1 look impressive (everything beyond it gains less than a
quarter of a percentage point). But mod_gzip does not allow a level below 3, so you must have reasons for this.

You could do several nice things for me:
a) post the URL of a description of what these levels mean. RFCs 1951 and 1952 describe the file format
only, but not the algorithm. I Googled around for a while but didn't find the right search terms.
b) write something about why you prefer level 6 over the others, and maybe give an estimate of
CPU time consumption as a function of compression level (you seem to have run tests
on this?). I remember the archive entry where someone ran a performance test and found a factor of 10
in requests served between mod_gzip on-the-fly compression and mod_gzip content negotiation with
precompressed static files ... numbers like these, maybe?
c) write something about which types of documents profit most from which compression level, and by
how much, if possible. What must a document look like to need gzip level 6 instead of just 3?
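
For question b), a rough way to get a feeling for CPU time as a function of level is to time the compression directly. The following is only a sketch under assumptions: it uses Python's zlib binding rather than mod_gzip's own compression code, so absolute numbers will not carry over and only the trend between levels is of interest. The function name cpu_time_by_level and the repeats parameter are made up for illustration.

```python
import time
import zlib


def cpu_time_by_level(data: bytes, levels=range(1, 10), repeats: int = 20) -> dict:
    """Return {level: (cpu_seconds_per_call, compressed_size)}."""
    results = {}
    for level in levels:
        start = time.process_time()  # CPU time of this process, not wall clock
        for _ in range(repeats):
            compressed = zlib.compress(data, level)
        elapsed = time.process_time() - start
        results[level] = (elapsed / repeats, len(compressed))
    return results


if __name__ == "__main__":
    # Illustrative input; replace with the bytes of a real page from the site
    page = b'<td class="cell" style="padding:2px">some text</td>\n' * 2000
    for level, (seconds, size) in cpu_time_by_level(page).items():
        print(f"level {level}: {seconds * 1e6:.0f} us per call, {size} bytes")
```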

mod_gzip has a degree of freedom here that would let the user trade compression against CPU load,
provided he knows what he is doing. That is exactly the goal of my post: trying to understand what
this gzip level means within the context of mod_gzip, i.e.
in a web content environment (mostly HTML, maybe XML later).
I wonder why there is no configuration parameter "mod_gzip_level" (integer, 3-9, default 6), which should
be easy to implement. Do you see no need for it? If there were one, together with some traffic analysis tool
(like the thingy I have started to write), one could simply set a compression level in httpd.conf, run
the site for a day or two, and check the statistics to see whether it is worth staying at this level. It's
mostly about usability once again. ;-)
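
The statistics step could look roughly like this. It is a minimal sketch that assumes the log format of the line quoted earlier in this post ("mod_gzip: OK In:... -> Out:..."); other LogFormat settings would need a different pattern, and the names GZIP_FIELDS and summarize are made up for illustration.

```python
import re

# Matches the mod_gzip part of the log line quoted in the post
GZIP_FIELDS = re.compile(r"mod_gzip: (\S+) In:(\d+) -> Out:(\d+)")


def summarize(lines):
    """Sum input and output bytes over all requests that mod_gzip compressed."""
    total_in = total_out = count = 0
    for line in lines:
        match = GZIP_FIELDS.search(line)
        if match and match.group(1) == "OK":
            total_in += int(match.group(2))
            total_out += int(match.group(3))
            count += 1
    saved = 100.0 * (1 - total_out / total_in) if total_in else 0.0
    return count, total_in, total_out, saved


sample = [
    '153.46.90.209 - - [23/Aug/2001:17:26:34 +0200] "finxs-dhtml GET '
    '/hilfe/translationuserstylemanagement.html HTTP/1.0" 200 1685 '
    'mod_gzip: OK In:10097 -> Out:1685 = 84 Prozent text/html',
]
count, total_in, total_out, saved = summarize(sample)
print(f"{count} compressed requests, {total_in} -> {total_out} bytes ({saved:.2f}% saved)")
```

Running this over one day of log per compression level would give directly comparable totals.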

If I had an easy way of modifying the compression level, I might even write a site-specific benchmark,
maybe converting an existing access_log into a script that requests these URLs via HTTP/1.1 with
Accept-Encoding, to get comparable results.
Modifying the source code is a route *I* could still take, but tell that to the
Win32 users who have only the DLL ... after all, Win32 doesn't ship with a C compiler.
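
Such a benchmark script might start out like the sketch below. It assumes Common Log Format request fields (optionally with a prefix token, as in the log line quoted earlier), only replays GET requests, and takes the server's base URL as a parameter; all function names are made up for illustration.

```python
import re
import urllib.request

# Request field of a log entry, e.g. "GET /path HTTP/1.0",
# optionally preceded by one extra token as in the line quoted in the post
REQUEST = re.compile(r'"(?:[^" ]+ )?GET (\S+) HTTP/[\d.]+"')


def urls_from_log(lines, base_url):
    """Yield an absolute URL for every GET request found in the log lines."""
    for line in lines:
        match = REQUEST.search(line)
        if match:
            yield base_url.rstrip("/") + match.group(1)


def build_request(url):
    """Build a request that advertises gzip support, as a browser would."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def fetch_size(url):
    """Return (body bytes as transferred, Content-Encoding) for one URL."""
    with urllib.request.urlopen(build_request(url)) as response:
        body = response.read()  # urllib does not decompress automatically
        return len(body), response.headers.get("Content-Encoding", "identity")
```

Calling fetch_size for each URL from urls_from_log, once per compression level, would give comparable transfer sizes for the same set of requests.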

I am asking basically because a friend of mine tried to use mod_gzip on a rather small server (AMD
K6/700) that produces a *lot* of traffic (100+ GB a month) and runs time-consuming CGI scripts (like XML
parsing or full-text phrase search), but ran into CPU trouble when turning on compression (the server is
too small anyway, but the problem is here and now).
I told him to try recompiling mod_gzip with a level of 3 and test ... but so far this is only guesswork on
my side; I just remembered that 3 is the minimum value but could not tell why.

Discussion

  • Logged In: NO

    It would really be a nice configuration option. I don't
    understand why this hasn't already been implemented.

     
  • Logged In: NO

    Why can't you save the whole web page already compressed?
    Why do you have to recompress the page with mod_gzip for each
    user?