We have a shiny new receive-buffer-size config option that sets the size of the buffer Privoxy uses to receive data from the operating system.
But how to decide what size to use? Maybe by keeping track of how much data was read at a time.
Note well: this is supposed to be a conversation starter, not a final patch set.
http://config.privoxy.org/show-status
has a new section for the stats, e.g.:

TCP read buffer size statistics:

Range                  count    cum %   run length
     0-   999:          1497     0.34          218
 1,000- 1,999:        123384    28.65        22215
 2,000- 2,999:        127019    57.78        26377
 3,000- 3,999:            54    57.80            0
 4,000- 4,999:         87966    77.97        12485
 5,000- 5,999:         43747    88.01          209
 6,000- 6,999:            52    88.02            0
 7,000- 7,999:         42051    97.67           12
 8,000- 8,999:          9573    99.86            8
 9,000- 9,999:             1    99.86            0
10,000-19,999:            61    99.88            0
20,000-29,999:            12    99.88            0
30,000-39,999:            24    99.89            0
40,000-49,999:            24    99.89            0
50,000-59,999:            12    99.89            0
60,000-69,999:            25    99.90            0
70,000-79,999:            17    99.90            0
80,000-89,999:            17    99.91            0
90,000-99,999:             3    99.91            0
        93440:           403   100.00          251
count is how many reads returned a number of bytes within <Range>
cum % is the cumulative percentage - in the above example, over 99% of the reads returned fewer than 10,000 bytes.
run length is how many consecutive times the same number of bytes were read; in this case, there were 251 consecutive calls to read_socket that returned 93,440 bytes.
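
As a rough illustration of how the three columns could be tracked, here is a minimal sketch in C. It is not the code from the patch: the struct, the function names and the bucket layout are all made up for this example, and it assumes that "run length" means the longest run of consecutive reads that returned exactly the same byte count, stored in the bucket that size falls into. The separate row for reads that filled the whole buffer (93440 above) is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical bucket layout mirroring the table above: ten 1,000-byte
 * buckets below 10,000, nine 10,000-byte buckets up to 99,999, and one
 * overflow bucket for anything larger.
 */
#define SMALL_BUCKETS 10
#define LARGE_BUCKETS 9
#define BUCKET_COUNT  (SMALL_BUCKETS + LARGE_BUCKETS + 1)

struct read_stats
{
   unsigned long count[BUCKET_COUNT];   /* reads that fell into each bucket   */
   unsigned long max_run[BUCKET_COUNT]; /* longest run of identical sizes     */
   unsigned long total;                 /* total number of reads recorded     */
   size_t last_size;                    /* size returned by the previous read */
   unsigned long current_run;           /* length of the run in progress      */
};

/* Map a read size to its bucket index. */
size_t bucket_index(size_t size)
{
   if (size < 10000)
   {
      return size / 1000;
   }
   if (size < 100000)
   {
      return SMALL_BUCKETS + size / 10000 - 1;
   }
   return BUCKET_COUNT - 1;
}

/* Record one read that returned 'size' bytes. */
void record_read(struct read_stats *stats, size_t size)
{
   size_t idx = bucket_index(size);

   assert(stats != NULL);

   stats->count[idx]++;
   stats->total++;

   /* Extend the current run, or start a new one (assumed definition). */
   if (stats->current_run > 0 && size == stats->last_size)
   {
      stats->current_run++;
   }
   else
   {
      stats->current_run = 1;
   }
   stats->last_size = size;

   if (stats->current_run > stats->max_run[idx])
   {
      stats->max_run[idx] = stats->current_run;
   }
}

/* Percentage of all reads that landed in buckets 0..idx. */
double cumulative_percent(const struct read_stats *stats, size_t idx)
{
   unsigned long sum = 0;
   size_t i;

   if (stats->total == 0)
   {
      return 0.0;
   }
   for (i = 0; i <= idx && i < BUCKET_COUNT; i++)
   {
      sum += stats->count[i];
   }
   return 100.0 * (double)sum / (double)stats->total;
}
```

A caller would zero-initialise a struct read_stats, call record_read() after every socket read and walk the buckets with cumulative_percent() when rendering the status page. In this sketch a single isolated read counts as a run of 1, which may differ from how the table above reports it.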
Maybe I should set receive-buffer-size larger.
Lee
This seems overly complicated to me.
How many users do you expect to interpret these statistics correctly?
As previously mentioned on the mailing list, I think the buffer
size should eventually be auto-tuned within specified limits.
In the meantime we could simply use a value that is expected to
work reasonably well.
curl's buffer size was recently changed to 100k and we could
use the same value until somebody comes up with data that
shows that another value is superior.
For most Privoxy setups, 100k is probably more than necessary
to get the highest throughput, but it seems unlikely to cause any
problems.
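
To make the auto-tuning idea above a bit more concrete, here is one possible shape for it, sketched in C. This is only an assumption about what "auto-tuned within specified limits" could look like. The function name, the doubling policy and the limits are all hypothetical, and none of it is taken from Privoxy.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Privoxy: return the buffer size to
 * use for the next read. The buffer is doubled only if the previous
 * read filled it completely, and the result is always clamped to the
 * range [min_size, max_size].
 */
size_t next_buffer_size(size_t current, size_t bytes_read,
                        size_t min_size, size_t max_size)
{
   size_t next = current;

   assert(min_size > 0 && min_size <= max_size);

   if (bytes_read >= current)
   {
      /* Guard against overflow before doubling. */
      next = (current > max_size / 2) ? max_size : current * 2;
   }

   if (next < min_size)
   {
      next = min_size;
   }
   if (next > max_size)
   {
      next = max_size;
   }
   return next;
}
```

With limits of 5,000 and 100,000 bytes, a full 5,000-byte read would grow the buffer to 10,000 bytes, while a read that returned only 1,200 bytes would leave it unchanged. Growing after a single full read is the simplest possible policy. A real implementation might want to see several full reads in a row first.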
On 8/16/17, Fabian Keil <fabiankeil@users.sf.net> wrote:
Possibly none - which includes me :)
Correctly interpreting the stats requires knowledge of what came in
'over the wire', which means clearing the stats, starting
tcpdump/wireshark/whatever, doing the download/browsing/etc., and then
comparing the stats against the packet capture ... which gets old
real fast. But it is the only way I know of to tell the difference
between privoxy not being able to keep up and the case where selective
acks are enabled on a lossy connection and a single packet fills in a
hole, all of a sudden creating 100KB (or more!) of data for the o/s to
hand off to privoxy.
What does seem generally useful is the run length stat for the max
buffer size. If privoxy reads a full buffer (max buffer size bytes)
more than [50? 100? pick a suitably large number] times in a row, that
seems to be a pretty good indication that receive-buffer-size should be
increased.
Reporting a single number (# consecutive max buffersize reads) might
be the better thing to show users instead of the whole table, but the
table is kind of interesting so I left it in :)
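
The single-number variant could look roughly like the following C sketch. The names and the threshold of 100 are placeholders picked for illustration (the thread only suggests "50? 100?"), and this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold; the right value is an open question. */
#define FULL_READ_RUN_THRESHOLD 100

struct full_read_tracker
{
   unsigned long current_run; /* consecutive reads that filled the buffer */
   unsigned long longest_run; /* longest such run seen so far             */
};

/* Call after every socket read with the number of bytes returned. */
void note_read(struct full_read_tracker *t, size_t bytes_read,
               size_t buffer_size)
{
   assert(t != NULL);

   if (bytes_read == buffer_size)
   {
      t->current_run++;
      if (t->current_run > t->longest_run)
      {
         t->longest_run = t->current_run;
      }
   }
   else
   {
      /* Any short read ends the run. */
      t->current_run = 0;
   }
}

/* Non-zero if the longest run of full reads exceeded the threshold. */
int buffer_looks_too_small(const struct full_read_tracker *t)
{
   return t->longest_run > FULL_READ_RUN_THRESHOLD;
}
```

Fed with the example above (251 consecutive reads of 93,440 bytes into a 93,440-byte buffer), longest_run ends up at 251 and buffer_looks_too_small() returns non-zero. A status page could then show just longest_run, or a one-line hint, instead of the whole table.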
I finally was able to put my finger on why I didn't like the analogy
to add_to_iob.
By definition, the buffer scaling in add_to_iob will never create a
uselessly large buffer.
The user says 'I want to filter up to X number of bytes' and privoxy
will dutifully filter up to X number of bytes. So increasing the iob
buffer near the end of a page isn't a waste of time, but increasing
the read buffer near the end of a download might very well be.
Every so often I wish we could talk face to face. Telling users to
use an unencrypted channel to download code or programs is something I
think should be fixed. But this.. this was just a fun project for me.
I learned a few things, had a good time playing coder & ended up with
something I thought might be useful to others. If anything like this
patch goes into privoxy - fine. If it doesn't - also fine ... or
maybe even better. I wasn't looking forward to writing the
documentation for TCP read buffer size statistics :)
Lee
Related
Patches: #139