(Yes, I tried trimming the quote, but it almost all seems vaguely relevant.)
At risk of embarrassing myself (again) by showing my ignorance ...
Mike, you said this: "Disk becomes the major bottleneck on the faster
machines."
Doesn't FTP know the filesize of file-to-be-downloaded ahead of time?
If so, then perhaps you can try Eric's idea:
* create/open, seek to end-1 (in the empty output file), write a
single byte, close, reopen
This apparently avoids having to update the FAT over and over again
redundantly. See the following thread for some examples (esp. nidud's
comment about Doszip: "This reduced the compression time from 455 to ...").
I may be seriously off-base here (no surprise), but I felt I should
mention it "just in case"!
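Eric's idea above might look like this in portable C. This is a minimal sketch: `preallocate` is a made-up helper name, and real DOS code would use INT 21h calls rather than stdio, but the shape is the same, so the filesystem extends the file once instead of updating the FAT on every small write.

```c
#include <stdio.h>

/* Preallocate a file of `size` bytes: seek to the last byte and write
   a single 0 byte, forcing the filesystem to allocate all the clusters
   up front.  Then close and reopen for the actual download writes. */
int preallocate(const char *path, long size)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    if (size > 0) {
        if (fseek(f, size - 1, SEEK_SET) != 0) { fclose(f); return -1; }
        if (fputc(0, f) == EOF)               { fclose(f); return -1; }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

This only helps when the final file size is known up front, which is exactly the FTP case where the server reports the size before the transfer starts.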
On 7/7/11, Michael B. Brutman <mbbrutman@...> wrote:
> On 7/6/2011 7:10 PM, Bernd Blaauw wrote:
>> On 7-7-2011 1:32, Michael B. Brutman wrote:
>>> mTCP FTP compares poorly to the native stack and Firefox there, but FTP
>>> is working in a very limited environment:
>>> * The TCP/IP socket receive buffer is tiny compared to the native
>>> network stack
>>> * You are doing filesystem writes 8KB at a time
>>> * You have a small virtualization penalty
>>> * The packet interface wasn't designed for high performance; every
>>> incoming packet requires a hardware interrupt and two software upcalls
>> 8KB filesystem writes? odd. So it's:
>> 1) download/transfer 8KB (8KB transfer buffer)
>> 2) halt download, dump transfer buffer to disk and clear it
>> 3) continue downloading.
> Not so odd. All comm code fills buffers and then processes the
> buffers. Unless you have a multi-core system you are always halting the
> processing of TCP/IP protocol handling to do your disk writes. Modern
> OSes with DMA support hide some of that by letting the DMA controller of
> the disk (and possibly the Ethernet controller if so equipped) do the
> byte copying work. But in the absence of DMA the host CPU does
> everything, and does it in a single threaded manner.
>> Easier at least compared to having an 8KB transfer buffer plus a 'huge'
>> receive buffer (nearly size of all of machine's conventional memory, a
>> multiple of 8KB?) followed by only clearing the buffer if it's full or a
>> file has been downloaded completely (whichever comes first). Your single
>> buffer might be more efficient compared to transfer buffer plus receive
>> buffer.
> In this environment we are entirely single threaded, except for the
> hardware buffering that happens on the Ethernet card. To receive a
> packet the path looks like this:
> - the card receives and buffers the frame from the wire
> - the card signals a hardware interrupt
> - the packet driver responds and either interrogates the card or copies
> the contents of the frame
> - the packet driver makes an upcall to the TCP/IP code
> - the TCP/IP code either provides a buffer or says 'no room'
> - the packet driver makes a second upcall to let the TCP/IP code know
> the frame is copied
> - the interrupt ends and the interrupted code resumes
> - the packet must now go through IP and TCP protocol processing
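The two-upcall handshake in that path can be sketched as plain C. This is a simulation with invented names (`receiver_upcall`, the buffer pool); the real packet driver interface passes the flag, length, and buffer pointer in CPU registers, but the contract is the same.

```c
#include <stddef.h>

#define MAX_FRAME 1514  /* one full Ethernet frame */
#define NUM_BUFS  20    /* raw packet buffers, as in the post */

static unsigned char pool[NUM_BUFS][MAX_FRAME];
static int in_use[NUM_BUFS];
static int is_ready[NUM_BUFS];

/* Upcall 1 (flag == 0): the TCP/IP code returns a buffer, or NULL for
   "no room", in which case the frame is dropped.  Upcall 2 (flag == 1):
   the driver has copied the frame into the buffer; mark it so the main
   loop can run IP/TCP protocol processing on it later. */
static unsigned char *receiver_upcall(int flag, unsigned char *buf,
                                      unsigned len)
{
    int i;
    if (flag == 0) {                      /* request a buffer */
        if (len > MAX_FRAME) return NULL;
        for (i = 0; i < NUM_BUFS; i++)
            if (!in_use[i]) { in_use[i] = 1; return pool[i]; }
        return NULL;                      /* out of buffers: frame lost */
    }
    for (i = 0; i < NUM_BUFS; i++)        /* frame copied; queue it */
        if (buf == pool[i]) { is_ready[i] = 1; break; }
    return buf;
}
```

Note that all of this runs inside the hardware interrupt; the protocol processing itself is deferred until the interrupted code resumes.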
> The buffering scheme works at three levels:
> - Raw packet buffers (20 at 1514 bytes)
> - TCP receive buffering (8KB)
> - File read/write chunk size (8KB)
> Raw packet buffers are used by the packet driver directly. They are the
> critical resource; if you run out of those you start dropping frames
> coming in from the wire. TCP buffering is designed to pull data from
> those packet buffers as quickly as possible so that they may be
> recycled. (In the case where you have a lot of small incoming packets
> that is really critical because every incoming packet is allocated 1514
> bytes whether it needs it or not.) The TCP buffer is organized as a
> ring buffer so it is more space efficient.
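The ring organization of that 8KB receive buffer might be sketched like this (illustrative names and layout, not mTCP's actual code): the head and tail indices wrap around, so freed space at the front is reused immediately instead of being wasted.

```c
#include <string.h>

#define RING_SIZE 8192  /* 8KB TCP receive buffer, as in the post */

typedef struct {
    unsigned char buf[RING_SIZE];
    unsigned head;   /* next write position */
    unsigned tail;   /* next read position */
    unsigned count;  /* bytes currently stored */
} Ring;

static void ring_init(Ring *r) { r->head = r->tail = r->count = 0; }

/* Copy up to `len` bytes in; returns how many fit.  The caller (the
   TCP code draining raw packet buffers) keeps whatever didn't fit. */
static unsigned ring_put(Ring *r, const unsigned char *src, unsigned len)
{
    unsigned n = 0;
    while (n < len && r->count < RING_SIZE) {
        r->buf[r->head] = src[n++];
        r->head = (r->head + 1) % RING_SIZE;
        r->count++;
    }
    return n;
}

/* Copy up to `len` bytes out (the application side). */
static unsigned ring_get(Ring *r, unsigned char *dst, unsigned len)
{
    unsigned n = 0;
    while (n < len && r->count > 0) {
        dst[n++] = r->buf[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
        r->count--;
    }
    return n;
}
```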
> The application reads from the TCP buffer and writes to the filesystem.
> All of this is still single threaded, and for most systems the
> bottleneck is the disk access time, not the copying of data from multiple
> buffers. At a minimum all reads and writes to the filesystem should be
> done in multiples of 512 bytes; anything less requires DOS to do a
> read/modify/write as it writes data to the blocks of the filesystem.
> 1KB reads and writes were still quite inefficient; after some
> experimenting I found that 8KB was a good size. 16KB or 32KB are only
> marginally better.
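The 512-byte rule above can be captured in a tiny helper (hypothetical name): flush only whole sectors from the staging buffer and carry the remainder forward, so DOS never has to read-modify-write a partially filled block.

```c
#define SECTOR 512   /* DOS filesystem block size */
#define CHUNK  8192  /* the write size the post settled on */

/* Largest sector multiple in `buffered` bytes; flush this much and
   keep the remaining buffered - flushable(buffered) bytes staged. */
static unsigned flushable(unsigned buffered)
{
    return (buffered / SECTOR) * SECTOR;
}
```

For example, with 700 bytes staged, only 512 would be written and 188 carried into the next chunk; a full 8KB chunk flushes cleanly as 16 whole sectors.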
> The buffer sizes generally are not larger because larger does not make
> that much of a difference in the performance of the filesystem writes
> and does have a negative impact on buffering. Long writes delay TCP
> protocol processing, causing incoming buffers to run low and delaying
> the sending of ACK packets for the received data. The major opportunity
> for improving performance is to send the ACK for the packet as quickly
> as possible, right after TCP goes through protocol processing but before
> the application tries to read the receive buffer and empty it. Getting
> that ACK out early keeps the flow of data constant and hides some of the
> latency of the 'receive data/write data' cycle that is occurring.
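That ordering, ACK first and disk write second, can be sketched as a stub event loop. All the function names here are stand-ins that only record their call order; the point is just the sequence, not the real implementations.

```c
static const char *log_steps[8];
static int n_steps = 0;
static void step(const char *name) { log_steps[n_steps++] = name; }

static void tcp_protocol_process(void) { step("tcp");  }
static void send_ack_now(void)         { step("ack");  }
static void write_chunk_to_disk(void)  { step("disk"); }

/* One receive cycle: ACK right after protocol processing, *before*
   the slow filesystem write, so the sender keeps streaming data
   while the disk is busy. */
static void one_receive_cycle(void)
{
    tcp_protocol_process();  /* checksums, sequence numbers, buffering */
    send_ack_now();          /* ACK early: hides the disk latency      */
    write_chunk_to_disk();   /* the slow part happens after the ACK    */
}
```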
> Another optimization I could make to this would be to have the FTP
> application handle the raw packet buffers directly. TCP would continue
> to do protocol processing, but instead of copying the data to a TCP
> receive buffer it would just give FTP the raw packets and let FTP do the
> copying. This was the original design and the first netcat code used
> this technique. The technique removes some memcpy overhead, but the
> largest overhead comes from the filesystem write. It made the end
> application code (FTP) more complex and error prone, and had a nasty
> habit of starving the packet driver for buffers if the disk write was
> too large.
> Most of my testing is on lower-end machines, like a 386-40 and the
> various 8088 machines that I have. The performance of memcpy is far
> better on the newer processors due to pipeline efficiency and levels of
> caching. (Even the 386-40 has a 128K L2 cache.) Disk becomes the major
> bottleneck on the faster machines.