From: Ronciak, J. <joh...@in...> - 2007-07-17 20:57:23
Multiple DMAs are going to require multiple transactions on both the memory bus and the PCI-e bus. I'm still not clear on what you are trying to do, or how you are judging that the ring filling up is a problem. With fast processors these days it's not hard to overrun the transmit ring regardless of how many segments the packet is in. The buffer (and descriptors) will only drain as fast as the wire speed and topology can support: the faster the link, the quicker packets drain from the buffer, which frees the descriptors faster. The driver also probably has interrupt moderation on (it's on by default), which can also affect how fast descriptors are recycled. RX traffic can have an impact too, since TX is also cleaned up on an RX interrupt (or NAPI poll). So I'm not sure what you are getting at.

Cheers,
John
-----------------------------------------------------------
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.", Benjamin
Franklin 1755

>-----Original Message-----
>From: Elad Lahav [mailto:el...@uw...]
>Sent: Tuesday, July 17, 2007 9:10 AM
>To: Ronciak, John
>Cc: e10...@li...
>Subject: Re: [E1000-devel] Intel GbE DMA
>
>Thanks for replying, John.
>According to your comments, there should be no penalty for using
>scatter-gather I/O for sending packets, that is, for splitting the
>header and payload over two TX descriptors.
>Nonetheless, I experience problems that seem to be attributable
>exactly to that: using S-G I/O slows something down (I'm not sure
>what), resulting in the TX ring filling up very quickly and packets
>being dropped.
>I've collected some statistics, which suggest that the number of TX
>descriptors handled between two consecutive interrupts is lower
>with S-G I/O.
>That is, if there were no penalty, the number should have adhered to
>the 2:3 ratio (2 TX descriptors if the header and data are in
>consecutive memory, 3 descriptors if they are split, the extra
>descriptor being used for HW checksumming). But the actual number is
>lower, which explains why the TX ring buffer fills up.
>
>Any thoughts?
>Elad
>
>Ronciak, John wrote:
>> Please see my comments below.
>>
>> Cheers,
>> John
>> -----------------------------------------------------------
>> "Those who would give up essential Liberty, to purchase a little
>> temporary Safety, deserve neither Liberty nor Safety.", Benjamin
>> Franklin 1755
>>
>>> -----Original Message-----
>>> From: e10...@li...
>>> [mailto:e10...@li...] On Behalf
>>> Of Elad Lahav
>>> Sent: Tuesday, July 17, 2007 6:03 AM
>>> To: e10...@li...
>>> Subject: [E1000-devel] Intel GbE DMA
>>>
>>> My previous post on scatter-gather I/O might have been a bit
>>> vague, so let me try again.
>>> I'd like to get some information on the DMA transactions performed
>>> by Intel's GbE NICs. I tried looking up the technical
>>> documentation, but couldn't find any details.
>>>
>>> 1. How much data can be transferred in a single transaction?
>> I assume you mean when our controller is bus-mastering the DMA? It
>> can be as big as the size field can support. Buffers can only be as
>> big as a jumbo frame (old drivers); with the new drivers the limit
>> is the size of a page, since we use pages for jumbo frames. For
>> normal frames it won't be bigger than 1500 bytes.
>>
>>> 2. Is a transaction limited to a single <address, size> pair?
>> No, the descriptor for most of our modern controllers can have
>> multiple buffers, e.g. for jumbos or in the case of header split.
>>
>>> 3. Is there a per-transaction overhead, i.e., does it matter
>>> whether the same amount of data is transferred in a single
>>> transaction or in multiple transactions?
>> Yes, there is. There is always bus overhead involved in getting the
>> DMA set up and started.
>>
>>> 4. What is the effect on the CPU? I assume that the (memory?) bus
>>> is locked, but shouldn't that have only a negligible effect on
>>> modern CPUs with large caches?
>> Not sure what you mean here. DMAs do not interact with the CPU at
>> all. If the processor has the memory that is being DMA'd to in its
>> cache, it is invalidated. This is on a cache-coherent system.
>>
>>> 5. Is there a better place to post these questions?
>> Nope, this is the place.
>>> Thanks,
>>> Elad
>>>
>>>
>>> --------------------------------------------------------------
>>> This SF.net email is sponsored by DB2 Express
>>> Download DB2 Express C - the FREE version of DB2 express and take
>>> control of your XML. No limits. Just data. Click to get it now.
>>> http://sourceforge.net/powerbar/db2/
>>> _______________________________________________
>>> E1000-devel mailing list
>>> E10...@li...
>>> https://lists.sourceforge.net/lists/listinfo/e1000-devel
>>>