Adi,

Thank you for the response.

Yeah, I'm fully aware of the WEP problems with 802.11 (my former life was developing the first 802.11b chipsets... which ended up being used to crack WEP).

Is the SATA 32-bit CRC also transmitted in the AoE packet over the Ethernet, or is it used only for error detection on the SATA cable itself? Looking at the vblade code, I don't think any additional SATA CRC is being sent over the Ethernet.

To be honest, I'm not worried about the errors that the Ethernet card detects; I'm worried about the ones it doesn't. I agree that CRC32 is fairly powerful, but there are classes of errors that it can't detect or has a reduced chance of detecting, and this is where protocol-level protection comes into play. Remember that even TCP has a simple checksum, which folks tried to turn off in the past, only to discover quickly that it was a bad idea.

Let me clarify my question a bit. There are points on the network where errors can be introduced into the packet stream. These include switches, routers and the like, where a plain hardware fault can corrupt a packet. If the error is introduced at such a point (not maliciously, just by faulty hardware), it can fall outside the coverage of the Ethernet CRC and so not be caught by the Ethernet hardware. This is where the protocol being carried should include some sort of error detection of its own. The current definition of AoE does not include any error detection, so I'm trying to understand how this protocol can be used in an enterprise-style environment if there is a chance of corrupted data being written to the target?
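
To make the scenario concrete, here is a rough Python sketch. It is only an illustration: it assumes a hypothetical device that checks the FCS on ingress, flips a bit while the frame sits in its buffer, and regenerates the FCS on egress. The `wrap_e2e` check is likewise made up for the example; AoE defines no such field, and a real end-to-end check would not have to be a CRC32.

```python
import zlib


def crc(data: bytes) -> bytes:
    """CRC-32 of data as 4 bytes (same polynomial as the Ethernet FCS)."""
    return zlib.crc32(data).to_bytes(4, "little")


def add_fcs(payload: bytes) -> bytes:
    """Model a NIC appending the link-level frame check sequence."""
    return payload + crc(payload)


def fcs_ok(frame: bytes) -> bool:
    """Model a NIC verifying the frame check sequence."""
    return crc(frame[:-4]) == frame[-4:]


def faulty_switch(frame: bytes) -> bytes:
    """Hypothetical faulty device: verifies the FCS on ingress, corrupts
    one bit in buffer memory, then regenerates the FCS on egress."""
    if not fcs_ok(frame):
        raise ValueError("frame dropped: bad FCS on ingress")
    payload = bytearray(frame[:-4])
    payload[10] ^= 0x01  # single bit flip inside the device
    return add_fcs(bytes(payload))


def wrap_e2e(data: bytes) -> bytes:
    """Hypothetical end-to-end check carried inside the payload."""
    return data + crc(data)


def e2e_ok(payload: bytes) -> bool:
    """Verify the hypothetical end-to-end check at the target."""
    return crc(payload[:-4]) == payload[-4:]


data = bytes(range(256)) * 2            # one 512-byte sector
sent = add_fcs(wrap_e2e(data))          # initiator -> wire
received = faulty_switch(sent)          # corrupted in transit

link_check = fcs_ok(received)           # what the target's NIC sees
end_to_end_check = e2e_ok(received[:-4])  # what a protocol-level check sees
print("link FCS ok:", link_check)
print("end-to-end check ok:", end_to_end_check)
```

In this sketch the target's NIC accepts the frame because the FCS was recomputed over the already corrupted payload, while the check that travelled with the data from the initiator flags the mismatch.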

See the paper I referenced earlier about various sources of network errors: http://portal.acm.org/citation.cfm?id=347561&dl=ACM&coll=DL&CFID=110304004&CFTOKEN=23505194 . 

I'm hoping someone from Coraid can respond to this query, since they seem to be the only company selling enterprise-quality AoE gear.

David

On Mon, Nov 29, 2010 at 4:19 AM, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
David,

> I've been looking at AoE and I'm trying to understand what effect the Ethernet
> CRC-32 data integrity checking has on the AoE communications? Particularly when
> going to GbE and jumbo frames support there seems to be some data out there
> that there is a chance that the CRC32 won't detect the error in the frame and
> with a protocol like AoE that error would most probably end up being written to
> the target disk.
First off, it is important to state that checksums like CRC32 are only
helpful in detecting "wire" errors -- they do not protect against
intentional modification (this was one of the design problems with WEP --
for such purposes cryptographic hash functions are required).

Now to the powers of CRC32: a 32-bit CRC is able to detect any single error
burst that is no longer than 32 bits. The length of the datagram does not
matter in this case. Of all error bursts longer than 32 bits it will only
detect a fraction: 99.999999976716936% (1-2^-32).
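
As a rough illustration of the burst guarantee, the following Python sketch XORs random bursts of at most 32 bits into a frame and counts how many slip past CRC-32. The frame size, trial count and helper names are arbitrary choices for the example. Bits are numbered in Ethernet wire order (least significant bit of each byte first), which is why the frame is treated as a little-endian integer.

```python
import random
import zlib

# Fraction of longer bursts that go undetected, per the figure above.
UNDETECTED_FRACTION = 2.0 ** -32


def xor_burst(data: bytes, start_bit: int, pattern: int) -> bytes:
    """XOR a burst into data. Bit k of the little-endian integer is bit
    (k % 8) of byte (k // 8), which matches LSB-first wire order, so the
    burst is contiguous on the wire."""
    n = int.from_bytes(data, "little")
    n ^= pattern << start_bit
    return n.to_bytes(len(data), "little")


rng = random.Random(1)
frame = bytes(rng.getrandbits(8) for _ in range(1500))
good_crc = zlib.crc32(frame)

trials = 1000
missed = 0
for _ in range(trials):
    pattern = rng.randrange(1, 2 ** 32)            # nonzero, at most 32 bits wide
    start = rng.randrange(0, len(frame) * 8 - 31)  # keep the burst inside the frame
    if zlib.crc32(xor_burst(frame, start, pattern)) == good_crc:
        missed += 1

print("undetected bursts of <= 32 bits:", missed, "of", trials)
print("undetected fraction for longer bursts:", UNDETECTED_FRACTION)
```

Every short burst changes the CRC here, as the guarantee predicts; only longer bursts are left to the 2^-32 chance.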

The ATA protocol uses checksums as well: Ultra-ATA introduced a checksum on
the data transmission and SATA introduced a 32bit CRC for all bits
transmitted over the wire. Besides that, for every block written on a disk,
a parity will be stored as well.

To sum it up: check your network cards for frame errors and your switches
as well. When you see errors, act. The risk is IMHO low.

> I think that there is a further problem to understand and that is with network
> connection points. AoE is not routable but that doesn't mean you can't use
> network switches to interconnect initiators with their AoE targets and at these
> switches there seems to be a possible error point introduced which AoE isn't
> protecting against? Are there best practices for AoE installations to protect
> against these error points?
I think I did not get the question. What error point do you introduce using
a switch?

-- Adi