Thread: [Aoetools-discuss] (probably stupid) support&data integrity questions
From: Killer{R} <su...@ki...> - 2014-05-10 19:19:26
Hello aoetools-discuss,

1) Do you have a forum, or is this mailing list the only support channel?

2) Data integrity - has anybody had any data corruption due to AoE, since it doesn't seem to have anything to detect on-the-wire errors? I mean, at first, corrupted packets themselves (I know that Ethernet hardware checks for that... usually, but...). There is also another possible problem that does not seem to be addressed by the AoE protocol: possible packet duplication at the Ethernet level, which can be caused, for example, by bad network topology, failing hardware (I've seen semi-dead network switches that flooded an Ethernet segment with duplicated packets) and so on. Data corruption is possible there if a write packet from the past overwrites some data region.

Theoretically this could easily be fixed by using sequence numbers that the server knows to be monotonically increasing. Furthermore, there is a 'tag' field in the ATA command that seems usable for this purpose, but unfortunately, according to the documentation and the vblade sources, it is completely ignored by the server :(

--
Best regards,
Killer{R} mailto:su...@ki...
From: Jesse B. <bec...@ma...> - 2014-05-11 03:16:46
On Sat, May 10, 2014 at 10:19:19PM +0300, Killer{R} wrote:
>2) Data integrity - has anybody had any data corruption due to AoE, since it
>doesn't seem to have anything to detect on-the-wire errors? I mean, at
>first, corrupted packets themselves (I know that Ethernet hardware

For the record, I've run 800+ TB worth of storage using AoE, and never lost data due to an AoE problem[1].

The hardware was commodity gear up to the storage boxes: those were SR-series hardware from Coraid. While quite old now, they have served very well over the years.

[1] Stupid user mistakes? Yes. Filesystem corruption from power failure? Yes. Neither of those is the fault of AoE.

--
Jesse Becker (Contractor)
From: Hilko B. <be...@hi...> - 2014-05-11 22:20:02
* Killer{R}:

> 1) Do you have a forum, or is this mailing list the only support channel?
> 2) Data integrity - has anybody had any data corruption due to AoE, since it
> doesn't seem to have anything to detect on-the-wire errors? I mean, at
> first, corrupted packets themselves (I know that Ethernet hardware
> checks for that... usually, but...)

A few years back I had some data corruption issues, for which I blamed inadequate network equipment (Linksys SLM-2008 gigabit switches, IIRC). Under high load, the switches would send garbage and then reboot.

Every client would get its own VLAN (untagged on the client side, tagged on the server side), so that clients didn't share a broadcast domain. Since the switch had to recalculate Ethernet checksums after tagging/untagging frames, checking the checksums would not even have helped.

Replacing the switches seems to have helped, but protocol-wise, there is no protection against switches munging packets. I think that adding a checksum, at least for the payload, would be an appropriate addition to the AoE protocol.

> There is also another possible problem that does not seem to be addressed by
> the AoE protocol: possible packet duplication at the Ethernet level, which can
> be caused, for example, by bad network topology, failing hardware (I've
> seen semi-dead network switches that flooded an Ethernet segment with
> duplicated packets) and so on.

I don't see how duplicate frames could become a problem, as their content is left intact.

Cheers
-Hilko
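The payload checksum suggested above is not part of the AoE protocol; as a rough illustration only, an end-to-end check could look like the sketch below. The 4-byte CRC-32 trailer and the names `seal` and `unseal` are assumptions made for this example, not anything defined by AoE, vblade, or any existing driver.

```python
import struct
import zlib

TRAILER_LEN = 4  # hypothetical trailer: CRC-32 of the payload, big-endian


def seal(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (hypothetical framing)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack("!I", crc)


def unseal(frame: bytes) -> bytes:
    """Receiver side: verify the trailer and return the payload.

    Raises ValueError if the frame is too short or the checksum differs,
    e.g. because a switch rewrote the frame and recomputed the Ethernet FCS.
    """
    if len(frame) < TRAILER_LEN:
        raise ValueError("frame too short to carry a checksum")
    payload, trailer = frame[:-TRAILER_LEN], frame[-TRAILER_LEN:]
    (expected,) = struct.unpack("!I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("payload checksum mismatch")
    return payload
```

Because the checksum is computed by the endpoints rather than by the switch, it would survive the VLAN tagging/untagging step that forces the Ethernet checksum to be recalculated. A receiver would presumably drop a frame that fails the check and let the initiator retransmit.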
From: Killer{R} <su...@ki...> - 2014-05-11 22:09:27
Hello Hilko,

Monday, May 12, 2014, 12:41:51 AM, you wrote:

>> There is also another possible problem that does not seem to be addressed by
>> the AoE protocol: possible packet duplication at the Ethernet level, which can
>> be caused, for example, by bad network topology, failing hardware (I've
>> seen semi-dead network switches that flooded an Ethernet segment with
>> duplicated packets) and so on.
HB> I don't see how duplicate frames could become a problem, as their content
HB> is left intact.

Assume we have two ATA write commands, the first sent at time 0, the second at time 1. Both write data to overlapping locations, but the data bytes in the overlapping region of the second command are not the same as in the first one. This can happen - data on an HDD is not written once and forever, it gets modified sometimes. Now suppose we have some network hardware that decides to send the first packet one more time a bit later, say at time 2. The stale write then lands after the newer one and overwrites it, so we finally end up with data corruption.

Note that AFAIK the Ethernet standard guarantees neither delivery, nor ordering, nor that a frame arrives only once. Actually it doesn't guarantee anything :)

So for myself I'm going to implement tag tracking, since WinAoE increments its tags monotonically, which makes it possible to reduce the risk of such a problem. However, the AoE protocol docs say that clients can use the 'tag' field for whatever they wish, so it's not a generic solution - just a workaround (for my paranoia :) ) that relies on the particular WinAoE driver implementation.

--
Best regards,
Killer{R} mailto:su...@ki...
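The tag-tracking workaround described in this message could be sketched roughly as follows. This is a minimal illustration, not vblade code: it assumes the initiator increments the 32-bit tag by some positive amount for every new command (as the message reports for WinAoE), and the names `TagTracker` and `tag_is_newer` are hypothetical.

```python
TAG_BITS = 32                 # width of the tag field assumed here
TAG_MOD = 1 << TAG_BITS
HALF = 1 << (TAG_BITS - 1)


def tag_is_newer(tag: int, last: int) -> bool:
    """True if `tag` is ahead of `last` in modulo-2**32 order.

    Comparing the wrapped difference lets the counter roll over
    from 0xFFFFFFFF to 0 without being treated as a step back.
    """
    diff = (tag - last) % TAG_MOD
    return 0 < diff < HALF


class TagTracker:
    """Remembers the last accepted tag per client (hypothetical helper)."""

    def __init__(self) -> None:
        self._last = {}  # client MAC address -> last accepted tag

    def accept(self, client_mac: str, tag: int) -> bool:
        """Return False for a duplicated or stale frame, True otherwise."""
        last = self._last.get(client_mac)
        if last is not None and not tag_is_newer(tag, last):
            return False
        self._last[client_mac] = tag
        return True
```

With this check, the replayed write from time 0 in the scenario above would carry an older tag than the write from time 1 and would be rejected. The trade-off is that a genuine frame that merely arrives out of order is rejected too, so the initiator would have to resend it, and the scheme breaks for any client that uses the tag field differently.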