From: WK <wk...@bn...> - 2018-05-21 18:51:55
On 5/21/2018 11:07 AM, Gandalf Corvotempesta wrote:
> On Mon, 21 May 2018 at 19:05, WK <wk...@bn...> wrote:
>> We switched to Gluster for the VMs and have been happy with that.
> Brrrrrrr
> There was a well-known corruption bug open for about 2 years!

Yeah, but only if you are using shards on a distributed volume and add
capacity that requires a rebalance. We use simple replication and do
rip-and-replace upgrades if we need more capacity, because it's really
easy to live migrate from the older storage to the newer.

Yes, it was annoying that they (RedHat) seemed to ignore the issue, and
that their fix, when people complained, was a "don't do that" added to
the notes.

I'm not even sure if the issue is fixed yet. Last I read, they think it
is fixed but haven't fully tested it yet.

>
>> I don't know if the newer versions fixed that, but it was ugly.
>> and again it was OUR fault.
> How would it be possible for MFS to fix this kind of issue? I'm just
> thinking...
> It's more a network issue that should be fixed on the network layer
> rather than on Moose.
>
> Maybe adding an IP/MAC address mapping in MooseFS could be a workaround

I don't know. Yes, it is a network issue, not a Moose issue.

Your idea has merit. Maybe MFS could also detect if there is something
bad happening and go RO.

But it's a bad thing to do with or without MFS. It's just that in MFS
there are more serious consequences than a webserver simply not being
available to the outside.

Again, in our case the tech fumble-fingered the IP and should have
tested/reviewed things better before turning on the chunkserver daemon.

We now have that box on the chunkserver checklist.
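
To make the IP/MAC mapping idea a bit more concrete, here is a rough
sketch of what such a check might look like. This is only an
illustration: nothing here is an existing MooseFS feature or API. The
function names, the return values, and the hand-maintained EXPECTED
table (which uses documentation-range example addresses) are all
assumptions made up for this sketch.

```python
# Hypothetical sketch only: none of these names exist in MooseFS.
from typing import Dict, Optional


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so comparisons are consistent."""
    return mac.strip().lower().replace("-", ":")


def check_chunkserver(ip: str, mac: str, expected: Dict[str, str]) -> str:
    """Classify a connecting chunkserver against an expected IP -> MAC table.

    Returns "ok", "unknown-ip", or "mac-mismatch".
    """
    known: Optional[str] = expected.get(ip)
    if known is None:
        # The IP is not in the table at all.
        return "unknown-ip"
    if normalize_mac(known) != normalize_mac(mac):
        # The IP is known, but a different machine is answering on it,
        # e.g. because someone mistyped an address on another box.
        return "mac-mismatch"
    return "ok"


# Assumed, hand-maintained mapping (example addresses only).
EXPECTED = {
    "192.0.2.11": "02:00:00:00:00:0b",
    "192.0.2.12": "02:00:00:00:00:0c",
}
```

A "mac-mismatch" result is the case that would have caught our
mistyped IP. What to do with it (refuse the connection, go RO, or just
alert) is a separate decision.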