From: Marin B. <li...@ol...> - 2018-05-22 19:56:34
> On 05/22/2018 02:36 PM, Gandalf Corvotempesta wrote:
> > On Tue, 22 May 2018 at 19:28, Marin Bernard <lists@olivarim.com>
> > wrote:
> > > So does Proxmox VE.
> >
> > Not all servers are using Proxmox.
> > Proxmox repackages ZFS on every release, because they support it.
> > If you have to maintain multiple different systems, using DKMS is
> > more error-prone than going without it. A small kernel upgrade
> > could break everything.

Yes. You may use Proxmox, Ubuntu, FreeBSD or even build your own
kernel.

> > > That's a myth. ZFS never required ECC RAM, and I run it on boxes
> > > with as little as 1 GB RAM. Every bit of it can be tuned,
> > > including the size of the ARC.
> >
> > It is not a myth; it is the truth. ECC RAM is not required to run
> > ZFS, but without it you cannot be sure that what you are writing to
> > disk (and checksumming) is exactly what you received.
> >
> > In other words, without ECC RAM you could experience in-memory data
> > corruption and then write corrupted data (with a valid checksum),
> > so ZFS will reply with corrupted data.
> >
> > ECC is not mandatory, but it is highly recommended.
> > Without ECC you will fix the bit-rot, but you are still subject to
> > in-memory corruption, so the original issue (data corruption)
> > remains unfixed: ZFS can do nothing if the data is corrupted before
> > it reaches ZFS.

Yes, I know that. However, you seemed to imply that ECC was a
requirement. I'm sorry if I misunderstood. Of course, ECC memory is a
must-have; I see no reason for not using it.

> > > Checksumming and duplication (ditto blocks) of pool metadata are
> > > NOT provided by the master. This is a much appreciated feature
> > > when you come from an XFS background, where a single
> > > unrecoverable read can crash an entire filesystem. I've been
> > > there before; never again!
> >
> > Which pool metadata are you referring to?

All of it. ZFS stores two or three copies of each metadata block
(depending on the type of metadata). Corrupted metadata blocks *will*
be corrected, even in single-disk setups.

> > Anyway, I hate XFS :-) I had multiple failures...
> >
> > > MooseFS background verification may take months to check the
> > > whole dataset.
> >
> > True.
> >
> > > ZFS does scrub a whole chunkserver within a few hours, with
> > > adaptive, tunable throughput to minimize the impact on the
> > > cluster.
> >
> > It is not the same.
> > When ZFS detects a corruption without RAID, it does nothing: it
> > simply discards the data during a read. But if you are reading a
> > file, MooseFS will check the checksum automatically and do the
> > same.

Actually, ZFS keeps a list of damaged files. So in case of damaged
blocks, you may:

* Stop the chunkserver
* List and remove the damaged chunk files
* Restart the chunkserver

The mfschunkserver daemon will rescan its chunk files, and the master
will soon notice that a chunk is missing and trigger a replication.
This is easy to automate with a simple script, as sketched below.
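For illustration, a minimal sketch of what such a script could look
like (untested; the pool name, data directory, service name and
chunk-file naming pattern below are assumptions to adapt to your own
setup):

    #!/usr/bin/env python3
    # Sketch only: drop chunk files that ZFS reports as permanently
    # damaged, so the MooseFS master re-replicates them from the
    # remaining copies. Pool name, data directory, service name and
    # chunk-file pattern are assumptions -- adapt them to your setup.
    import os
    import re
    import subprocess

    POOL = "tank"                     # assumed ZFS pool backing the chunkserver
    DATA_DIR = "/var/lib/mfs"         # assumed chunkserver data directory
    SERVICE = "moosefs-chunkserver"   # service name may differ on your distro

    def damaged_files(pool):
        # 'zpool status -v' lists files with permanent errors at the
        # end of its output; keep only the absolute paths.
        out = subprocess.run(["zpool", "status", "-v", pool],
                             capture_output=True, text=True,
                             check=True).stdout
        in_list = False
        paths = []
        for line in out.splitlines():
            if "Permanent errors have been detected" in line:
                in_list = True
                continue
            if in_list and line.strip().startswith("/"):
                paths.append(line.strip())
        return paths

    def main():
        bad = [p for p in damaged_files(POOL)
               if p.startswith(DATA_DIR) and re.search(r"chunk_.*\.mfs$", p)]
        if not bad:
            return
        subprocess.run(["systemctl", "stop", SERVICE], check=True)
        for path in bad:
            os.remove(path)   # or move the files aside for later analysis
        subprocess.run(["systemctl", "start", SERVICE], check=True)
        # On restart, mfschunkserver rescans its chunks; the master
        # notices the missing ones and schedules replication.

    if __name__ == "__main__":
        main()

On a real cluster I would of course dry-run this first and quarantine
the files instead of deleting them outright.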
> Assuming that you have a minimum of 2 copies in MooseFS, it will
> read, detect, and read from the second copy, and will heal the first
> copy. So I don't know what you mean exactly by "does the same", but
> it is not the *same*.
>
> > Anyway, even if you scrub the whole ZFS pool, you won't get any
> > advantage: ZFS is unable to recover by itself (without RAID), and
> > MooseFS is still unaware of the corruption.
>
> MooseFS will be *aware* of the corruption during the read and will
> self-heal, as I explained above. (Or during the checksum checking
> (native scrub) loop, whichever comes first.)
>
> > Ok, chunk1 is corrupted, ZFS detected it during a scrub. And now?
> > ZFS doesn't have any replica to rebuild from.
> > MooseFS is unaware of this because its native scrub takes months
> > and no one is reading that file from a client (forcing the
> > checksum verification).
>
> You seem to be making these constant claims about "native scrub
> taking months", but I believe it was explained in earlier emails
> that this will depend on your hardware configuration.

AFAIK, you can't scrub faster than 1 chunk/sec per chunkserver. If
you own 12 servers, they'll do 12 chunks/sec = 720 chunks/min =
43,200 chunks/hour = 1,036,800 chunks/day. If you have 50,000,000
chunks, it would take roughly 50 days to have them all checked at
this rate, which would probably bring the cluster to its knees. If
you scan at a more reasonable rate of one chunk every 3 seconds per
chunkserver, it rises to roughly 150 days (see the short calculation
sketch at the end of this message). So that's not a claim; that's a
fact.

> I believe there was another email which basically said this "native
> scrub speed" was much improved in version 4.
> So I think it is fair to say that you should stop repeating this
> "native scrub takes months" claim, or if you are not going to stop
> repeating it, at least put some qualifiers around it.
> Or download v4, and see if the speed improved...

I do know that v4 improves on this point, but it is not yet
production-ready. I won't be mentioning it until it is released.
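For reference, the back-of-the-envelope arithmetic behind the
scrub-duration figures above, as a small sketch (the chunk count,
server count and per-server rates are the assumptions stated earlier):

    # Rough scrub-duration estimate from the figures quoted above.
    CHUNKS = 50_000_000    # assumed total number of chunks in the cluster
    SERVERS = 12           # assumed number of chunkservers

    def scrub_days(chunks_per_sec_per_server):
        # Days for the whole cluster to test every chunk once.
        cluster_rate = chunks_per_sec_per_server * SERVERS  # chunks/second
        return CHUNKS / (cluster_rate * 86_400)             # 86,400 s per day

    print(round(scrub_days(1.0)))    # ~48 days at the 1 chunk/s per server ceiling
    print(round(scrub_days(1 / 3)))  # ~145 days at one chunk every 3 s per server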