From: Krzysztof K. <krz...@mo...> - 2014-10-31 17:54:53
|
Hi, In MooseFS CE there is no Follower server, that's why this fix only applies to Pro version. // ****************** Best Regards, Krzysztof Kielak Dnia 31 paź 2014 o godz. 18:52 Elliot Finley <efi...@gm...> napisał(a): > Hi Krzysztof, > > In your changlog you have lines like: > > - (master) fixed missing chunks CHUNKDEL LEADER/FOLLOWER race condition (pro version only) > > and there are others marked "pro version only". Does this mean these bugs still exist in the CE version? > > Thanks, > Elliot > > >> On Tue, Oct 28, 2014 at 12:56 AM, Krzysztof Kielak <krz...@mo...> wrote: >> Hi, >> >> I would strongly recommend to update your instance to the latest version of MooseFS, which is much better suited for VM workload. >> >> There is a number of changes and bug fixes in versions subsequent to 1.6.20 - some of them are critical from data integrity and performance point of view, especially changes regarding different classes of priorities for replication, new algorithms for chunkserver reblancing, improved data distribution algorithms and better I/O handling with read-ahead and caching. >> >> Just for your convinience I attach the list of changes from 1.6.20 to 2.0.40-1, the latest stable version available at http://get.moosefs.com >> >> * MooseFS 2.0.40-1 (2014-10-15) >> >> - (cs) fixed serious bug (new chunks were created on disks 'marked for removal') - introduced in 1.7.31 >> - (cs) fixed bug (segfault on very rare ocassions during disk scanning just after startup - caused by damaged chunk files) >> - (mount) fixed memory leak (supplementary group tabs) - intr. in 2.0.1 >> - (rpm) fixed systemd scripts >> - (daemons) added increasing limit of core dump size (debugging purposes) >> >> * MooseFS 2.0.39-1 (2014-10-06) >> >> - (mount) added ability to mount from /etc/fstab on FreeBSD >> >> * MooseFS 2.0.38-1 (2014-10-02) >> >> - (mount) fixed memory leak in workaround on FreeBSD early-release bug (FreeBSD only) >> >> * MooseFS 2.0.37-1 (2014-09-23) >> >> - (cs) added internal open files limit >> - (master) changed chunkserver choosing algorithm >> >> * MooseFS 2.0.36-1 (2014-09-18) >> >> - (cgi) fixed sorting masters by ip >> - (master) fixed acl posix compatibility (two tests from pjd) >> - (cs) fixed compatibility issues with master 1.6 >> >> * MooseFS 2.0.35-1 (2014-09-12) >> >> - (freebsd) fixed cgiserv and cli port >> - (mount) fixed setting acl >> >> * MooseFS 2.0.34-1 (2014-08-21) >> >> - (mount) workaround for bug in FreeBSD Fuse version >> - (master) added setting correct metaid in "new" followers (pro version only) >> >> * MooseFS 2.0.33-1 (2014-08-18) >> >> - (cs,supervisor) more reliable network code in supervisor module (pro version only) >> - (cs,master) added check for metaid in chunkservers (also hard drives) and secondary masters >> >> * MooseFS 2.0.32-1 (2014-08-13) >> >> - (cs) changed delay handling method (delayed file close,free crc block etc.) >> - (mount) fixed truncate loop (fixed bug intr. in 2.0.26) >> >> * MooseFS 2.0.31-1 (2014-08-12) >> >> - (cs) fixed bug: write data error not causing exit from write loop - sometimes could lead to segfault. >> - (master) fixed setting data cache drop bit. >> - (cli) fixed printed options for mfssupervisor >> >> * MooseFS 2.0.30-1 (2014-07-09) >> >> - (daemons) added writing pid to lockfile >> - (all) added freebsd port-maker >> >> * MooseFS 2.0.29-1 (2014-07-03) >> >> - (cs) moved fsync to background (improved write performance) >> >> * MooseFS 2.0.28-1 (2014-06-30) >> >> - (master) fixed syslog message about metaloggers in masters (pro version only) >> >> * MooseFS 2.0.27-1 (2014-06-27) >> >> - (tools) added to mfsfileinfo checksum digest and file signature calculation >> >> * MooseFS 2.0.26-1 (2014-06-26) >> >> - (cs) fixed bug in replicator >> - (mount) fixed truncate loop >> - (mount) fixed potential descriptor leak >> - (master) fixed bug with duplicated chunkserver id >> >> * MooseFS 2.0.25-1 (2014-06-23) >> >> - (mount) better handling error counters >> - (master) fixed replication status handling >> >> * MooseFS 2.0.24-1 (2014-06-23) >> >> - (master) added unfinished replications detection >> - (master) added source ip and port in replication status message >> - (cs) fixed handling job buffers >> - (daemons) added ignoring profiling signals >> >> * MooseFS 2.0.23-1 (2014-06-18) >> >> - (mount) changed chunkserver choosing algorithm >> - (master) added checks for busy/operation mismatch >> - (master) added checks for busy status during duplicate/duplicate+truncate operations >> - (cs) added ignoring 'marked for removal' disks in internal rebalance >> >> * MooseFS 2.0.22-1 (2014-06-17) >> >> - (master) add chunks to priority queues after chunk damage, chunk lost and set goal >> >> * MooseFS 2.0.21-1 (2014-06-17) >> >> - (supervisor+cstool) fixed version shown >> - (supervisor) fixed 'DEAD' master recognition routine >> - (master) fixed segfault introduced in 2.0.20 >> >> * MooseFS 2.0.20-1 (2014-06-16) >> >> - (cs) fixed handling unfinished jobs on socket timeout >> - (master) fixed missing chunks CHUNKDEL LEADER/FOLLOWER race condition (pro version only) >> - (master) postpone write to undergoal chunks during registration process (wait for all cs) >> - (master) undergoal replications have higher priority than write chunk >> - (all) added check for EINTR status after poll in sockets module >> >> * MooseFS 2.0.19-1 (2014-06-13) >> >> - (master+cs) added support for 'get version' packet >> - (all) fixed licence text in source files >> >> * MooseFS 2.0.18-1 (2014-06-12) >> >> - (cs) fixed bandwidth charts >> >> * MooseFS 2.0.17-1 (2014-06-12) >> >> - (daemons) added 'info' handler (for future use) >> - (master) changed median to arithmetic mean in 'new chunk' algorithm >> >> * MooseFS 2.0.16-1 (2014-06-11) >> >> - (master+cs) added 'maxentries' to chart data packet >> - (master) improved median usage calculations >> >> * MooseFS 2.0.15-1 (2014-06-10) >> >> - (master) fixed floating replication limits >> >> * MooseFS 2.0.14-1 (2014-06-10) >> >> - (master) replication limits can be defined as floating point numbers >> - (master+cgi+cli) changed store status (background,download and foreground) >> - (master+cgi+cli) changed option "canchangequota" to "admin" >> - (master) added rename/unlink "metadata.mfs" in case of using '-a' option >> - (master+cgi+cli) last save duration changed from seconds to miliseconds >> >> * MooseFS 2.0.13-1 (2014-06-09) >> >> - (master) added separate replication limits for different cases >> - (cs) deleting already deleted chunk returns status ok. >> >> * MooseFS 2.0.12-1 (2014-06-06) >> >> - (mount) changed readworker behaviour in case of having no data to read >> - (master) added block version check during metadata read >> - (metadump) added support for newest metadata format >> - (cs) changed format of reported ip in replicator module >> >> * MooseFS 2.0.11-1 (2014-06-03) >> >> - (cgi) fixed missing master condition >> - (netdump) new binary - mfsnetdump >> - (cs) new binary - mfschunktool >> - (master+cgi+cli) added maintenance mode to chunkservers >> - (cli) fixed cs-commands bug >> >> * MooseFS 2.0.10-1 (2014-06-02) >> >> - (cgi) added colors for versions >> - (master) fixed dirinfo counters update on write in master-follower (pro version only) >> >> * MooseFS 2.0.9-1 (2014-05-30) >> >> - (all) fixed compatibility with very old compilers >> - (cgi) added avg line and tooltips to '%used' bars >> >> * MooseFS 2.0.8-1 (2014-05-29) >> >> - (master) even better replication priority algorithm >> >> * MooseFS 2.0.7-1 (2014-05-28) >> >> - (master) improved replication priority algorithm >> >> * MooseFS 2.0.6-1 (2014-05-27) >> >> - (master) added replication priority "pseudomissing->endangered->undergoal->rebalance" >> - (master) added option for setting metadata download frequency (pro version only) >> >> * MooseFS 2.0.5-1 (2014-05-26) >> >> - (master) fixed finding best metadata file (-a option) >> - (cs) fixed infinite loop condition in sockets module >> >> * MooseFS 2.0.4-1 (2014-05-22) >> >> - (master+mount) fixed "chunk lost" condition during cs registration >> - (mount) fixed support for "tail -f" >> - (master) fixed xattr snapshot bug (attr name length not copied) >> >> * MooseFS 2.0.3-1 (2014-05-20) >> >> - (master) autorepair chunk version for standard scenarios >> - (master) fixed dealing with chunks that doesn't exist in metadata >> - (master) fixed status returned by CHUNKADD/CHUNKDEL >> - (all) faster endian conversions >> >> * MooseFS 2.0.2-1 (2014-05-15) >> >> - (mount) turn on posix-ACL's by default >> - (mount) add release number to version reporting (in df -i) >> >> * MooseFS 2.0.1-1 (2014-05-14) >> >> - (master) download meta file from followers >> - (master) added "get paths" and "get parents" commands >> - (master) added ability to send directories in parts >> - (master) keep network connections alive during long operations >> - (all) keep CL-CS and CS-CS connections alive between operations >> - (mount) fixed race-condition in read module >> - (mount) fixed local buffer lock in 'oplog' >> - (cs) fixed race condition in resolver >> - (mount) fixed truncate in read module >> - (master) fixed new chunk/delete chunk constants >> - (master+mount) implementing full support for posix-ACL's >> - (master+mount+tools) move permission checking from kernel to master >> >> * MooseFS 1.7.31 (2014-04-01) >> >> - (cs) fixed choosing disk algorithm >> - (cs) removed (some) unimportant syslog messages >> - (cs) tuning glibc malloc arenas >> - (cs) fixed protocol autonegotiation >> - (cs) fixed CS-CS and CS-CL communication charts >> >> * MooseFS 1.7.30 (2014-03-26) >> >> - (cs) fixed compatibility with mfs 1.6 >> - (mount) fixed "no xattr" bit bug (intr. in 1.7.25) >> >> * MooseFS 1.7.29 (2014-03-26) >> >> - (cs) fixed descriptor leak (intr. in 1.7.25) >> >> * MooseFS 1.7.28 (2014-03-11) >> >> - (all) fixed thread creation routine (introduced in 1.7.24) >> >> * MooseFS 1.7.27 (2014-03-11) >> >> - (cs) added memory chart >> - (master+cs) added virtual memory usage to memory charts >> - (cs) fixed race condition in charts (introduced in 1.7.25) >> - (cgi+cli) fixed cs charts >> >> * MooseFS 1.7.26 (2014-03-05) >> >> - (master+mount) create file using one command >> >> * MooseFS 1.7.25 (2014-02-27) >> >> - (all) fixed another bug in new network module (introduced in 1.7.21 and 1.7.22) >> - (cs) new I/O model >> - (master) added xattr flag >> - (master) added copying xattrs during snapshot (cp-like version) >> - (master) changed metadata version handling >> - (master) changed node structure (RAM and HDD) >> - (master+cs) added chunkserver id (identyfying cs in master) >> - (master+cs) new registration algorithm (vers. 6) >> - (master+mount) "no xattr" bit in file attr >> >> * MooseFS 1.7.24 (2014-02-19) >> >> - (mount) fixed xattr cache bug >> - (master) fixed ENOSPC condition after switching >> - (cgi+cli) fixed sortkeys for hdd write bandwidth >> - (master) turn off internal supervisor if there are connections from external supervisor >> - (cs) added common routine for thread creation >> >> * MooseFS 1.7.23 (2014-02-06) >> >> - (master) bug fix version (bugs introduced in 1.7.21 and 1.7.22) >> >> * MooseFS 1.7.22 (2014-02-05) >> >> - (master) fixed cs disconnection (make master responsive during disconnection) >> >> * MooseFS 1.7.21 (2014-02-05) >> >> - (master) fixed master<->chunkserver network module (make master responsive during chunk registration) >> >> * MooseFS 1.7.20 (2014-02-03) >> >> - (mount) added negative lookup cache >> >> * MooseFS 1.7.19 (2014-01-30) >> >> - (mount) added xattr cache >> >> * MooseFS 1.7.18 (2014-01-29) >> >> - (mount) setting FUSE capabilities (improved efficiency) >> - (cs) fixed disk choosing algorithm >> >> * MooseFS 1.7.17 (2014-01-28) >> >> - (cs) fixed mutex double-lock error (introduced in 1.7.14) >> - (master) improved calculation of init hash size >> >> * MooseFS 1.7.16 (2014-01-22) >> >> - (master+cgi+cli) memory usage detailed report >> - (master+cs+ml) fixed starvation scenario in network modules >> >> * MooseFS 1.7.15 (2014-01-15) >> >> - (master+cs) new charts prepared for mfs 2.0 >> - (cs) options for transfering data between disks >> >> * MooseFS 1.7.14 (2014-01-08) >> >> - (master) added simple supervisor (allow start masters without manual supervisor) >> - (cs) fixed internal rebalance algorithm >> - (cgi+cli) new class for mfs communication >> - (cgi+cli) introduced pro/ce version (for forthcoming mfs 2.0) >> >> * MooseFS 1.7.13 (2013-12-12) >> >> - (cs) added rebalance utilization limit option >> - (cs) added error tolerance options >> >> * MooseFS 1.7.12 (2013-12-11) >> >> - (master) fixed metadata synchronization bug (HUP detected to early) >> - (master) fixed metadata synchronization bug (settrashtime,setgoal,seteattr) >> - (master+cs) changed cs/disk choosing algorithm (random->rr and using only servers/disks close to median) >> - (cs) added rebalancing of disks usage >> >> * MooseFS 1.7.11 (2013-11-22) >> >> - (mount) fixed read/write bug (status EAGAIN on read/write caused EIO) >> - (cli) added user defined separator >> - (cgi+cli) display licence data >> >> * MooseFS 1.7.10 (2013-11-21) >> >> - (master) added licences for version pro >> - (mount) fixed very rare race condition (fuse context) >> >> * MooseFS 1.7.9 (2013-11-12) >> >> - (master) fixed chunk deletions during chunkserver disconnection >> - (daemons) fixed 'stop' status (ok when daemon is already stopped) >> - (daemons) added 'try-restart' action >> - (all) fixed status returned from socket operations >> - (mount) add ignoring SIGPIPE >> - (all) improved crc32 calculation (1.7 times faster) >> >> * MooseFS 1.7.8 (2013-11-05) >> >> - (master) new 'open files' module (also fixed some bugs) >> - (master) better management of chunks deletion in master >> - (master) fixed bug (adding chunkservers by ELECT) >> - (cgi+cli) added displaying disconnected sessions >> - (master+cgi+cli) added option to remove disconnected sessions >> >> * MooseFS 1.7.7 (2013-10-22) >> >> - (master+cs) added ability to change leader on demand >> - (mount) fixed couple of small bugs in new "read data" module >> - (supervisor) fixed connecting bugs >> - (master) added progress info during metadata restoring >> - (master) fixed net timeouts (fixing time after long operations) >> - (mount) fixed connection problems when one master is dead >> >> * MooseFS 1.7.6 (2013-09-20) >> >> - (master) imporoved chunk hashmap >> >> * MooseFS 1.7.5 (2013-09-12) >> >> - (master) fixed network timeouts for data transfer >> - (master) changed hashmaps >> - (cgi) BUSY state detection (mainly during data transfer) >> >> * MooseFS 1.7.4 (2013-08-14) >> >> - (mount) read procedures have been rewriten (with read ahead support) >> - (mount) read+write dynamic number of workers >> - (cs) dynamic number of workers >> - (cs) fixed segfault condition during closing application >> - (master) improved memory allocation >> - (master) improved hashmaps >> - (master) fixed bug (sending changelogs to desynchronized followers) >> - (master) fixed bug (cs added to metadata in desynchronized followers) >> - (master) imporoved sending old changelogs (sending in groups) >> >> * MooseFS 1.7.3 (2013-07-30) >> >> - (man) automatically set release date and version >> - (cgiserv) fixed lock-file issue (early truncate - introduced in 1.7.1) >> - (master+mount) when chunk is definitely lost then do not wait for it in clients >> - (cs) reduced amount of space reports to master (inspired by Davies Liu) >> - (mount) imporved write efficiency (inspired by Davies Liu) >> - (master) fixed 'rename/mv' bug in metarestore >> - (master) fixed csdb cleanup bug >> - (master) added SYNC/DESYNC to state description in syslog >> - (supervisor) imporved connecting to masters >> >> * MooseFS 1.7.2 (2013-07-12) >> >> - (cgi+cli) fixed recognition of using wrong DNS name >> - (cgi+cli) fixed commands bug in cli mode >> - (cgi+cli) added inputs for master host name and ports >> - (all) fixed distclean (automake strange behavior) >> >> * MooseFS 1.7.1 (2013-07-11) >> >> - (cgi+cli) simple support for master-ha >> - (cgi+cli) fixed path bug introduced in 1.7.0 >> - (all) added default-mastername and default-portbase options to configure script >> - (mount) fixed reconnecting to ELECT >> - (cgiserv) fixed pidfile bug >> >> * MooseFS 1.7.0 (2013-07-05) >> >> - (cgi+cli) mfs.cgi can be used also in CLI mode (for monitoring scripts etc.) >> - (dist) CGI permissions fix in RPM packaging >> - (dist) preliminary systemd support inspired by Arch Linux >> - (master+tools) added building files from particular chunks (manual file recovery) >> - (cs) join master tasks queue and client tasks queue into one tasks queue. >> - (cgi+cli) added chunkserver 'load' charts. >> - (cgi+cli) added 'load' column in chunkservers tabble >> - (master+cs) simple chunkserver overload detection >> - (master) 'grace' state for chunkservers - after overload detection given chunkserver is in 'grace' state for some time - new chunks are not created on that server in this state >> - (cgi+cli+master) turn off 'grace' state >> - (master) in 'noowner' i-nodes mode copy 'owner' bits to 'group' and 'other' bits >> - (common) increase time events frequency >> - (master) increase freqency of chunk-loop task (1 per second to 50 per seconds) >> - (mount) fixed invalidate open-dir dentry cache >> - (master) fixed sustaining old sessions by clients who can't properly register >> - (cs) improved efficiency (removed lock in network thread) >> - (master) fixed chunkserver disconnection handling (remove server data form chunks 'in background') >> - (master) changed directory size (from GB to 'floating point' numbers, size nxxxxyy = xxxx.yy * 2^(10*n), f.e. 2053189 = 531.89MB) >> - (master+tools) added mfssnapshot option (cp-like mode) - set files ownership to user who makes snapshot. >> - (cgi+cli) python scripts where made Python 3 compatible >> - (all) added support for quota (per directory) >> - (all) added support for xattr's (without special ACL treatment) >> - (master) fixed dirinfo calculation bug (symbolic link snapshots) >> - (all) added support for high availability >> - (master) added meterestore option (mfsmetarestore is now included in mfsmaster) >> >> * MooseFS 1.6.27 (2012-08-09) >> >> - (metarestore) fixed bug - freeing filenames memory too early >> - (all) added initial support for extra attributes (xattr), which will be introduced in upcoming version 1.7 >> - (master+metalogger) better change log synchronization (storage in master memory and sending expected version in metalogger - inspired by Davies Liu) >> - (master) acceptable difference of percent of hdd usage added to configuration (up to this version this parameter was constantly set to 0.01% - patch contributed by Davies Liu) >> - (master) added emergency store metadata to some other places on errors during standard hourly store (inspired by Davies Liu) >> - (cs) default space to be left (256MB) moved to config file (inspired by Davies Liu) >> - (cs) added extra limits in mfshdd.cfg (minimum space to be left, maximum space to be used) >> - (cs) fixed charts overflow issue (overflow in net transfer on about 575Mbps and hdd transfer on about 77MBps) >> - (metalogger) fixed issue: file variable was not clear after close (on rare occasions might cause segfault after disconnecting from master) >> - (all) cfg files moved form PREFIX/etc/ to PREFIX/etc/mfs/ >> - (cgiserv) improved CGI handle (added support for custom http responses, such as "302 Found") >> - (master+cgi) showing disconnected chunkservers in "Servers" tab. >> - (deb+rpm) mfscgiserv moved from mfscgi to separate package, changes in init scripts >> - (mount) added option 'mfsdelayedinit' - for being run from fstab/init scripts >> - (master) optimized goal management in chunks >> - (master) fixed rare race-condition on clear/preserve cache during open in mount >> - (mount) fixed compiling problems on Mac OS X >> - (all) changed code to be more compatible with new gcc 4.7 (gcc 4.7 has too strong optimizations - it can generate unpredictable code) >> - (master) sustain session time could be defined in mfsmaster.cfg >> >> * MooseFS 1.6.26 (2012-02-01) >> >> - (all) fixed signal handling in multithreaded modules >> - (master) added goal and trashtime limits to mfsexport.cfg >> - (metalogger) added simple check for downloaded metadata file (inspired by Davies Liu) >> - (master) better handle disk full (inspired by Davies Liu) >> - (master+metalogger) added keeping previous copies of metadata (inspired by Davies Liu) >> - (all) reload all settings on "reload" (SIGHUP) >> - (cs) disk scanning in background >> - (cs) fixed long termination issue (found by Davies Liu) >> - (master) fixed modify/open cache race >> >> * MooseFS 1.6.25 (2011-12-30) >> >> - (metadump) fixed dumping big files (>2TiB) >> - (metarestore) fixed bug: nonexisting changelog file caused segv >> - (master+mount) added 'sugidclearmode' and 'mkdircopysgid' compatibility options >> - (master) improved chunk deletion algorithm (soft/hard limits per server) >> - (all) ready for new metadata file format, which will be introduced in upcomoing version 1.7 >> - (all) ready for quota handling, which will be introduced in upcoming version 1.7 >> >> * MooseFS 1.6.24 (2011-12-06) >> >> - (master+mount) proxy in mount for mfstools (fixes problems with frequent connect to master) >> >> * MooseFS 1.6.23 (2011-11-08) >> >> - (master+mount) removed directory cache (didn't increase performance as expected and caused many troubles) >> - (metarestore) added option (-i) - ignore some structure inconsistencies >> - (metarestore) added option (-b) - in case of errors save the best metadata file >> - (mount) more dynamic write cache management (changed condition ib<tb/5 to ib<3*fb where: ib - inode blocks in cache, tb - total blocks in cache, fb - free block in cache) >> - (master) save metadata file to alternative locations in case of error >> - (all) increased file length limit from 2TiB to 128PiB >> - (cgiserv) fixed directory traversal vulnerability >> - (cgiserv) added lockfile/pidfile and actions such as 'start', 'stop', 'restart' and 'test'. >> - (mount) fixed parsing file with defaults >> >> * MooseFS 1.6.22 (2011-05-09) >> >> - (mount) added resolving master hostname whenever connection has failed >> - (cs) removed getting attributes of every chunk during initialization - speeds up starting of chunkserver >> - (cs) changed calculating of total/used space. Superuser space now is substracted from total space instead of adding it to used space >> - (master+mount) fixed directory cache. >> - (debian) rewritten init scripts to use mfscommon commands (start/stop/restart/reload) instead of start-stop-daemon (where stop caused killing all instances of daemon) >> - (debian) changed init scripts to bail out early if MFS*_ENABLE is not true (i.e. given daemon is not scripts-controlled) >> >> * MooseFS 1.6.21 (2011-04-14) >> >> - (mount) added support of default config file (mfsmount.cfg) >> - (metarestore) fixed snapshot bug >> - (metarestore) improved tolerance for damaged changelog files >> - (master,mount) added full directory (with attributes) cache on client (mfsmount) side >> - (mount) added symlink cache on client (mfsmount) side >> - (mount) added hidden files '.oplog' and '.ophistory' with detailed info about current/historical operations performed by mfsmount >> - (master) added simple net topology support >> - (all) added -D_POSIX_PTHREAD_SEMANTICS to CFLAGS - needed by Solaris-like OSes >> - (cs) fixed detection of 'damaged disk' condition >> - (mount) fixed error caused segfaults during umount on certain conditions >> - (daemon) added 'test' command - checks if process is running and returns its PID >> >> >> >> Best Regards, >> Krzysztof Kielak >> Director of Operations and Customer Support >> Mobile: +48 601 476 440 >> >> Od: WK >> Wysłano: wtorek, 28 października 2014 01:06 >> Do: krz...@mo..., moo...@li... >> >> >> On 10/27/2014 2:28 PM, Krzysztof Kielak wrote: >> > Dear WK, >> > >> > Could you please provide some more details? Which version of MooseFS you were testing? On which Hypervisor? How many chunkserver? How many spindles etc? >> >> 1.6.20 at the time. About 4-5 CS and a goal of 3. Each CS had 2-4 >> drives. We have a Mix of KVM and Oracles Virtbox (Vagrant). >> >> > >> > You could control the acceptable replication overhead within MooseFS starting from versions before 1.6 but current version 2.0 has a lot of improvements for replication configuration and chunkserver rebalancing in case of failure. >> >> Yes, we could slow down the replication speed in later versions, but you >> would still get stress if it was doing several things at once. For >> example, you lost a chunkserver and replaced it at the same time you >> were adding Hard Drives to other CS and/or a particular VM got pounded >> with Disk I/O and/or a tech decided to upload a new VM image without >> rate limiting the rsync. >> >> Again things generally ran great with reasonable performance, but every >> once in a while you could get these multiple incident scenarios and then >> you were asking for the VMs to go read only. We did play with different >> elevators and that seemed to help (I can't recall which worked best) and >> we did tune the OS's timeout within the VM, so they weren't so quick to >> go VM, but if you did that too aggressively, you could end up with a >> corrupt image due to the aggressive MFS caching. >> >> To be fair it wasn't the greatest kit either with older SATA/SCSI >> Drives, since it was a dev bench situation. We also were constantly >> swapping in and out drives within CS as kit was freed up from production >> upgrades. Note that is what is so cool about MFS, you can play around >> with the Chunkservers and it just adapts and auto-magically does the >> right thing. Kind of like an open source DROBO. >> >> Other observations, >> >> The smaller the VM the less likely you were going to have a problem >> (i.e. under 2-3 GB almost never even under load, as you got closer to >> 50GB image, the chances of a RO increased). >> >> We simply came to the conclusion that since the VM image was composed of >> numerous MFS chunks, that EXT3/4 didn't like having the rug pulled out >> from under it. You could predict that something was going to go RO by >> looking at /var/log/messages. As those retry errors began to quickly >> scroll up the screen (normally not a big deal), you knew what was coming. >> >> Perhaps if we have time, we may play with the newer versions, I was just >> reporting our observations. Clearly with newer software, better kit, >> less goals and SSDs you could probably mitigate the problem even with >> the older software, but we were concerned that there WAS a point where >> you could get in trouble even though that situation was clearly more remote. >> >> >> >> ------------------------------------------------------------------------------ >> >> _______________________________________________ >> moosefs-users mailing list >> moo...@li... >> https://lists.sourceforge.net/lists/listinfo/moosefs-users > |