#20 chunkserver fails to Write() when single filesystem is full

chunkserver (8)
Lamont Granquist

> df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p3  269G  4.6G  251G   2% /
/dev/cciss/c0d0p1  289M   22M  253M   8% /boot
tmpfs              4.0G     0  4.0G   0% /dev/shm
/dev/cciss/c0d1p1  275G  275G     0 100% /chunkspace1
/dev/cciss/c0d2p1  275G  112G  150G  43% /chunkspace2
/dev/cciss/c0d3p1  275G  112G  150G  43% /chunkspace3
/dev/cciss/c0d4p1  275G  112G  150G  43% /chunkspace4
/dev/cciss/c0d5p1  275G  112G  150G  43% /chunkspace5

machine.cfg looks like this:

node: localhost
rundir: /kfs/meta
baseport: 20000
clusterkey: test-cluster
node: localhost
rundir: /kfs/chunk1
baseport: 30000
space: 1374 G
chunkdir: /chunkspace1 /chunkspace2 /chunkspace3 /chunkspace4 /chunkspace5

Initially I loaded ~163GB into /chunkspace1 and then added the other 4 partitions, at which point the chunkserver started spreading writes uniformly across all the partitions. Once /chunkspace1 filled up, Write() to chunks started failing completely.

The allocation algorithm should probably also balance writes toward the partitions that have more free space.
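
As a rough illustration of the kind of balancing suggested here, the sketch below picks the chunk directory with the most free space and declines to place a chunk when nothing fits. It is not KFS code: `pick_chunkdir`, `free_bytes`, the 64 MB chunk size, and the use of Python's `shutil.disk_usage` are all assumptions made for the example.

```python
import shutil

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size for this sketch


def free_bytes(path):
    """Return the bytes available on the filesystem that holds path."""
    return shutil.disk_usage(path).free


def pick_chunkdir(chunkdirs, needed=CHUNK_SIZE, free_fn=free_bytes):
    """Return the directory with the most free space, or None if none fits.

    free_fn is injectable so the policy can be exercised without real disks.
    """
    candidates = [(free_fn(d), d) for d in chunkdirs]
    # Drop directories that cannot hold one more chunk (e.g. a 100% full drive).
    candidates = [(free, d) for free, d in candidates if free >= needed]
    if not candidates:
        return None
    return max(candidates)[1]
```

With the `df` numbers above, a policy like this would never select /chunkspace1 once it reached 100%, and would report failure only when every partition was full.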


  • Actually this is a substantially worse bug -- the chunkserver simply crashes after trying to write to a full filesystem.

  • And I think I see my problem: KFS doesn't support this kind of operation. The IDs are hashed with % n, where n is the number of partitions, to create the pathname. I never tried reading back the data that I had loaded into /chunkspace1 before adding the other 4 partitions, and based on the code that clearly wouldn't have worked.

    This will probably still cause issues in edge cases, since one of the partitions will fill up first while the chunkserver is still reporting space available on the server. If you keep your KFS filestores below 90% utilization in the aggregate, you're probably fine. But if you're down to the wire, playing chicken between getting the business/finance people to buy you new fileserver space and the thing filling up, this behavior could give you a bad day: you wind up with chunkservers that have some free space which you can't write to.

  • sriramsrao

    • assigned_to: nobody --> sriramsrao
  • sriramsrao

    The JBOD code has been re-worked. The chunkserver will not try to write to a drive that is full. We do a round-robin on the drives when placing a chunk, tracking how much space is used on a given drive.

  • sriramsrao

    • status: open --> open-fixed
  • sriramsrao

    • status: open-fixed --> closed-fixed
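
To make the two placement schemes discussed in this thread concrete, here is a small sketch. `modulo_dir` mirrors the `% n` mapping described in the comments (exactly what KFS fed into the modulo is an assumption), and `RoundRobinPlacer` mirrors the fix as described: round-robin over the drives, tracking used space, and never choosing a full drive. All names and the capacity bookkeeping are hypothetical; this is not the reworked JBOD code.

```python
def modulo_dir(chunk_id, chunkdirs):
    """Map a chunk ID to a directory via chunk_id % n (assumed scheme)."""
    return chunkdirs[chunk_id % len(chunkdirs)]


class RoundRobinPlacer:
    """Round-robin over drives, skipping any drive that cannot fit the chunk."""

    def __init__(self, capacity_by_dir):
        self.dirs = list(capacity_by_dir)
        self.capacity = dict(capacity_by_dir)
        self.used = {d: 0 for d in self.dirs}  # bytes placed so far per drive
        self.next_index = 0

    def place(self, size):
        """Return the directory chosen for a chunk of `size` bytes, or None."""
        for offset in range(len(self.dirs)):
            i = (self.next_index + offset) % len(self.dirs)
            d = self.dirs[i]
            if self.used[d] + size <= self.capacity[d]:
                self.used[d] += size
                # Resume after the drive just used so placements rotate.
                self.next_index = (i + 1) % len(self.dirs)
                return d
        return None  # every drive is full
```

Under the assumed modulo scheme, a chunk with ID 7 written while only /chunkspace1 was configured would be looked up in /chunkspace3 once five directories are listed (7 % 5 = 2), which matches the observation that data loaded before adding partitions could not have been read back. The round-robin placer has no such dependence on n for new writes, and it returns None only when no drive has room.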