[ext2resize] Re: Deadlock help
From: Andreas D. <ad...@cl...> - 2002-10-09 08:42:02
On Oct 03, 2002 10:58 -0700, Robert Walsh wrote:
> > I could probably do this in a couple of hours if you would be willing
> > to do the testing, or is the description enough? Of course, if you
> > are not running multiple resizers at one time (by accident normally,
> > of course) then the only place you actually need the sb lock is at
> > the critical region previously mentioned in ext3_group_add() because
> > we are updating the free counts, which are also updated by other parts
> > of the code.
>
> Running multiple resizers at once sounds like a weird situation -
> probably a mistake on the users behalf, right? However, if it's
> possible to do it, then it shouldn't result in corrupt data or an oops,
> so it should be accounted for.
>
> Sounds like you've got a better idea about how to do this than I do, so
> why don't you go ahead and I'll definitely give it a good testing.

Sorry for the delay - the following patch is a new version of the kernel
patch. (We have some 1000-node acceptance tests we are supposed to run
for Lustre, but are hitting some bugs as we've never had access to that
many nodes before ;-). It is basically just an edited version of
patches/online-ext3-2.4.18.diff, so it is possible that it doesn't even
compile, but it should be close. The diff against that patch is fairly
small, if you want to see just the changes.

The lock_super() call is moved to be strictly after journal_start() (to
avoid the oops you were having), and the lock is not held in cases where
it is not needed. The critical area for adding groups ended up larger
than I originally thought, since we need to keep anything from changing
in the superblock after we have deleted the pointers to the backup group
descriptors, but I don't think that is a big deal.

As a side benefit, moving the superblock lock removes a bit of
unpleasantness around ext3_free_blocks() (dropping the lock and getting
it again).
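The ordering rule described above (open the journal handle first, take the superblock lock second, and hold that lock only around the shared counts) can be sketched in user-space C. This is only an illustration under stated assumptions: the demo_* functions, the two pthread mutexes and the free_blocks counter are hypothetical stand-ins, not the kernel's lock_super() or journal_start(), and the sketch does not reproduce the patch below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the kernel's real interfaces. */
static pthread_mutex_t journal_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t super_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag, kept only so the ordering rule can be checked. */
static _Thread_local bool super_held;

/* Shared count that other code paths would also update. */
static unsigned long free_blocks = 100;

/* Opening a "transaction" may block, so it must not run under super_lock. */
static void demo_journal_start(void)
{
	assert(!super_held);	/* rule: journal first, super second */
	pthread_mutex_lock(&journal_lock);
}

static void demo_journal_stop(void)
{
	pthread_mutex_unlock(&journal_lock);
}

static void demo_lock_super(void)
{
	pthread_mutex_lock(&super_lock);
	super_held = true;
}

static void demo_unlock_super(void)
{
	super_held = false;
	pthread_mutex_unlock(&super_lock);
}

/* Add a new group's free blocks; returns the updated total. */
unsigned long demo_group_add(unsigned long new_free)
{
	unsigned long total;

	demo_journal_start();
	/* Validation and setup that touch no shared counts would go here. */
	demo_lock_super();	/* critical region: shared counts only */
	free_blocks += new_free;
	total = free_blocks;
	demo_unlock_super();
	demo_journal_stop();

	return total;
}
```

Every path that needs both locks takes them in the same order, so no path can hold the superblock lock while waiting for a transaction to open.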
Cheers, Andreas
=========================================================================
diff -rNu linux-2.4.18-orig/Documentation/Configure.help linux-2.4.18/Documentation/Configure.help
--- linux-2.4.18-orig/Documentation/Configure.help	Mon Feb 25 11:37:51 2002
+++ linux-2.4.18/Documentation/Configure.help	Tue Sep 10 11:18:06 2002
@@ -14078,6 +14078,20 @@
   generated. To turn debugging off again, do
   "echo 0 > /proc/sys/fs/jbd-debug".
 
+Online resize for ext3 filesystems
+CONFIG_EXT3_RESIZE
+  This option gives you the ability to increase the size of an ext3
+  filesystem while it is mounted (in use). In order to do this, you
+  must also be able to resize the underlying disk partition, probably
+  via a Logical Volume Manager (LVM), metadevice (MD), or hardware
+  RAID device - none of that capability is included in this feature.
+  If you do not know what any of these things are, or you have not
+  configured your kernel for them, you should probably say N here. If
+  you choose Y, then your kernel will be about 3k larger, and you need
+  to get some more software (http://ext2resize.sourceforge.net/) in
+  order to actually resize your filesystem, otherwise this feature
+  will just sit unused inside the kernel.
+
 Buffer Head tracing (DEBUG)
 CONFIG_BUFFER_DEBUG
   If you are a kernel developer working with file systems or in the
diff -rNu linux-2.4.18-orig/fs/Config.in linux-2.4.18/fs/Config.in
--- linux-2.4.18-orig/fs/Config.in	Mon Feb 25 11:38:07 2002
+++ linux-2.4.18/fs/Config.in	Tue Sep 10 11:18:06 2002
@@ -27,6 +27,7 @@
 # dep_tristate ' Journal Block Device support (JBD for ext3)' CONFIG_JBD $CONFIG_EXT3_FS
 define_bool CONFIG_JBD $CONFIG_EXT3_FS
 dep_mbool ' JBD (ext3) debugging support' CONFIG_JBD_DEBUG $CONFIG_JBD
+dep_mbool ' Online ext3 resize support (DANGEROUS)' CONFIG_EXT3_RESIZE $CONFIG_EXT3_FS $CONFIG_EXPERIMENTAL
 
 # msdos file systems
 tristate 'DOS FAT fs support' CONFIG_FAT_FS
diff -rNu linux-2.4.18-orig/fs/ext3/Makefile linux-2.4.18/fs/ext3/Makefile
--- linux-2.4.18-orig/fs/ext3/Makefile	Fri Dec 21 09:41:55 2001
+++ linux-2.4.18/fs/ext3/Makefile	Tue Sep 10 11:18:06 2002
@@ -11,6 +11,7 @@
 
 obj-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
 		ioctl.o namei.o super.o symlink.o
+obj-$(CONFIG_EXT3_RESIZE) += resize.o
 obj-m := $(O_TARGET)
 
 include $(TOPDIR)/Rules.make
diff -rNu linux-2.4.18-orig/fs/ext3/balloc.c linux-2.4.18/fs/ext3/balloc.c
--- linux-2.4.18-orig/fs/ext3/balloc.c	Mon Feb 25 11:38:08 2002
+++ linux-2.4.18/fs/ext3/balloc.c	Tue Sep 10 11:18:06 2002
@@ -423,7 +423,7 @@
 error_return:
 	ext3_std_error(sb, err);
 	unlock_super(sb);
-	if (dquot_freed_blocks)
+	if (dquot_freed_blocks && !(EXT3_I(inode)->i_state & EXT3_STATE_RESIZE))
 		DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
 	return;
 }
@@ -821,13 +821,13 @@
 
 unsigned long ext3_count_free_blocks (struct super_block * sb)
 {
-#ifdef EXT3FS_DEBUG
 	struct ext3_super_block * es;
 	unsigned long desc_count, bitmap_count, x;
 	int bitmap_nr;
 	struct ext3_group_desc * gdp;
 	int i;
-
+
+	if (test_opt(sb, DEBUG)) {
 	lock_super (sb);
 	es = sb->u.ext3_sb.s_es;
 	desc_count = 0;
@@ -848,13 +848,12 @@
 			i, le16_to_cpu(gdp->bg_free_blocks_count), x);
 		bitmap_count += x;
 	}
-	printk("ext3_count_free_blocks: stored = %lu, computed = %lu, %lu\n",
+	printk(__FUNCTION__": stored = %u, computed gdt = %lu, bitmap = %lu\n",
 	       le32_to_cpu(es->s_free_blocks_count), desc_count, bitmap_count);
 	unlock_super (sb);
 	return bitmap_count;
-#else
+	} else
 	return le32_to_cpu(sb->u.ext3_sb.s_es->s_free_blocks_count);
-#endif
 }
 
 static inline int block_in_use (unsigned long block,
diff -rNu linux-2.4.18-orig/fs/ext3/inode.c linux-2.4.18/fs/ext3/inode.c
--- linux-2.4.18-orig/fs/ext3/inode.c	Mon Feb 25 11:38:08 2002
+++ linux-2.4.18/fs/ext3/inode.c	Tue Sep 10 11:18:06 2002
@@ -1984,36 +1984,33 @@
 int ext3_get_inode_loc (struct inode *inode, struct ext3_iloc *iloc)
 {
 	struct buffer_head *bh = 0;
+	struct super_block *sb = inode->i_sb;
+	struct ext3_sb_info *sbi = EXT3_SB(inode->i_sb);
+	unsigned long ino = inode->i_ino;
 	unsigned long block;
 	unsigned long block_group;
 	unsigned long group_desc;
 	unsigned long desc;
 	unsigned long offset;
 	struct ext3_group_desc * gdp;
-
-	if ((inode->i_ino != EXT3_ROOT_INO &&
-		inode->i_ino != EXT3_ACL_IDX_INO &&
-		inode->i_ino != EXT3_ACL_DATA_INO &&
-		inode->i_ino != EXT3_JOURNAL_INO &&
-		inode->i_ino < EXT3_FIRST_INO(inode->i_sb)) ||
-		inode->i_ino > le32_to_cpu(
-			inode->i_sb->u.ext3_sb.s_es->s_inodes_count)) {
-		ext3_error (inode->i_sb, "ext3_get_inode_loc",
-			    "bad inode number: %lu", inode->i_ino);
+
+	if ((ino != EXT3_ROOT_INO && ino != EXT3_ACL_IDX_INO &&
+	     ino != EXT3_ACL_DATA_INO && ino != EXT3_JOURNAL_INO &&
+	     ino != EXT3_RESIZE_INO && ino < EXT3_FIRST_INO(sb)) ||
+	    ino > le32_to_cpu(sbi->s_es->s_inodes_count)) {
+		ext3_error(sb, __FUNCTION__, "bad inode number: %lu", ino);
 		goto bad_inode;
 	}
-	block_group = (inode->i_ino - 1) / EXT3_INODES_PER_GROUP(inode->i_sb);
-	if (block_group >= inode->i_sb->u.ext3_sb.s_groups_count) {
-		ext3_error (inode->i_sb, "ext3_get_inode_loc",
-			    "group >= groups count");
+	block_group = (ino - 1) / sbi->s_inodes_per_group;
+	if (block_group >= sbi->s_groups_count) {
+		ext3_error(sb, __FUNCTION__, "group >= groups count");
 		goto bad_inode;
 	}
-	group_desc = block_group >> EXT3_DESC_PER_BLOCK_BITS(inode->i_sb);
-	desc = block_group & (EXT3_DESC_PER_BLOCK(inode->i_sb) - 1);
-	bh = inode->i_sb->u.ext3_sb.s_group_desc[group_desc];
+	group_desc = block_group >> sbi->s_desc_per_block_bits;
+	desc = block_group & (sbi->s_desc_per_block - 1);
+	bh = sbi->s_group_desc[group_desc];
 	if (!bh) {
-		ext3_error (inode->i_sb, "ext3_get_inode_loc",
-			    "Descriptor not loaded");
+		ext3_error(sb, __FUNCTION__, "Descriptor not loaded");
 		goto bad_inode;
 	}
 
@@ -2021,17 +2018,16 @@
 	/*
 	 * Figure out the offset within the block group inode table
 	 */
-	offset = ((inode->i_ino - 1) % EXT3_INODES_PER_GROUP(inode->i_sb)) *
-		EXT3_INODE_SIZE(inode->i_sb);
+	offset = ((ino - 1) % sbi->s_inodes_per_group) * sbi->s_inode_size;
 	block = le32_to_cpu(gdp[desc].bg_inode_table) +
-		(offset >> EXT3_BLOCK_SIZE_BITS(inode->i_sb));
-	if (!(bh = sb_bread(inode->i_sb, block))) {
-		ext3_error (inode->i_sb, "ext3_get_inode_loc",
-			    "unable to read inode block - "
-			    "inode=%lu, block=%lu", inode->i_ino, block);
+		(offset >> EXT3_BLOCK_SIZE_BITS(sb));
+	if (!(bh = sb_bread(sb, block))) {
+		ext3_error(sb, __FUNCTION__,
+			   "unable to read inode block - inode=%lu, block=%lu",
+			   ino, block);
 		goto bad_inode;
 	}
-	offset &= (EXT3_BLOCK_SIZE(inode->i_sb) - 1);
+	offset &= EXT3_BLOCK_SIZE(sb) - 1;
 
 	iloc->bh = bh;
 	iloc->raw_inode = (struct ext3_inode *) (bh->b_data + offset);
diff -rNu linux-2.4.18-orig/fs/ext3/ioctl.c linux-2.4.18/fs/ext3/ioctl.c
--- linux-2.4.18-orig/fs/ext3/ioctl.c	Fri Nov 9 14:25:04 2001
+++ linux-2.4.18/fs/ext3/ioctl.c	Tue Sep 10 11:19:00 2002
@@ -11,6 +11,8 @@
 #include <linux/jbd.h>
 #include <linux/ext3_fs.h>
 #include <linux/ext3_jbd.h>
+#include <linux/locks.h>
+#include <linux/smp_lock.h>
 #include <linux/sched.h>
 #include <asm/uaccess.h>
 
@@ -140,6 +142,51 @@
 		ext3_journal_stop(handle, inode);
 		return err;
 	}
+#ifdef CONFIG_EXT3_RESIZE
+	case EXT3_IOC_GROUP_EXTEND: {
+		unsigned long n_blocks_count;
+		struct super_block *sb = inode->i_sb;
+		int err;
+
+		if (!capable(CAP_SYS_RESOURCE))
+			return -EACCES;
+
+		if (sb->s_flags & MS_RDONLY)
+			return -EROFS;
+
+		if (get_user(n_blocks_count, (__u32 *)arg))
+			return -EFAULT;
+
+		lock_kernel();
+		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
+		unlock_kernel();
+		journal_flush(EXT3_SB(sb)->s_journal);
+
+		return err;
+	}
+	case EXT3_IOC_GROUP_ADD: {
+		struct ext3_new_group_data input;
+		struct super_block *sb = inode->i_sb;
+		int err;
+
+		if (!capable(CAP_SYS_RESOURCE))
+			return -EACCES;
+
+		if (inode->i_sb->s_flags & MS_RDONLY)
+			return -EROFS;
+
+		if (copy_from_user(&input, (struct ext3_new_group_input *)arg,
+				sizeof(input)))
+			return -EFAULT;
+
+		lock_kernel();
+		err = ext3_group_add(sb, &input);
+		unlock_kernel();
+		journal_flush(EXT3_SB(sb)->s_journal);
+
+		return err;
+	}
+#endif /* CONFIG_EXT3_RESIZE */
 #ifdef CONFIG_JBD_DEBUG
 	case EXT3_IOC_WAIT_FOR_READONLY:
 		/*
diff -rNu linux-2.4.18-orig/fs/ext3/resize.c linux-2.4.18/fs/ext3/resize.c
--- linux-2.4.18-orig/fs/ext3/resize.c	Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/fs/ext3/resize.c	Tue Sep 10 11:19:04 2002
@@ -0,0 +1,958 @@
+/*
+ * linux/fs/ext3/resize.c
+ *
+ * Support for resizing an ext3 filesystem while it is mounted.
+ *
+ * Copyright (C) 2001, 2002 Andreas Dilger <ad...@cl...>
+ *
+ * This could probably be made into a module, because it is not often in use.
+ */
+
+#include <linux/config.h>
+
+#define EXT3FS_DEBUG
+
+#include <linux/sched.h>
+#include <linux/smp_lock.h>
+#include <linux/ext3_jbd.h>
+
+#include <linux/errno.h>
+#include <linux/locks.h>
+#include <linux/slab.h>
+
+
+#define outside(b, first, last)	((b) < (first) || (b) >= (last))
+#define inside(b, first, last)	((b) >= (first) && (b) < (last))
+
+static int verify_group_input(struct super_block *sb,
+			      struct ext3_new_group_data *input)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	struct ext3_super_block *es = sbi->s_es;
+	unsigned start = le32_to_cpu(es->s_blocks_count);
+	unsigned end = start + input->blocks_count;
+	unsigned group = input->group;
+	unsigned itend = input->inode_table + EXT3_SB(sb)->s_itb_per_group;
+	unsigned overhead = ext3_bg_has_super(sb, group) ?
+		(1 + ext3_bg_num_gdb(sb, group) +
+		 le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
+	unsigned metaend = start + overhead;
+	struct buffer_head *bh;
+	int free_blocks_count;
+	int err = -EINVAL;
+
+	input->free_blocks_count = free_blocks_count =
+		input->blocks_count - 2 - overhead - sbi->s_itb_per_group;
+
+	if (test_opt(sb, DEBUG))
+		printk("EXT3-fs: adding %s group %u: %u blocks "
+		       "(%d free, %u reserved)\n",
+		       ext3_bg_has_super(sb, input->group) ? "normal" :
+		       "no-super", input->group, input->blocks_count,
+		       free_blocks_count, input->reserved_blocks);
+
+	if (group != sbi->s_groups_count)
+		ext3_warning(sb, __FUNCTION__,
+			     "Cannot add at group %u (only %lu groups)",
+			     input->group, sbi->s_groups_count);
+	else if ((start - le32_to_cpu(es->s_first_data_block)) %
+		 EXT3_BLOCKS_PER_GROUP(sb))
+		ext3_warning(sb, __FUNCTION__, "Last group not full");
+	else if (input->reserved_blocks > input->blocks_count / 5)
+		ext3_warning(sb, __FUNCTION__, "Reserved blocks too high (%u)",
+			     input->reserved_blocks);
+	else if (free_blocks_count < 0)
+		ext3_warning(sb, __FUNCTION__, "Bad blocks count %u",
+			     input->blocks_count);
+	else if (!(bh = sb_bread(sb, end - 1)))
+		ext3_warning(sb, __FUNCTION__, "Cannot read last block (%u)",
+			     end - 1);
+	else if (outside(input->block_bitmap, start, end))
+		ext3_warning(sb, __FUNCTION__,
+			     "Block bitmap not in group (block %u)",
+			     input->block_bitmap);
+	else if (outside(input->inode_bitmap, start, end))
+		ext3_warning(sb, __FUNCTION__,
+			     "Inode bitmap not in group (block %u)",
+			     input->inode_bitmap);
+	else if (outside(input->inode_table, start, end) ||
+		 outside(itend - 1, start, end))
+		ext3_warning(sb, __FUNCTION__,
+			     "Inode table not in group (blocks %u-%u)",
+			     input->inode_table, itend - 1);
+	else if (input->inode_bitmap == input->block_bitmap)
+		ext3_warning(sb, __FUNCTION__,
+			     "Block bitmap same as inode bitmap (%u)",
+			     input->block_bitmap);
+	else if (inside(input->block_bitmap, input->inode_table, itend))
+		ext3_warning(sb, __FUNCTION__,
+			     "Block bitmap (%u) in inode table (%u-%u)",
+			     input->block_bitmap, input->inode_table, itend-1);
+	else if (inside(input->inode_bitmap, input->inode_table, itend))
+		ext3_warning(sb, __FUNCTION__,
+			     "Inode bitmap (%u) in inode table (%u-%u)",
+			     input->inode_bitmap, input->inode_table, itend-1);
+	else if (inside(input->block_bitmap, start, metaend))
+		ext3_warning(sb, __FUNCTION__,
+			     "Block bitmap (%u) in GDT table (%u-%u)",
+			     input->block_bitmap, start, metaend - 1);
+	else if (inside(input->inode_bitmap, start, metaend))
+		ext3_warning(sb, __FUNCTION__,
+			     "Inode bitmap (%u) in GDT table (%u-%u)",
+			     input->inode_bitmap, start, metaend - 1);
+	else if (inside(input->inode_table, start, metaend) ||
+		 inside(itend - 1, start, metaend))
+		ext3_warning(sb, __FUNCTION__,
+			     "Inode table (%u-%u) overlaps GDT table (%u-%u)",
+			     input->inode_table, itend - 1, start, metaend - 1);
+	else {
+		brelse(bh);
+		err = 0;
+	}
+
+	return err;
+}
+
+static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
+				  unsigned long blk)
+{
+	struct buffer_head *bh;
+	int err;
+
+	bh = sb_getblk(sb, blk);
+	mark_buffer_uptodate(bh, 1);
+	if ((err = ext3_journal_get_write_access(handle, bh))) {
+		brelse(bh);
+		bh = ERR_PTR(err);
+	} else
+		memset(bh->b_data, 0, sb->s_blocksize);
+
+	return bh;
+}
+
+/*
+ * To avoid calling the atomic setbit hundreds or thousands of times, we only
+ * need to use it within a single byte (to ensure we get endianness right).
+ * We can use memset for the rest of the bitmap as there are no other users.
+ */
+static void mark_bitmap_end(int start_bit, int end_bit, char *bitmap)
+{
+	int i;
+
+	if (start_bit >= end_bit)
+		return;
+
+	ext3_debug("mark end bits +%d through +%d used\n", start_bit, end_bit);
+	for (i = start_bit; i < ((start_bit + 7) & ~7UL); i++)
+		ext3_set_bit(i, bitmap);
+	if (i < end_bit)
+		memset(bitmap + (i >> 3), 0xff, (end_bit - i) >> 3);
+}
+
+/*
+ * Set up the block and inode bitmaps, and the inode table for the new group.
+ * This doesn't need to be part of the main transaction, since we are only
+ * changing blocks outside the actual filesystem. We still do journaling to
+ * ensure the recovery is correct in case of a failure just after resize.
+ * If any part of this fails, we simply abort the resize.
+ *
+ * We only pass inode because of the ext3 journal wrappers.
+ */
+static int setup_new_group_blocks(struct super_block *sb, struct inode *inode,
+				  struct ext3_new_group_data *input)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	unsigned long start = input->group * sbi->s_blocks_per_group +
+		le32_to_cpu(sbi->s_es->s_first_data_block);
+	int reserved_gdb = ext3_bg_has_super(sb, input->group) ?
+		le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) : 0;
+	unsigned long gdblocks = ext3_bg_num_gdb(sb, input->group);
+	struct buffer_head *bh;
+	handle_t *handle;
+	unsigned long block;
+	int bit;
+	int i;
+	int err = 0, err2;
+
+	handle = ext3_journal_start(inode, reserved_gdb + gdblocks +
+				    2 + sbi->s_itb_per_group);
+	if (IS_ERR(handle))
+		return PTR_ERR(handle);
+
+	lock_super(sb);
+	if (input->group != sbi->s_groups_count) {
+		err = -EBUSY;
+		goto exit_journal;
+	}
+
+	if (IS_ERR(bh = bclean(handle, sb, input->block_bitmap))) {
+		err = PTR_ERR(bh);
+		goto exit_journal;
+	}
+
+	if (ext3_bg_has_super(sb, input->group)) {
+		ext3_debug("mark backup superblock %#04lx (+0)\n", start);
+		ext3_set_bit(0, bh->b_data);
+	}
+
+	/* Copy all of the GDT blocks into the backup in this group */
+	for (i = 0, bit = 1, block = start + 1;
+	     i < gdblocks; i++, block++, bit++) {
+		struct buffer_head *gdb;
+
+		ext3_debug("update backup group %#04lx (+%d)\n", block, bit);
+
+		gdb = sb_getblk(sb, block);
+		mark_buffer_uptodate(gdb, 1);
+		if ((err = ext3_journal_get_write_access(handle, gdb))) {
+			brelse(gdb);
+			goto exit_bh;
+		}
+		memcpy(gdb->b_data, sbi->s_group_desc[i], bh->b_size);
+		ext3_journal_dirty_metadata(handle, gdb);
+		ext3_set_bit(bit, bh->b_data);
+		brelse(gdb);
+	}
+
+	/* Zero out all of the reserved backup group descriptor table blocks */
+	for (i = 0, bit = gdblocks + 1, block = start + bit;
+	     i < reserved_gdb; i++, block++, bit++) {
+		struct buffer_head *gdb;
+
+		ext3_debug("clear reserved block %#04lx (+%d)\n", block, bit);
+
+		if (IS_ERR(gdb = bclean(handle, sb, block))) {
+			err = PTR_ERR(bh);
+			goto exit_bh;
+		}
+		ext3_journal_dirty_metadata(handle, gdb);
+		ext3_set_bit(bit, bh->b_data);
+		brelse(gdb);
+	}
+	ext3_debug("mark block bitmap %#04x (+%ld)\n", input->block_bitmap,
+		   input->block_bitmap - start);
+	ext3_set_bit(input->block_bitmap - start, bh->b_data);
+	ext3_debug("mark inode bitmap %#04x (+%ld)\n", input->inode_bitmap,
+		   input->inode_bitmap - start);
+	ext3_set_bit(input->inode_bitmap - start, bh->b_data);
+
+	/* Zero out all of the inode table blocks */
+	for (i = 0, block = input->inode_table, bit = block - start;
+	     i < sbi->s_itb_per_group; i++, bit++, block++) {
+		struct buffer_head *it;
+
+		ext3_debug("clear inode block %#04x (+%ld)\n", block, bit);
+		if (IS_ERR(it = bclean(handle, sb, block))) {
+			err = PTR_ERR(it);
+			goto exit_bh;
+		}
+		ext3_journal_dirty_metadata(handle, it);
+		brelse(it);
+		ext3_set_bit(bit, bh->b_data);
+	}
+	mark_bitmap_end(input->blocks_count, EXT3_BLOCKS_PER_GROUP(sb),
+			bh->b_data);
+	ext3_journal_dirty_metadata(handle, bh);
+	brelse(bh);
+
+	/* Mark unused entries in inode bitmap used */
+	ext3_debug("clear inode bitmap %#04x (+%ld)\n",
+		   input->inode_bitmap, input->inode_bitmap - start);
+	if (IS_ERR(bh = bclean(handle, sb, input->inode_bitmap))) {
+		err = PTR_ERR(bh);
+		goto exit_journal;
+	}
+
+	mark_bitmap_end(EXT3_INODES_PER_GROUP(sb), EXT3_BLOCKS_PER_GROUP(sb),
+			bh->b_data);
+	ext3_journal_dirty_metadata(handle, bh);
+exit_bh:
+	brelse(bh);
+
+exit_journal:
+	unlock_super(sb);
+	if ((err2 = ext3_journal_stop(handle, inode)) && !err)
+		err = err2;
+
+	return err;
+}
+
+/*
+ * Iterate through the groups which hold BACKUP superblock/GDT copies in an
+ * ext3 filesystem. The counters should be initialized to 1, 5, and 7 before
+ * calling this for the first time. In a sparse filesystem it will be the
+ * sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
+ * For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
+ */
+unsigned ext3_list_backups(struct super_block *sb, unsigned *three,
+			   unsigned *five, unsigned *seven)
+{
+	unsigned *min = three;
+	int mult = 3;
+	unsigned ret;
+
+	if (!EXT3_HAS_RO_COMPAT_FEATURE(sb,
+					EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
+		ret = *min;
+		*min += 1;
+		return ret;
+	}
+
+	if (*five < *min) {
+		min = five;
+		mult = 5;
+	}
+	if (*seven < *min) {
+		min = seven;
+		mult = 7;
+	}
+
+	ret = *min;
+	*min *= mult;
+
+	return ret;
+}
+
+/*
+ * Check that all of the backup GDT blocks are held in the primary GDT block.
+ * It is assumed that they are stored in group order. Returns the number of
+ * groups in current filesystem that have BACKUPS, or -ve error code.
+ */
+static int verify_reserved_gdb(struct super_block *sb,
+			       struct buffer_head *primary)
+{
+	const unsigned long blk = primary->b_blocknr;
+	const unsigned long end = EXT3_SB(sb)->s_groups_count;
+	unsigned three = 1;
+	unsigned five = 5;
+	unsigned seven = 7;
+	unsigned grp;
+	__u32 *p = (__u32 *)primary->b_data;
+	int gdbackups = 0;
+
+	while ((grp = ext3_list_backups(sb, &three, &five, &seven)) < end) {
+		if (le32_to_cpu(*p++) != grp * EXT3_BLOCKS_PER_GROUP(sb) + blk){
+			ext3_warning(sb, __FUNCTION__,
+				     "reserved GDT %ld missing grp %d (%ld)\n",
+				     blk, grp,
+				     grp * EXT3_BLOCKS_PER_GROUP(sb) + blk);
+			return -EINVAL;
+		}
+		if (++gdbackups > EXT3_ADDR_PER_BLOCK(sb))
+			return -EFBIG;
+	}
+
+	return gdbackups;
+}
+
+/*
+ * Called when we need to bring a reserved group descriptor table block into
+ * use from the resize inode. The primary copy of the new GDT block currently
+ * is an indirect block (under the double indirect block in the resize inode).
+ * The new backup GDT blocks will be stored as leaf blocks in this indirect
+ * block, in group order. Even though we know all the block numbers we need,
+ * we check to ensure that the resize inode has actually reserved these blocks.
+ *
+ * Don't need to update the block bitmaps because the blocks are still in use.
+ *
+ * We get all of the error cases out of the way, so that we are sure to not
+ * fail once we start modifying the data on disk, because JBD has no rollback.
+ */
+static int add_new_gdb(handle_t *handle, struct inode *inode,
+		       struct ext3_new_group_data *input,
+		       struct buffer_head **primary)
+{
+	struct super_block *sb = inode->i_sb;
+	struct ext3_super_block *es = EXT3_SB(sb)->s_es;
+	unsigned long gdb_num = input->group / EXT3_DESC_PER_BLOCK(sb);
+	unsigned long gdb_off = input->group % EXT3_DESC_PER_BLOCK(sb);
+	unsigned long gdblock = EXT3_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
+	struct buffer_head **o_group_desc, **n_group_desc;
+	struct buffer_head *dind;
+	int gdbackups;
+	struct ext3_iloc iloc;
+	__u32 *data;
+	int err;
+
+	if (test_opt(sb, DEBUG))
+		printk("EXT3-fs: ext3_add_new_gdb: adding group block %lu\n",
+		       gdb_num);
+
+	/*
+	 * If we are not using the primary superblock/GDT copy don't resize,
+	 * because the user tools have no way of handling this. Probably a
+	 * bad time to do it anyways.
+	 */
+	if (EXT3_SB(sb)->s_sbh->b_blocknr !=
+	    le32_to_cpu(EXT3_SB(sb)->s_es->s_first_data_block)) {
+		ext3_warning(sb, __FUNCTION__,
+			"won't resize using backup superblock at %lu\n",
+			EXT3_SB(sb)->s_sbh->b_blocknr);
+		return -EPERM;
+	}
+
+	*primary = sb_bread(sb, gdblock);
+	if (!*primary)
+		return -EIO;
+
+	if ((gdbackups = verify_reserved_gdb(sb, *primary)) < 0) {
+		err = gdbackups;
+		goto exit_bh;
+	}
+
+	data = EXT3_I(inode)->i_data + EXT3_DIND_BLOCK;
+	dind = sb_bread(sb, le32_to_cpu(*data));
+	if (!dind) {
+		err = -EIO;
+		goto exit_bh;
+	}
+
+	data = (__u32 *)dind->b_data;
+	if (le32_to_cpu(data[gdb_num % EXT3_ADDR_PER_BLOCK(sb)]) != gdblock) {
+		ext3_warning(sb, __FUNCTION__,
+			     "new group %u GDT block %lu not reserved\n",
+			     input->group, gdblock);
+		err = -EINVAL;
+		goto exit_dind;
+	}
+
+	if ((err = ext3_journal_get_write_access(handle, EXT3_SB(sb)->s_sbh)))
+		goto exit_dind;
+
+	if ((err = ext3_journal_get_write_access(handle, *primary)))
+		goto exit_sbh;
+
+	if ((err = ext3_journal_get_write_access(handle, dind)))
+		goto exit_primary;
+
+	/* ext3_reserve_inode_write() gets a reference on the iloc */
+	if ((err = ext3_reserve_inode_write(handle, inode, &iloc)))
+		goto exit_dindj;
+
+	n_group_desc = (struct buffer_head **)kmalloc((gdb_num + 1) *
+			sizeof(struct buffer_head *), GFP_KERNEL);
+	if (!n_group_desc) {
+		err = -ENOMEM;
+		ext3_warning (sb, __FUNCTION__,
+			      "not enough memory for %lu groups", gdb_num + 1);
+		goto exit_inode;
+	}
+
+	/*
+	 * Finally, we have all of the possible failures behind us...
+	 *
+	 * Remove new GDT block from inode double-indirect block and clear out
+	 * the new GDT block for use (which also "frees" the backup GDT blocks
+	 * from the reserved inode). We don't need to change the bitmaps for
+	 * these blocks, because they are marked as in-use from being in the
+	 * reserved inode, and will become GDT blocks (primary and backup).
+	 */
+	/*
+	printk("removing block %d = %ld from dindir %ld[%ld]\n",
+	       ((__u32 *)(dind->b_data))[gdb_off], gdblock, dind->b_blocknr,
+	       gdb_num); */
+	data[gdb_off] = 0;
+	ext3_journal_dirty_metadata(handle, dind);
+	brelse(dind);
+	inode->i_blocks -= (gdbackups + 1) * sb->s_blocksize >> 9;
+	ext3_mark_iloc_dirty(handle, inode, &iloc);
+	memset((*primary)->b_data, 0, sb->s_blocksize);
+	ext3_journal_dirty_metadata(handle, *primary);
+
+	o_group_desc = EXT3_SB(sb)->s_group_desc;
+	memcpy(n_group_desc, o_group_desc,
+	       EXT3_SB(sb)->s_gdb_count * sizeof(struct buffer_head *));
+	n_group_desc[gdb_num] = *primary;
+	EXT3_SB(sb)->s_group_desc = n_group_desc;
+	EXT3_SB(sb)->s_gdb_count++;
+	kfree(o_group_desc);
+
+	es->s_reserved_gdt_blocks =
+		cpu_to_le16(le16_to_cpu(es->s_reserved_gdt_blocks) - 1);
+	ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
+
+	return 0;
+
+exit_inode:
+	//ext3_journal_release_buffer(handle, iloc.bh);
+	brelse(iloc.bh);
+exit_dindj:
+	//ext3_journal_release_buffer(handle, dind);
+exit_primary:
+	//ext3_journal_release_buffer(handle, *primary);
+exit_sbh:
+	//ext3_journal_release_buffer(handle, *primary);
+exit_dind:
+	brelse(dind);
+exit_bh:
+	brelse(*primary);
+
+	ext3_debug("leaving with error %d\n", err);
+	return err;
+}
+
+/*
+ * Called when we are adding a new group which has a backup copy of each of
+ * the GDT blocks (i.e. sparse group) and there are reserved GDT blocks.
+ * We need to add these reserved backup GDT blocks to the resize inode, so
+ * that they are kept for future resizing and not allocated to files.
+ *
+ * Each reserved backup GDT block will go into a different indirect block.
+ * The indirect blocks are actually the primary reserved GDT blocks,
+ * so we know in advance what their block numbers are. We only get the
+ * double-indirect block to verify it is pointing to the primary reserved
+ * GDT blocks so we don't overwrite a data block by accident. The reserved
+ * backup GDT blocks are stored in their reserved primary GDT block.
+ */
+static int reserve_backup_gdb(handle_t *handle, struct inode *inode,
+			      struct ext3_new_group_data *input)
+{
+	struct super_block *sb = inode->i_sb;
+	int reserved_gdb =le16_to_cpu(EXT3_SB(sb)->s_es->s_reserved_gdt_blocks);
+	struct buffer_head **primary;
+	struct buffer_head *dind;
+	struct ext3_iloc iloc;
+	unsigned long blk;
+	__u32 *data, *end;
+	int gdbackups = 0;
+	int res, i;
+	int err;
+
+	primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_KERNEL);
+	if (!primary)
+		return -ENOMEM;
+
+	data = EXT3_I(inode)->i_data + EXT3_DIND_BLOCK;
+	dind = sb_bread(sb, le32_to_cpu(*data));
+	if (!dind) {
+		err = -EIO;
+		goto exit_free;
+	}
+
+	blk = EXT3_SB(sb)->s_sbh->b_blocknr + 1 + EXT3_SB(sb)->s_gdb_count;
+	data = (__u32 *)dind->b_data + EXT3_SB(sb)->s_gdb_count;
+	end = (__u32 *)dind->b_data + EXT3_ADDR_PER_BLOCK(sb);
+
+	/* Get each reserved primary GDT block and verify it holds backups */
+	for (res = 0; res < reserved_gdb; res++, blk++) {
+		if (le32_to_cpu(*data) != blk) {
+			ext3_warning(sb, __FUNCTION__,
+				     "reserved block %lu not at offset %d\n",
+				     blk, data - (__u32 *)dind->b_data);
+			err = -EINVAL;
+			goto exit_bh;
+		}
+		primary[res] = sb_bread(sb, blk);
+		if (!primary[res]) {
+			err = -EIO;
+			goto exit_bh;
+		}
+		if ((gdbackups = verify_reserved_gdb(sb, primary[res])) < 0) {
+			brelse(primary[res]);
+			err = gdbackups;
+			goto exit_bh;
+		}
+		if (++data >= end)
+			data = (__u32 *)dind->b_data;
+	}
+
+	for (i = 0; i < reserved_gdb; i++) {
+		if ((err = ext3_journal_get_write_access(handle, primary[i]))) {
+			/*
+			int j;
+			for (j = 0; j < i; j++)
+				ext3_journal_release_buffer(handle, primary[j]);
+			 */
+			goto exit_bh;
+		}
+	}
+
+	if ((err = ext3_reserve_inode_write(handle, inode, &iloc)))
+		goto exit_bh;
+
+	/*
+	 * Finally we can add each of the reserved backup GDT blocks from
+	 * the new group to its reserved primary GDT block.
+	 */
+	blk = input->group * EXT3_BLOCKS_PER_GROUP(sb);
+	for (i = 0; i < reserved_gdb; i++) {
+		int err2;
+		data = (__u32 *)primary[i]->b_data;
+		/* printk("reserving backup %lu[%u] = %lu\n",
+		       primary[i]->b_blocknr, gdbackups,
+		       blk + primary[i]->b_blocknr); */
+		data[gdbackups] = cpu_to_le32(blk + primary[i]->b_blocknr);
+		err2 = ext3_journal_dirty_metadata(handle, primary[i]);
+		if (!err)
+			err = err2;
+	}
+	inode->i_blocks += reserved_gdb * sb->s_blocksize >> 9;
+	ext3_mark_iloc_dirty(handle, inode, &iloc);
+
+exit_bh:
+	while (--res >= 0)
+		brelse(primary[res]);
+	brelse(dind);
+
+exit_free:
+	kfree(primary);
+
+	return err;
+}
+
+/*
+ * Update the backup copies of the ext3 metadata. These don't need to be part
+ * of the main resize transaction, because e2fsck will re-write them if there
+ * is a problem (basically only OOM will cause a problem). However, we
+ * _should_ update the backups if possible, in case the primary gets trashed
+ * for some reason and we need to run e2fsck from a backup superblock. The
+ * important part is that the new block and inode counts are in the backup
+ * superblocks, and the location of the new group metadata in the GDT backups.
+ *
+ * We do not need lock_super() for this, because these blocks are not
+ * otherwise touched by the filesystem code when it is mounted. We don't
+ * need to worry about last changing from sbi->s_groups_count, because the
+ * worst that can happen is that we do not copy the full number of backups
+ * at this time. The resize which changed s_groups_count will backup again.
+ *
+ * We only pass inode because of the ext3 journal wrappers.
+ */
+static void update_backups(struct super_block *sb, struct inode *inode,
+			   int blk_off, char *data, int size)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	const unsigned long last = sbi->s_groups_count;
+	const int bpg = EXT3_BLOCKS_PER_GROUP(sb);
+	unsigned three = 1;
+	unsigned five = 5;
+	unsigned seven = 7;
+	unsigned group;
+	int rest = sb->s_blocksize - size;
+	handle_t *handle;
+	int err = 0, err2;
+
+	handle = ext3_journal_start(inode, EXT3_MAX_TRANS_DATA);
+	if (IS_ERR(handle)) {
+		group = 1;
+		err = PTR_ERR(handle);
+		goto exit_err;
+	}
+
+	while ((group = ext3_list_backups(sb, &three, &five, &seven)) < last) {
+		struct buffer_head *bh;
+
+		/* Out of journal space, and can't get more - abort - so sad */
+		if (handle->h_buffer_credits == 0 &&
+		    ext3_journal_extend(handle, EXT3_MAX_TRANS_DATA) &&
+		    (err = ext3_journal_restart(handle, EXT3_MAX_TRANS_DATA)))
+			break;
+
+		bh = sb_getblk(sb, group * bpg + blk_off);
+		mark_buffer_uptodate(bh, 1);
+		ext3_debug(sb, __FUNCTION__, "update metadata backup %#04lx\n",
+			   bh->b_blocknr);
+		if ((err = ext3_journal_get_write_access(handle, bh)))
+			break;
+		memcpy(bh->b_data, data, size);
+		if (rest)
+			memset(bh->b_data + size, 0, rest);
+		ext3_journal_dirty_metadata(handle, bh);
+		brelse(bh);
+	}
+	if ((err2 = ext3_journal_stop(handle, inode)) && !err)
+		err = err2;
+
+	/*
+	 * Ugh! Need to have e2fsck write the backup copies. It is too
+	 * late to revert the resize, we shouldn't fail just because of
+	 * the backup copies (they are only needed in case of corruption).
+	 *
+	 * However, if we got here we have a journal problem too, so we
+	 * can't really start a transaction to mark the superblock.
+	 * Chicken out and just set the flag on the hope it will be written
+	 * to disk, and if not - we will simply wait until next fsck.
+	 */
+exit_err:
+	if (err) {
+		ext3_warning(sb, __FUNCTION__,
+			     "can't update backup for group %d (err %d), "
+			     "forcing fsck on next reboot\n", group, err);
+		sbi->s_mount_state &= ~EXT3_VALID_FS;
+		sbi->s_es->s_state &= ~cpu_to_le16(EXT3_VALID_FS);
+		mark_buffer_dirty(sbi->s_sbh);
+	}
+}
+
+/* Add group descriptor data to an existing or new group descriptor block.
+ * Ensure we handle all possible error conditions _before_ we start modifying
+ * the filesystem, because we cannot abort the transaction and not have it
+ * write the data to disk.
+ *
+ * If we are on a GDT block boundary, we need to get the reserved GDT block.
+ * Otherwise, we may need to add backup GDT blocks for a sparse group.
+ *
+ * We only need to hold the superblock lock while we are actually adding
+ * in the new group's counts to the superblock. Prior to that we have
+ * not really "added" the group at all. We re-check that we are still
+ * adding in the last group in case things have changed since verifying.
+ */
+int ext3_group_add(struct super_block *sb, struct ext3_new_group_data *input)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	struct ext3_super_block *es = sbi->s_es;
+	int reserved_gdb = ext3_bg_has_super(sb, input->group) ?
+		le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
+	struct buffer_head *primary = NULL;
+	struct ext3_group_desc *gdp;
+	struct inode *inode = NULL;
+	struct inode bogus;
+	handle_t *handle;
+	int gdb_off, gdb_num;
+	int err, err2;
+
+	gdb_num = input->group / EXT3_DESC_PER_BLOCK(sb);
+	gdb_off = input->group % EXT3_DESC_PER_BLOCK(sb);
+
+	if (gdb_off == 0 && !EXT3_HAS_RO_COMPAT_FEATURE(sb,
+					EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
+		ext3_warning(sb, __FUNCTION__,
+			     "Can't resize non-sparse filesystem further\n");
+		return -EPERM;
+	}
+
+	if (reserved_gdb || gdb_off == 0) {
+		if (!EXT3_HAS_COMPAT_FEATURE(sb,
+					     EXT3_FEATURE_COMPAT_RESIZE_INODE)){
+			ext3_warning(sb, __FUNCTION__,
+				     "No reserved GDT blocks, can't resize\n");
+			return -EPERM;
+		}
+		inode = iget(sb, EXT3_RESIZE_INO);
+		if (!inode || is_bad_inode(inode)) {
+			ext3_warning(sb, __FUNCTION__,
+				     "Error opening resize inode\n");
+			iput(inode);
+			return -ENOENT;
+		}
+	} else {
+		/* Used only for ext3 journal wrapper functions to get sb */
+		inode = &bogus;
+		bogus.i_sb = sb;
+	}
+
+	if ((err = verify_group_input(sb, input)))
+		goto exit_put;
+
+	if ((err = setup_new_group_blocks(sb, inode, input)))
+		goto exit_put;
+
+	/*
+	 * We will always be modifying at least the superblock and a GDT
+	 * block.  If we are adding a group past the last current GDT block,
+	 * we will also modify the inode and the dindirect block.  If we
+	 * are adding a group with superblock/GDT backups we will also
+	 * modify each of the reserved GDT dindirect blocks.
+	 */
+	handle = ext3_journal_start(inode, ext3_bg_has_super(sb, input->group) ?
+				    3 + reserved_gdb : 4);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		goto exit_put;
+	}
+
+	lock_super(sb);
+	if (input->group != EXT3_SB(sb)->s_groups_count) {
+		ext3_warning(sb, __FUNCTION__,
+			     "multiple resizers run on filesystem!\n");
+		err = -EBUSY;
+		goto exit_journal;
+	}
+
+	if ((err = ext3_journal_get_write_access(handle, sbi->s_sbh)))
+		goto exit_journal;
+
+	/*
+	 * We will only either add reserved group blocks to a backup group
+	 * or remove reserved blocks for the first group in a new group block.
+	 * Doing both would mean more complex code, and sane people don't
+	 * use non-sparse filesystems anymore.  This is already checked above.
+	 */
+	if (gdb_off) {
+		primary = sbi->s_group_desc[gdb_num];
+		if ((err = ext3_journal_get_write_access(handle, primary)))
+			goto exit_journal;
+
+		if (reserved_gdb && ext3_bg_num_gdb(sb, input->group) &&
+		    (err = reserve_backup_gdb(handle, inode, input)))
+			goto exit_journal;
+	} else if ((err = add_new_gdb(handle, inode, input, &primary)))
+		goto exit_journal;
+
+	/* Finally update group descriptor block for new group */
+	gdp = (struct ext3_group_desc *)primary->b_data + gdb_off;
+
+	gdp->bg_block_bitmap = cpu_to_le32(input->block_bitmap);
+	gdp->bg_inode_bitmap = cpu_to_le32(input->inode_bitmap);
+	gdp->bg_inode_table = cpu_to_le32(input->inode_table);
+	gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
+	gdp->bg_free_inodes_count = cpu_to_le16(EXT3_INODES_PER_GROUP(sb));
+
+	EXT3_SB(sb)->s_groups_count++;
+	ext3_journal_dirty_metadata(handle, primary);
+
+	/* Update superblock with new block counts */
+	es->s_blocks_count = cpu_to_le32(le32_to_cpu(es->s_blocks_count) +
+		input->blocks_count);
+	es->s_free_blocks_count =
+		cpu_to_le32(le32_to_cpu(es->s_free_blocks_count) +
+			    input->free_blocks_count);
+	es->s_r_blocks_count = cpu_to_le32(le32_to_cpu(es->s_r_blocks_count) +
+		input->reserved_blocks);
+	es->s_inodes_count = cpu_to_le32(le32_to_cpu(es->s_inodes_count) +
+		EXT3_INODES_PER_GROUP(sb));
+	es->s_free_inodes_count =
+		cpu_to_le32(le32_to_cpu(es->s_free_inodes_count) +
+			    EXT3_INODES_PER_GROUP(sb));
+	ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
+	sb->s_dirt = 1;
+
+exit_journal:
+	unlock_super(sb);
+	handle->h_sync = 1;
+	if ((err2 = ext3_journal_stop(handle, inode)) && !err)
+		err = err2;
+	if (!err) {
+		update_backups(sb, inode, sbi->s_sbh->b_blocknr, (char *)es,
+			       sizeof(struct ext3_super_block));
+		update_backups(sb, inode, primary->b_blocknr, primary->b_data,
+			       primary->b_size);
+	}
+exit_put:
+	if (inode != &bogus)
+		iput(inode);
+	return err;
+} /* ext3_group_add */
+
+/* Extend the filesystem to the new number of blocks specified.  This entry
+ * point is only used to extend the current filesystem to the end of the last
+ * existing group.  It can be accessed via ioctl, or by "remount,resize=<size>"
+ * for emergencies (because it has no dependencies on reserved blocks).
+ *
+ * If we _really_ wanted, we could use default values to call ext3_group_add()
+ * and allow the "remount" trick to work for arbitrary resizing, assuming
+ * enough GDT blocks are reserved to grow to the desired size.
+ */
+int ext3_group_extend(struct super_block *sb, struct ext3_super_block *es,
+		      unsigned long n_blocks_count)
+{
+	unsigned long o_blocks_count;
+	unsigned long o_groups_count;
+	unsigned long last;
+	int add;
+	struct inode *inode;
+	struct buffer_head * bh;
+	handle_t *handle;
+	int err;
+
+	o_blocks_count = le32_to_cpu(es->s_blocks_count);
+	o_groups_count = EXT3_SB(sb)->s_groups_count;
+
+	if (test_opt(sb, DEBUG))
+		printk("EXT3-fs: extending last group from %lu to %lu blocks\n",
+		       o_blocks_count, n_blocks_count);
+
+	if (n_blocks_count == 0 || n_blocks_count == o_blocks_count)
+		return 0;
+
+	if (n_blocks_count < o_blocks_count) {
+		ext3_warning(sb, __FUNCTION__,
+			     "can't shrink FS - resize aborted");
+		return -EBUSY;
+	}
+
+	/* Handle the remaining blocks in the last group only. */
+	last = (o_blocks_count - le32_to_cpu(es->s_first_data_block)) %
+		EXT3_BLOCKS_PER_GROUP(sb);
+
+	if (last == 0) {
+		ext3_warning(sb, __FUNCTION__,
+			     "need to use ext2online to resize further\n");
+		return -EPERM;
+	}
+
+	add = EXT3_BLOCKS_PER_GROUP(sb) - last;
+
+	if (o_blocks_count + add > n_blocks_count)
+		add = n_blocks_count - o_blocks_count;
+
+	if (o_blocks_count + add < n_blocks_count)
+		ext3_warning(sb, __FUNCTION__,
+			     "will only finish group (%lu blocks, %u new)",
+			     o_blocks_count + add, add);
+
+	/* See if the device is actually as big as what was requested */
+	bh = sb_bread(sb, o_blocks_count + add -1);
+	if (!bh) {
+		ext3_warning(sb, __FUNCTION__,
+			     "can't read last block, resize aborted");
+		return -ENOSPC;
+	}
+	brelse(bh);
+
+	if (!(inode = get_empty_inode())) {
+		ext3_warning(sb, __FUNCTION__,
+			     "error getting dummy resize inode");
+		return -ENOMEM;
+	}
+
+	/* Fake out an inode to "free" the new blocks in this group. */
+	inode->i_sb = sb;
+	inode->i_ino = 0;
+	EXT3_I(inode)->i_state = EXT3_STATE_RESIZE;
+
+	/* We will update the superblock, one block bitmap, and
+	 * one group descriptor via ext3_free_blocks().
+	 */
+	handle = ext3_journal_start(inode, 3);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		ext3_warning(sb, __FUNCTION__, "error %d on journal start",err);
+		goto exit_put;
+	}
+
+	lock_super(sb);
+	if (o_blocks_count != le32_to_cpu(es->s_blocks_count)) {
+		ext3_warning(sb, __FUNCTION__,
+			     "multiple resizers run on filesystem!\n");
+		/* Set an error, then drop the lock and handle before leaving */
+		err = -EBUSY;
+		unlock_super(sb);
+		ext3_journal_stop(handle, inode);
+		goto exit_put;
+	}
+
+	if ((err = ext3_journal_get_write_access(handle,
+						 EXT3_SB(sb)->s_sbh))) {
+		ext3_warning(sb, __FUNCTION__,
+			     "error %d on journal write access", err);
+		unlock_super(sb);
+		ext3_journal_stop(handle, inode);
+		goto exit_put;
+	}
+	es->s_blocks_count = cpu_to_le32(o_blocks_count + add);
+	ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
+	sb->s_dirt = 1;
+	unlock_super(sb);
+	ext3_debug("freeing blocks %ld through %ld\n", o_blocks_count,
+		   o_blocks_count + add);
+	ext3_free_blocks(handle, inode, o_blocks_count, add);
+	ext3_debug("freed blocks %ld through %ld\n", o_blocks_count,
+		   o_blocks_count + add);
+	if ((err = ext3_journal_stop(handle, inode)))
+		goto exit_put;
+	if (test_opt(sb, DEBUG))
+		printk("EXT3-fs: extended group to %u blocks\n",
+		       le32_to_cpu(es->s_blocks_count));
+	update_backups(sb, inode, EXT3_SB(sb)->s_sbh->b_blocknr, (char *)es,
+		       sizeof(struct ext3_super_block));
+exit_put:
+	iput(inode);
+
+	return err;
+} /* ext3_group_extend */
diff -rNu linux-2.4.18-orig/fs/ext3/super.c linux-2.4.18/fs/ext3/super.c
--- linux-2.4.18-orig/fs/ext3/super.c	Mon Feb 25 11:38:08 2002
+++ linux-2.4.18/fs/ext3/super.c	Tue Sep 10 11:18:06 2002
@@ -496,6 +496,7 @@
 static int parse_options (char * options, unsigned long * sb_block,
 			  struct ext3_sb_info *sbi,
 			  unsigned long * inum,
+			  unsigned long *n_blocks_count,
 			  int is_remount)
 {
 	unsigned long *mount_options = &sbi->s_mount_opt;
@@ -566,6 +567,27 @@
 		else if (!strcmp (this_char, "nogrpid") ||
 			 !strcmp (this_char, "sysvgroups"))
 			clear_opt (*mount_options, GRPID);
+#ifdef CONFIG_EXT3_RESIZE
+		else if (!strcmp(this_char, "resize")) {
+			printk("EXT3-fs: parse_options: resize=%s\n", value);
+			if (!n_blocks_count) {
+				printk("EXT3-fs: resize option only available "
+				       "for remount\n");
+				return 0;
+			}
+			if (!value || !*value) {
+				printk("EXT3-fs: resize requires number of "
+				       "blocks\n");
+				return 0;
+			}
+			*n_blocks_count = simple_strtoul(value, &value, 0);
+			if (*value) {
+				printk("EXT3-fs: invalid resize option: %s\n",
+				       value);
+				return 0;
+			}
+		}
+#endif /* CONFIG_EXT3_RESIZE */
 		else if (!strcmp (this_char, "resgid")) {
 			unsigned long v;
 			if (want_numeric(value, "resgid", &v))
@@ -921,7 +943,8 @@
 	sbi->s_mount_opt = 0;
 	sbi->s_resuid = EXT3_DEF_RESUID;
 	sbi->s_resgid = EXT3_DEF_RESGID;
-	if (!parse_options ((char *) data, &sb_block, sbi, &journal_inum, 0)) {
+	if (!parse_options ((char *) data, &sb_block, sbi, &journal_inum,
+			    NULL, 0)) {
 		sb->s_dev = 0;
 		goto out_fail;
 	}
@@ -1621,6 +1644,7 @@
 {
 	struct ext3_super_block * es;
 	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	unsigned long n_blocks_count = 0;
 	unsigned long tmp;
 
 	clear_ro_after(sb);
@@ -1628,7 +1652,7 @@
 	/*
 	 * Allow the "check" option to be passed as a remount option.
 	 */
-	if (!parse_options(data, &tmp, sbi, &tmp, 1))
+	if (!parse_options(data, &tmp, sbi, &tmp, &n_blocks_count, 1))
 		return -EINVAL;
 
 	if (sbi->s_mount_opt & EXT3_MOUNT_ABORT)
@@ -1636,7 +1660,8 @@
 
 	es = sbi->s_es;
 
-	if ((*flags & MS_RDONLY) != (sb->s_flags & MS_RDONLY)) {
+	if ((*flags & MS_RDONLY) != (sb->s_flags & MS_RDONLY) ||
+	    n_blocks_count > le32_to_cpu(es->s_blocks_count)) {
 		if (sbi->s_mount_opt & EXT3_MOUNT_ABORT)
 			return -EROFS;
 
@@ -1675,6 +1700,8 @@
 			 */
 			ext3_clear_journal_err(sb, es);
 			sbi->s_mount_state = le16_to_cpu(es->s_state);
+			if ((ret = ext3_group_extend(sb, es, n_blocks_count)))
+				return ret;
 			if (!ext3_setup_super (sb, es, 0))
 				sb->s_flags &= ~MS_RDONLY;
 		}
diff -rNu linux-2.4.18-orig/include/linux/ext3_fs.h linux-2.4.18/include/linux/ext3_fs.h
--- linux-2.4.18-orig/include/linux/ext3_fs.h	Mon Feb 25 11:38:13 2002
+++ linux-2.4.18/include/linux/ext3_fs.h	Tue Sep 10 11:18:06 2002
@@ -213,20 +213,50 @@
  */
 #define EXT3_STATE_JDATA		0x00000001 /* journaled data exists */
 #define EXT3_STATE_NEW			0x00000002 /* inode is newly created */
+#define EXT3_STATE_RESIZE		0x00000004 /* fake inode for resizing */
 
 /*
  * ioctl commands
  */
-#define	EXT3_IOC_GETFLAGS		_IOR('f', 1, long)
-#define	EXT3_IOC_SETFLAGS		_IOW('f', 2, long)
-#define	EXT3_IOC_GETVERSION		_IOR('f', 3, long)
-#define	EXT3_IOC_SETVERSION		_IOW('f', 4, long)
-#define	EXT3_IOC_GETVERSION_OLD		_IOR('v', 1, long)
-#define	EXT3_IOC_SETVERSION_OLD		_IOW('v', 2, long)
+
+/* Used to pass group descriptor data when online resize is done */
+struct ext3_new_group_input {
+	__u32 group;		/* Group number for this data */
+	__u32 block_bitmap;	/* Absolute block number of block bitmap */
+	__u32 inode_bitmap;	/* Absolute block number of inode bitmap */
+	__u32 inode_table;	/* Absolute block number of inode table start */
+	__u32 blocks_count;	/* Total number of blocks in this group */
+	__u16 reserved_blocks;	/* Number of reserved blocks in this group */
+	__u16 unused;
+};
+
+/* The struct ext3_new_group_input in kernel space, with free_blocks_count */
+struct ext3_new_group_data {
+	__u32 group;
+	__u32 block_bitmap;
+	__u32 inode_bitmap;
+	__u32 inode_table;
+	__u32 blocks_count;
+	__u16 reserved_blocks;
+	__u16 unused;
+	__u32 free_blocks_count;
+};
+
+#define	EXT3_IOC_GETFLAGS		_IOR('f', 1, long)
+#define	EXT3_IOC_SETFLAGS		_IOW('f', 2, long)
+#define	EXT3_IOC_GETVERSION_NEW		_IOR('f', 3, long)
+#define	EXT3_IOC_SETVERSION_NEW		_IOW('f', 4, long)
+#define	EXT3_IOC_GROUP_EXTEND		_IOW('f', 7, unsigned long)
+#define	EXT3_IOC_GROUP_ADD		_IOW('f', 8, struct ext3_new_group_input)
+#define	EXT3_IOC_GETVERSION_OLD		_IOR('v', 1, long)
+#define	EXT3_IOC_SETVERSION_OLD		_IOW('v', 2, long)
 #ifdef CONFIG_JBD_DEBUG
 #define EXT3_IOC_WAIT_FOR_READONLY	_IOR('f', 99, long)
 #endif
 
+#define EXT3_IOC_SETVERSION		EXT3_IOC_SETVERSION_NEW
+#define EXT3_IOC_GETVERSION		EXT3_IOC_GETVERSION_NEW
+
 /*
  * Structure of an inode on the disk
  */
@@ -429,7 +459,7 @@
 	 */
 	__u8	s_prealloc_blocks;	/* Nr of blocks to try to preallocate*/
 	__u8	s_prealloc_dir_blocks;	/* Nr to preallocate for dirs */
-	__u16	s_padding1;
+	__u16	s_reserved_gdt_blocks;	/* Per group desc for online growth */
 	/*
 	 * Journaling support valid if EXT3_FEATURE_COMPAT_HAS_JOURNAL set.
 	 */
@@ -651,6 +681,17 @@
 extern int ext3_orphan_add(handle_t *, struct inode *);
 extern int ext3_orphan_del(handle_t *, struct inode *);
 
+/* resize.c */
+#ifdef CONFIG_EXT3_RESIZE
+extern int ext3_group_add(struct super_block *sb,
+			  struct ext3_new_group_data *input);
+extern int ext3_group_extend(struct super_block *sb,
+			     struct ext3_super_block *es,
+			     unsigned long n_blocks_count);
+#else
+#define ext3_group_extend(sb, es, n_blocks_count)	0
+#endif
+
 /* super.c */
 extern void ext3_error (struct super_block *, const char *, const char *, ...)
 	__attribute__ ((format (printf, 3, 4)));

--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
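
For anyone checking the block arithmetic in ext3_group_extend() by hand, the following is a minimal userspace sketch of just the "how many blocks get added" calculation. It is not part of the patch: the function name blocks_to_add() and the two negative return codes are made up for illustration, and it only mirrors the size checks, not the journalling, locking, or the device-size probe.

```c
/* Hypothetical return codes for this sketch only (the kernel code
 * returns -EBUSY and -EPERM for these two cases). */
#define RESIZE_SHRINK	(-1L)	/* requested size is smaller than current */
#define RESIZE_FULL	(-2L)	/* last group already full, need group add */

/*
 * Userspace model of the arithmetic in ext3_group_extend():
 * returns the number of blocks that would be added to the last group,
 * 0 if there is nothing to do, or a negative code from above.
 */
static long blocks_to_add(unsigned long o_blocks_count,
			  unsigned long n_blocks_count,
			  unsigned long first_data_block,
			  unsigned long blocks_per_group)
{
	unsigned long last, add;

	if (n_blocks_count == 0 || n_blocks_count == o_blocks_count)
		return 0;

	if (n_blocks_count < o_blocks_count)
		return RESIZE_SHRINK;

	/* Blocks already in use in the (partial) last group */
	last = (o_blocks_count - first_data_block) % blocks_per_group;
	if (last == 0)
		return RESIZE_FULL;

	/* Fill the last group, but never go past the requested size */
	add = blocks_per_group - last;
	if (o_blocks_count + add > n_blocks_count)
		add = n_blocks_count - o_blocks_count;

	return (long)add;
}
```

With 32768 blocks per group and a 100000-block filesystem, a request for 200000 blocks only finishes the last group (31072 new blocks); the remainder would have to come from ext3_group_add().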