From: Randy R. <rmr...@us...> - 2008-03-31 09:44:16
My apologies if this has been covered already, as I am new to this list.

I'm writing a filesystem with the following goals: merge multiple remote
and local partitions/volumes into a single virtual volume, such that, for
example, on every computer I use, I can access files transparently,
whether they are local or remote.

Thus instead of having 3-4 different "bin" directories on each computer
I use, I have one - /virt/bin - and if a file such as /virt/bin/myscript
happens to live on a different computer, then reads/writes get
transparently redirected to an NFS mount containing the real file.

For example, /virt/bin/myscript is on host 'conroe' and is redirected to
/.virtfs/conroe/bin/myscript, whereas /virt/bin/otherscript is on host
'psion' and is redirected to /.virtfs/psion/bin/otherscript.

Meanwhile a separate agent does nice things in the background, such as
automatically handling disconnected operation and reconnection,
transparently maintaining replicas in any free disk space, following
simple rules and heuristics to place files for optimum performance, etc.

For example, imagine you have different speeds of storage: a laptop with
a 4200 RPM drive, a desktop with a 7200 RPM drive, and a super fast (but
tiny) SSD or RAMFS. You might have rules which tell the system that you
want all .c and .h files (for compilation) to be placed on the fastest
available storage, while .VOB files can go on the slowest.

One problem I am facing is that the VFS seems to send down all writes in
4096-byte chunks, which kills performance. It looks like this issue has
already been discussed (http://kerneltrap.org/node/6739) - was this ever
put into mainline? Or are there patches I can make to either fuse or
Linux itself to allow larger writes?

In reality, though, since all I'm really doing is resolving paths, I'd
rather not have the data path flow through userspace at all. Is there a
mechanism by which one can intercept file access requests (presumably
through fuse), handle the resolving of these requests in userspace, then
return to the kernel a path to the "real" file, and have further access
happen directly?

Thanks,
Randy Robertson
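[Editorial note: the redirection described above is, at its core, a string
rewrite plus a location lookup. A minimal C sketch, with the host lookup
table as a hypothetical stand-in for whatever location database the
filesystem would actually keep; the paths follow the examples in the mail:]

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the /virt -> /.virtfs/<host> rewrite described above.
 * host_for() is a stub; a real implementation would consult the
 * filesystem's placement database.
 */
static const char *host_for(const char *relpath)
{
        /* hypothetical placement table, matching the examples above */
        if (strcmp(relpath, "bin/myscript") == 0)
                return "conroe";
        if (strcmp(relpath, "bin/otherscript") == 0)
                return "psion";
        return NULL;
}

/* Rewrite "/virt/<rel>" to "/.virtfs/<host>/<rel>"; 0 on success. */
int virt_resolve(const char *vpath, char *out, size_t outlen)
{
        const char *prefix = "/virt/";
        const char *rel, *host;

        if (strncmp(vpath, prefix, strlen(prefix)) != 0)
                return -1;              /* not under the virtual root */
        rel = vpath + strlen(prefix);
        host = host_for(rel);
        if (!host)
                return -1;              /* no known location for this file */
        snprintf(out, outlen, "/.virtfs/%s/%s", host, rel);
        return 0;
}
```

The interesting question in the thread is where this rewrite runs: in the
FUSE daemon on every read/write (slow), or once per lookup with the kernel
following the result directly (the symlink idea raised later).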
From: Franco B. <fr...@bo...> - 2008-03-31 12:52:13
Not sure about the 4096 byte writes, but the rest of what you have
described is perfectly doable with FUSE.

If you find a way to do the path dereferencing, I'd love to know. The
best I could come up with is a call to getxattr to return the true
pathname. That works great for our own software, but obviously doesn't
do a thing for other utilities and applications.

On Mon, 2008-03-31 at 02:44 -0700, Randy Robertson wrote:
> [full original message quoted]
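[Editorial note: Franco's getxattr trick can be sketched as below. The
attribute name "user.realpath" is an assumption for illustration - it is
whatever name the FUSE filesystem chooses to answer - and the sketch falls
back to the original path when the attribute is absent or unsupported:]

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Ask the filesystem for the "true" pathname of a file via an
 * extended attribute.  "user.realpath" is a hypothetical attribute
 * the FUSE daemon would answer; on any other filesystem getxattr()
 * fails (ENODATA or ENOTSUP) and we keep the path we were given.
 */
void true_pathname(const char *path, char *out, size_t outlen)
{
        ssize_t n = getxattr(path, "user.realpath", out, outlen - 1);

        if (n < 0) {
                /* not our filesystem, or no redirection recorded */
                snprintf(out, outlen, "%s", path);
                return;
        }
        out[n] = '\0';
}
```

As Franco notes, this only helps programs that know to make the call;
ordinary utilities keep opening the virtual path through the daemon.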
From: Miklos S. <mi...@sz...> - 2008-03-31 13:18:38
> One problem I am facing is that VFS seems to send down all writes in
> 4096byte chunks which kills performance, it looks like this issue has
> already been discussed some (http://kerneltrap.org/node/6739) was this
> ever put into mainline?

It's now in -mm (not released yet). Will hopefully be in linux-2.6.26,
but that's some months away.

> Or are there patches I can make to either fuse or linux itself to
> allow larger writes?

Here's a big patch against 2.6.25-rc7, that supports >4k writes and
writable mmaps. It's very well tested, and shouldn't cause problems.
In the unlikely case that it does cause problems, I'd very much like to
hear about it :)

> In reality, though, since all I'm really doing is resolving paths, I'd
> rather not have the data path flow through userspace at all. Is there
> a mechanism by which one can intercept file access requests
> (presumably through fuse) handle the resolving of these requests in
> userspace, and then return the kernel with a path to the "real" file,
> and have further access happen directly?

Have you thought of using symlinks? Those would do *exactly* what you
want: the fuse filesystem just tells the kernel where to find the real
file, and the kernel does the rest.

Or is it a requirement that userspace must not see where the files are
actually coming from? We could introduce some sort of "hidden symlink"
into fuse that acts just the same way as a real symlink, but is always
followed, regardless of whether userspace asks for that or not.

Miklos

Index: linux-2.6.25-rc7/Documentation/ABI/testing/sysfs-class-bdi
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc7/Documentation/ABI/testing/sysfs-class-bdi	2008-03-31 14:50:42.000000000 +0200
@@ -0,0 +1,46 @@
+What:		/sys/class/bdi/<bdi>/
+Date:		January 2008
+Contact:	Peter Zijlstra <a.p...@ch...>
+Description:
+
+Provide a place in sysfs for the backing_dev_info object.
+This allows setting and retrieving various BDI specific variables.
+
+The <bdi> identifier can be either of the following:
+
+MAJOR:MINOR
+
+	Device number for block devices, or value of st_dev on
+	non-block filesystems which provide their own BDI, such as NFS
+	and FUSE.
+
+default
+
+	The default backing dev, used for non-block device backed
+	filesystems which do not provide their own BDI.
+
+Files under /sys/class/bdi/<bdi>/
+---------------------------------
+
+read_ahead_kb (read-write)
+
+	Size of the read-ahead window in kilobytes
+
+min_ratio (read-write)
+
+	Under normal circumstances each device is given a part of the
+	total write-back cache that relates to its current average
+	writeout speed in relation to the other devices.
+
+	The 'min_ratio' parameter allows assigning a minimum
+	percentage of the write-back cache to a particular device.
+	For example, this is useful for providing a minimum QoS.
+
+max_ratio (read-write)
+
+	Allows limiting a particular device to use not more than the
+	given percentage of the write-back cache. This is useful in
+	situations where we want to avoid one device taking all or
+	most of the write-back cache. For example in case of an NFS
+	mount that is prone to get stuck, or a FUSE mount which cannot
+	be trusted to play fair.
Index: linux-2.6.25-rc7/block/genhd.c
===================================================================
--- linux-2.6.25-rc7.orig/block/genhd.c	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/block/genhd.c	2008-03-31 14:50:42.000000000 +0200
@@ -182,11 +182,17 @@ static int exact_lock(dev_t devt, void *
  */
 void add_disk(struct gendisk *disk)
 {
+	struct backing_dev_info *bdi;
+
 	disk->flags |= GENHD_FL_UP;
 	blk_register_region(MKDEV(disk->major, disk->first_minor),
 			    disk->minors, NULL, exact_match, exact_lock, disk);
 	register_disk(disk);
 	blk_register_queue(disk);
+
+	bdi = &disk->queue->backing_dev_info;
+	bdi_register_dev(bdi, MKDEV(disk->major, disk->first_minor));
+	sysfs_create_link(&disk->dev.kobj, &bdi->dev->kobj, "bdi");
 }
 
 EXPORT_SYMBOL(add_disk);
@@ -194,6 +200,8 @@ EXPORT_SYMBOL(del_gendisk);	/* in partit
 
 void unlink_gendisk(struct gendisk *disk)
 {
+	sysfs_remove_link(&disk->dev.kobj, "bdi");
+	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(MKDEV(disk->major, disk->first_minor),
 			      disk->minors);

Index: linux-2.6.25-rc7/include/linux/backing-dev.h
===================================================================
--- linux-2.6.25-rc7.orig/include/linux/backing-dev.h	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/include/linux/backing-dev.h	2008-03-31 14:50:42.000000000 +0200
@@ -11,9 +11,13 @@
 #include <linux/percpu_counter.h>
 #include <linux/log2.h>
 #include <linux/proportions.h>
+#include <linux/kernel.h>
+#include <linux/fs.h>
 #include <asm/atomic.h>
 
 struct page;
+struct device;
+struct dentry;
 
 /*
  * Bits in backing_dev_info.state
@@ -48,11 +52,26 @@ struct backing_dev_info {
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
 
+	unsigned int min_ratio;
+	unsigned int max_ratio, max_prop_frac;
+
+	struct device *dev;
+
+#ifdef CONFIG_DEBUG_FS
+	struct dentry *debug_dir;
+	struct dentry *debug_stats;
+#endif
 };
 
 int bdi_init(struct backing_dev_info *bdi);
 void bdi_destroy(struct backing_dev_info *bdi);
 
+int bdi_register(struct backing_dev_info *bdi, struct device *parent,
+		const char *fmt, ...);
+int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
+void bdi_unregister(struct backing_dev_info *bdi);
+
 static inline void __add_bdi_stat(struct backing_dev_info *bdi,
		enum bdi_stat_item item, s64 amount)
 {
@@ -116,6 +135,8 @@ static inline s64 bdi_stat_sum(struct ba
 	return sum;
 }
 
+extern void bdi_writeout_inc(struct backing_dev_info *bdi);
+
 /*
  * maximal error of a stat counter.
  */
@@ -128,24 +149,48 @@ static inline unsigned long bdi_stat_err
 #endif
 }
 
+int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio);
+int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
+
 /*
  * Flags in backing_dev_info::capability
- * - The first two flags control whether dirty pages will contribute to the
- *   VM's accounting and whether writepages() should be called for dirty pages
- *   (something that would not, for example, be appropriate for ramfs)
- * - These flags let !MMU mmap() govern direct device mapping vs immediate
- *   copying more easily for MAP_PRIVATE, especially for ROM filesystems
+ *
+ * The first three flags control whether dirty pages will contribute to the
+ * VM's accounting and whether writepages() should be called for dirty pages
+ * (something that would not, for example, be appropriate for ramfs)
+ *
+ * WARNING: these flags are closely related and should not normally be
+ * used separately.  The BDI_CAP_NO_ACCT_AND_WRITEBACK combines these
+ * three flags into a single convenience macro.
+ *
+ * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
+ * BDI_CAP_NO_WRITEBACK:   Don't write pages back
+ * BDI_CAP_NO_ACCT_WB:     Don't automatically account writeback pages
+ *
+ * These flags let !MMU mmap() govern direct device mapping vs immediate
+ * copying more easily for MAP_PRIVATE, especially for ROM filesystems.
+ *
+ * BDI_CAP_MAP_COPY:       Copy can be mapped (MAP_PRIVATE)
+ * BDI_CAP_MAP_DIRECT:     Can be mapped directly (MAP_SHARED)
+ * BDI_CAP_READ_MAP:       Can be mapped for reading
+ * BDI_CAP_WRITE_MAP:      Can be mapped for writing
+ * BDI_CAP_EXEC_MAP:       Can be mapped for execution
  */
-#define BDI_CAP_NO_ACCT_DIRTY	0x00000001	/* Dirty pages shouldn't contribute to accounting */
-#define BDI_CAP_NO_WRITEBACK	0x00000002	/* Don't write pages back */
-#define BDI_CAP_MAP_COPY	0x00000004	/* Copy can be mapped (MAP_PRIVATE) */
-#define BDI_CAP_MAP_DIRECT	0x00000008	/* Can be mapped directly (MAP_SHARED) */
-#define BDI_CAP_READ_MAP	0x00000010	/* Can be mapped for reading */
-#define BDI_CAP_WRITE_MAP	0x00000020	/* Can be mapped for writing */
-#define BDI_CAP_EXEC_MAP	0x00000040	/* Can be mapped for execution */
+#define BDI_CAP_NO_ACCT_DIRTY	0x00000001
+#define BDI_CAP_NO_WRITEBACK	0x00000002
+#define BDI_CAP_MAP_COPY	0x00000004
+#define BDI_CAP_MAP_DIRECT	0x00000008
+#define BDI_CAP_READ_MAP	0x00000010
+#define BDI_CAP_WRITE_MAP	0x00000020
+#define BDI_CAP_EXEC_MAP	0x00000040
+#define BDI_CAP_NO_ACCT_WB	0x00000080
+
 #define BDI_CAP_VMFLAGS \
 	(BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP)
 
+#define BDI_CAP_NO_ACCT_AND_WRITEBACK \
+	(BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
+
 #if defined(VM_MAYREAD) && \
	(BDI_CAP_READ_MAP != VM_MAYREAD || \
	 BDI_CAP_WRITE_MAP != VM_MAYWRITE || \
@@ -187,17 +232,32 @@ void clear_bdi_congested(struct backing_
 void set_bdi_congested(struct backing_dev_info *bdi, int rw);
 long congestion_wait(int rw, long timeout);
 
-#define bdi_cap_writeback_dirty(bdi) \
-	(!((bdi)->capabilities & BDI_CAP_NO_WRITEBACK))
-#define bdi_cap_account_dirty(bdi) \
-	(!((bdi)->capabilities & BDI_CAP_NO_ACCT_DIRTY))
+static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
+{
+	return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK);
+}
+
+static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
+{
+	return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
+}
 
-#define mapping_cap_writeback_dirty(mapping) \
-	bdi_cap_writeback_dirty((mapping)->backing_dev_info)
+static inline bool bdi_cap_account_writeback(struct backing_dev_info *bdi)
+{
+	/* Paranoia: BDI_CAP_NO_WRITEBACK implies BDI_CAP_NO_ACCT_WB */
+	return !(bdi->capabilities & (BDI_CAP_NO_ACCT_WB |
+				      BDI_CAP_NO_WRITEBACK));
+}
 
-#define mapping_cap_account_dirty(mapping) \
-	bdi_cap_account_dirty((mapping)->backing_dev_info)
+static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
+{
+	return bdi_cap_writeback_dirty(mapping->backing_dev_info);
+}
+
+static inline bool mapping_cap_account_dirty(struct address_space *mapping)
+{
+	return bdi_cap_account_dirty(mapping->backing_dev_info);
+}
 
 #endif		/* _LINUX_BACKING_DEV_H */

Index: linux-2.6.25-rc7/include/linux/writeback.h
===================================================================
--- linux-2.6.25-rc7.orig/include/linux/writeback.h	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/include/linux/writeback.h	2008-03-31 14:50:42.000000000 +0200
@@ -114,6 +114,9 @@ struct file;
 int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *,
				      void __user *, size_t *, loff_t *);
 
+void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
+		struct backing_dev_info *bdi);
+
 void page_writeback_init(void);
 void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
					unsigned long nr_pages_dirtied);

Index: linux-2.6.25-rc7/lib/percpu_counter.c
===================================================================
--- linux-2.6.25-rc7.orig/lib/percpu_counter.c	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/lib/percpu_counter.c	2008-03-31 14:50:42.000000000 +0200
@@ -102,6 +102,7 @@ void percpu_counter_destroy(struct percp
		return;
 
	free_percpu(fbc->counters);
+	fbc->counters = NULL;
 #ifdef CONFIG_HOTPLUG_CPU
	mutex_lock(&percpu_counters_lock);
	list_del(&fbc->list);

Index: linux-2.6.25-rc7/mm/backing-dev.c
===================================================================
--- linux-2.6.25-rc7.orig/mm/backing-dev.c	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/mm/backing-dev.c	2008-03-31 14:50:42.000000000 +0200
@@ -4,12 +4,229 @@
 #include <linux/fs.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/writeback.h>
+#include <linux/device.h>
+
+
+static struct class *bdi_class;
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+static struct dentry *bdi_debug_root;
+
+static void bdi_debug_init(void)
+{
+	bdi_debug_root = debugfs_create_dir("bdi", NULL);
+}
+
+static int bdi_debug_stats_show(struct seq_file *m, void *v)
+{
+	struct backing_dev_info *bdi = m->private;
+	long background_thresh;
+	long dirty_thresh;
+	long bdi_thresh;
+
+	get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi);
+
+#define K(x) ((x) << (PAGE_SHIFT - 10))
+	seq_printf(m,
+		   "BdiWriteback:     %8lu kB\n"
+		   "BdiReclaimable:   %8lu kB\n"
+		   "BdiDirtyThresh:   %8lu kB\n"
+		   "DirtyThresh:      %8lu kB\n"
+		   "BackgroundThresh: %8lu kB\n",
+		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
+		   (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
+		   K(bdi_thresh),
+		   K(dirty_thresh),
+		   K(background_thresh));
+#undef K
+
+	return 0;
+}
+
+static int bdi_debug_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, bdi_debug_stats_show, inode->i_private);
+}
+
+static const struct file_operations bdi_debug_stats_fops = {
+	.open		= bdi_debug_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
+{
+	bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root);
+	bdi->debug_stats = debugfs_create_file("stats", 0444, bdi->debug_dir,
+					       bdi, &bdi_debug_stats_fops);
+}
+
+static void bdi_debug_unregister(struct backing_dev_info *bdi)
+{
+	debugfs_remove(bdi->debug_stats);
+	debugfs_remove(bdi->debug_dir);
+}
+#else
+static inline void bdi_debug_init(void)
+{
+}
+static inline void bdi_debug_register(struct backing_dev_info *bdi,
+				      const char *name)
+{
+}
+static inline void bdi_debug_unregister(struct backing_dev_info *bdi)
+{
+}
+#endif
+
+static ssize_t read_ahead_kb_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	char *end;
+	unsigned long read_ahead_kb;
+	ssize_t ret = -EINVAL;
+
+	read_ahead_kb = simple_strtoul(buf, &end, 10);
+	if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) {
+		bdi->ra_pages = read_ahead_kb >> (PAGE_SHIFT - 10);
+		ret = count;
+	}
+	return ret;
+}
+
+#define K(pages) ((pages) << (PAGE_SHIFT - 10))
+
+#define BDI_SHOW(name, expr)						\
+static ssize_t name##_show(struct device *dev,				\
+			   struct device_attribute *attr, char *page)	\
+{									\
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);		\
+									\
+	return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr);	\
+}
+
+BDI_SHOW(read_ahead_kb, K(bdi->ra_pages))
+
+static ssize_t min_ratio_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	char *end;
+	unsigned int ratio;
+	ssize_t ret = -EINVAL;
+
+	ratio = simple_strtoul(buf, &end, 10);
+	if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) {
+		ret = bdi_set_min_ratio(bdi, ratio);
+		if (!ret)
+			ret = count;
+	}
+	return ret;
+}
+BDI_SHOW(min_ratio, bdi->min_ratio)
+
+static ssize_t max_ratio_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	char *end;
+	unsigned int ratio;
+	ssize_t ret = -EINVAL;
+
+	ratio = simple_strtoul(buf, &end, 10);
+	if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) {
+		ret = bdi_set_max_ratio(bdi, ratio);
+		if (!ret)
+			ret = count;
+	}
+	return ret;
+}
+BDI_SHOW(max_ratio, bdi->max_ratio)
+
+#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
+
+static struct device_attribute bdi_dev_attrs[] = {
+	__ATTR_RW(read_ahead_kb),
+	__ATTR_RW(min_ratio),
+	__ATTR_RW(max_ratio),
+	__ATTR_NULL,
+};
+
+static __init int bdi_class_init(void)
+{
+	bdi_class = class_create(THIS_MODULE, "bdi");
+	bdi_class->dev_attrs = bdi_dev_attrs;
+	bdi_debug_init();
+	return 0;
+}
+
+postcore_initcall(bdi_class_init);
+
+int bdi_register(struct backing_dev_info *bdi, struct device *parent,
+		const char *fmt, ...)
+{
+	char *name;
+	va_list args;
+	int ret = 0;
+	struct device *dev;
+
+	va_start(args, fmt);
+	name = kvasprintf(GFP_KERNEL, fmt, args);
+	va_end(args);
+
+	if (!name)
+		return -ENOMEM;
+
+	dev = device_create(bdi_class, parent, MKDEV(0, 0), name);
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		goto exit;
+	}
+
+	bdi->dev = dev;
+	dev_set_drvdata(bdi->dev, bdi);
+	bdi_debug_register(bdi, name);
+
+exit:
+	kfree(name);
+	return ret;
+}
+EXPORT_SYMBOL(bdi_register);
+
+int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
+{
+	return bdi_register(bdi, NULL, "%u:%u", MAJOR(dev), MINOR(dev));
+}
+EXPORT_SYMBOL(bdi_register_dev);
+
+void bdi_unregister(struct backing_dev_info *bdi)
+{
+	if (bdi->dev) {
+		bdi_debug_unregister(bdi);
+		device_unregister(bdi->dev);
+		bdi->dev = NULL;
+	}
+}
+EXPORT_SYMBOL(bdi_unregister);
 
 int bdi_init(struct backing_dev_info *bdi)
 {
	int i;
	int err;
 
+	bdi->dev = NULL;
+
+	bdi->min_ratio = 0;
+	bdi->max_ratio = 100;
+	bdi->max_prop_frac = PROP_FRAC_BASE;
+
	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
		err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0);
		if (err)
@@ -33,6 +250,8 @@ void bdi_destroy(struct backing_dev_info
 {
	int i;
 
+	bdi_unregister(bdi);
+
	for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
		percpu_counter_destroy(&bdi->bdi_stat[i]);

Index: linux-2.6.25-rc7/mm/page-writeback.c
===================================================================
--- linux-2.6.25-rc7.orig/mm/page-writeback.c	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/mm/page-writeback.c	2008-03-31 14:50:42.000000000 +0200
@@ -164,9 +164,20 @@ int dirty_ratio_handler(struct ctl_table
  */
 static inline void __bdi_writeout_inc(struct backing_dev_info *bdi)
 {
-	__prop_inc_percpu(&vm_completions, &bdi->completions);
+	__prop_inc_percpu_max(&vm_completions, &bdi->completions,
+			      bdi->max_prop_frac);
 }
 
+void bdi_writeout_inc(struct backing_dev_info *bdi)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__bdi_writeout_inc(bdi);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(bdi_writeout_inc);
+
 static inline void task_dirty_inc(struct task_struct *tsk)
 {
	prop_inc_single(&vm_dirties, &tsk->dirties);
@@ -200,7 +211,8 @@ clip_bdi_dirty_limit(struct backing_dev_
	avail_dirty = dirty -
		(global_page_state(NR_FILE_DIRTY) +
		 global_page_state(NR_WRITEBACK) +
-		 global_page_state(NR_UNSTABLE_NFS));
+		 global_page_state(NR_UNSTABLE_NFS) +
+		 global_page_state(NR_WRITEBACK_TEMP));
 
	if (avail_dirty < 0)
		avail_dirty = 0;
@@ -243,6 +255,55 @@ static void task_dirty_limit(struct task
 }
 
 /*
+ *
+ */
+static DEFINE_SPINLOCK(bdi_lock);
+static unsigned int bdi_min_ratio;
+
+int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
+{
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&bdi_lock, flags);
+	if (min_ratio > bdi->max_ratio) {
+		ret = -EINVAL;
+	} else {
+		min_ratio -= bdi->min_ratio;
+		if (bdi_min_ratio + min_ratio < 100) {
+			bdi_min_ratio += min_ratio;
+			bdi->min_ratio += min_ratio;
+		} else {
+			ret = -EINVAL;
+		}
+	}
+	spin_unlock_irqrestore(&bdi_lock, flags);
+
+	return ret;
+}
+
+int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	if (max_ratio > 100)
+		return -EINVAL;
+
+	spin_lock_irqsave(&bdi_lock, flags);
+	if (bdi->min_ratio > max_ratio) {
+		ret = -EINVAL;
+	} else {
+		bdi->max_ratio = max_ratio;
+		bdi->max_prop_frac = (PROP_FRAC_BASE * max_ratio) / 100;
+	}
+	spin_unlock_irqrestore(&bdi_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL(bdi_set_max_ratio);
+
+/*
  * Work out the current dirty-memory clamping and background writeout
  * thresholds.
  *
@@ -300,7 +361,7 @@ static unsigned long determine_dirtyable
	return x + 1;	/* Ensure that we never return 0 */
 }
 
-static void
+void
 get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
		 struct backing_dev_info *bdi)
 {
@@ -330,7 +391,7 @@ get_dirty_limits(long *pbackground, long
	*pdirty = dirty;
 
	if (bdi) {
-		u64 bdi_dirty = dirty;
+		u64 bdi_dirty;
		long numerator, denominator;
 
		/*
@@ -338,8 +399,12 @@ get_dirty_limits(long *pbackground, long
		 */
		bdi_writeout_fraction(bdi, &numerator, &denominator);
 
+		bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
		bdi_dirty *= numerator;
		do_div(bdi_dirty, denominator);
+		bdi_dirty += (dirty * bdi->min_ratio) / 100;
+		if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
+			bdi_dirty = dirty * bdi->max_ratio / 100;
 
		*pbdi_dirty = bdi_dirty;
		clip_bdi_dirty_limit(bdi, dirty, pbdi_dirty);
@@ -1192,7 +1257,7 @@ int test_clear_page_writeback(struct pag
			radix_tree_tag_clear(&mapping->page_tree,
						page_index(page),
						PAGECACHE_TAG_WRITEBACK);
-			if (bdi_cap_writeback_dirty(bdi)) {
+			if (bdi_cap_account_writeback(bdi)) {
				__dec_bdi_stat(bdi, BDI_WRITEBACK);
				__bdi_writeout_inc(bdi);
			}
@@ -1221,7 +1286,7 @@ int test_set_page_writeback(struct page
			radix_tree_tag_set(&mapping->page_tree,
						page_index(page),
						PAGECACHE_TAG_WRITEBACK);
-			if (bdi_cap_writeback_dirty(bdi))
+			if (bdi_cap_account_writeback(bdi))
				__inc_bdi_stat(bdi, BDI_WRITEBACK);
		}
		if (!PageDirty(page))

Index: linux-2.6.25-rc7/mm/readahead.c
===================================================================
--- linux-2.6.25-rc7.orig/mm/readahead.c	2008-03-31 14:50:39.000000000 +0200
+++ linux-2.6.25-rc7/mm/readahead.c	2008-03-31 14:50:42.000000000 +0200
@@ -235,7 +235,13 @@ unsigned long max_sane_readahead(unsigne
 
 static int __init readahead_init(void)
 {
-	return bdi_init(&default_backing_dev_info);
+	int err;
+
+	err = bdi_init(&default_backing_dev_info);
+	if (!err)
+		bdi_register(&default_backing_dev_info, NULL, "default");
+
+	return err;
 }
 
 subsys_initcall(readahead_init);

Index: linux-2.6.25-rc7/fs/nfs/super.c
===================================================================
--- linux-2.6.25-rc7.orig/fs/nfs/super.c	2008-03-31 14:50:38.000000000 +0200
+++ linux-2.6.25-rc7/fs/nfs/super.c	2008-03-31 14:50:42.000000000 +0200
@@ -1507,6 +1507,11 @@ static int nfs_compare_super(struct supe
	return nfs_compare_mount_options(sb, server, mntflags);
 }
 
+static int nfs_bdi_register(struct nfs_server *server)
+{
+	return bdi_register_dev(&server->backing_dev_info, server->s_dev);
+}
+
 static int nfs_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *raw_data, struct vfsmount *mnt)
 {
@@ -1549,6 +1554,10 @@ static int nfs_get_sb(struct file_system
	if (s->s_fs_info != server) {
		nfs_free_server(server);
		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
	}
 
	if (!s->s_root) {
@@ -1596,6 +1605,7 @@ static void nfs_kill_super(struct super_
 {
	struct nfs_server *server = NFS_SB(s);
 
+	bdi_unregister(&server->backing_dev_info);
	kill_anon_super(s);
	nfs_free_server(server);
 }
@@ -1640,6 +1650,10 @@ static int nfs_xdev_get_sb(struct file_s
	if (s->s_fs_info != server) {
		nfs_free_server(server);
		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
	}
 
	if (!s->s_root) {
@@ -1935,6 +1949,10 @@ static int nfs4_get_sb(struct file_syste
	if (s->s_fs_info != server) {
		nfs_free_server(server);
		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
	}
 
	if (!s->s_root) {
@@ -2021,6 +2039,10 @@ static int nfs4_xdev_get_sb(struct file_
	if (s->s_fs_info != server) {
		nfs_free_server(server);
		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
	}
 
	if (!s->s_root) {
@@ -2100,6 +2122,10 @@ static int nfs4_referral_get_sb(struct f
	if (s->s_fs_info != server) {
		nfs_free_server(server);
		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
	}
 
	if (!s->s_root) {

Index: linux-2.6.25-rc7/fs/fuse/control.c
===================================================================
--- linux-2.6.25-rc7.orig/fs/fuse/control.c	2008-03-31 14:50:38.000000000 +0200
+++ linux-2.6.25-rc7/fs/fuse/control.c	2008-03-31 14:50:42.000000000 +0200
@@ -117,7 +117,7 @@ int fuse_ctl_add_conn(struct fuse_conn *
 
	parent = fuse_control_sb->s_root;
	inc_nlink(parent->d_inode);
-	sprintf(name, "%llu", (unsigned long long) fc->id);
+	sprintf(name, "%u", fc->dev);
	parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2,
				     &simple_dir_inode_operations,
				     &simple_dir_operations);

Index: linux-2.6.25-rc7/fs/fuse/fuse_i.h
===================================================================
--- linux-2.6.25-rc7.orig/fs/fuse/fuse_i.h	2008-03-31 14:50:38.000000000 +0200
+++ linux-2.6.25-rc7/fs/fuse/fuse_i.h	2008-03-31 14:50:42.000000000 +0200
@@ -15,6 +15,7 @@
 #include <linux/mm.h>
 #include <linux/backing-dev.h>
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 
 /** Max number of pages that can be used in a single read request */
 #define FUSE_MAX_PAGES_PER_REQ 32
@@ -25,6 +26,9 @@
 /** Congestion starts at 75% of maximum */
 #define FUSE_CONGESTION_THRESHOLD (FUSE_MAX_BACKGROUND * 75 / 100)
 
+/** Bias for fi->writectr, meaning new writepages must not be sent */
+#define FUSE_NOWRITE INT_MIN
+
 /** It could be as large as PATH_MAX, but would that have any uses? */
 #define FUSE_NAME_MAX 1024
 
@@ -73,6 +77,19 @@ struct fuse_inode {
 
	/** Files usable in writepage.  Protected by fc->lock */
	struct list_head write_files;
+
+	/** Writepages pending on truncate or fsync */
+	struct list_head queued_writes;
+
+	/** Number of sent writes, a negative bias (FUSE_NOWRITE)
+	 * means more writes are blocked */
+	int writectr;
+
+	/** Waitq for writepage completion */
+	wait_queue_head_t page_waitq;
+
+	/** List of writepage requests (pending or sent) */
+	struct list_head writepages;
 };
 
 /** FUSE specific file data */
@@ -242,6 +259,12 @@ struct fuse_req {
	/** File used in the request (or NULL) */
	struct fuse_file *ff;
 
+	/** Inode used in the request or NULL */
+	struct inode *inode;
+
+	/** Link on fi->writepages */
+	struct list_head writepages_entry;
+
	/** Request completion callback */
	void (*end)(struct fuse_conn *, struct fuse_req *);
 
@@ -390,8 +413,8 @@ struct fuse_conn {
	/** Entry on the fuse_conn_list */
	struct list_head entry;
 
-	/** Unique ID */
-	u64 id;
+	/** Device ID from super block */
+	dev_t dev;
 
	/** Dentries in the control filesystem */
	struct dentry *ctl_dentry[FUSE_CTL_NUM_DENTRIES];
@@ -504,6 +527,11 @@ void fuse_init_symlink(struct inode *ino
 void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
			    u64 attr_valid, u64 attr_version);
 
+void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
+				   u64 attr_valid);
+
+void fuse_truncate(struct address_space *mapping, loff_t offset);
+
 /**
  * Initialize the client device
  */
@@ -522,6 +550,8 @@ void fuse_ctl_cleanup(void);
  */
 struct fuse_req *fuse_request_alloc(void);
 
+struct fuse_req *fuse_request_alloc_nofs(void);
+
 /**
  * Free a request
  */
@@ -558,6 +588,8 @@ void request_send_noreply(struct fuse_co
  */
 void request_send_background(struct fuse_conn *fc, struct fuse_req *req);
 
+void request_send_background_locked(struct fuse_conn *fc, struct fuse_req *req);
+
 /* Abort all requests */
 void fuse_abort_conn(struct fuse_conn *fc);
 
@@ -600,3 +632,8 @@ u64 fuse_lock_owner_id(struct fuse_conn
 int fuse_update_attributes(struct inode *inode, struct kstat *stat,
			   struct file *file, bool *refreshed);
+
+void fuse_flush_writepages(struct inode *inode);
+
+void fuse_set_nowrite(struct inode *inode);
+void fuse_release_nowrite(struct inode *inode);

Index: linux-2.6.25-rc7/fs/fuse/inode.c
===================================================================
--- linux-2.6.25-rc7.orig/fs/fuse/inode.c	2008-03-31 14:50:38.000000000 +0200
+++ linux-2.6.25-rc7/fs/fuse/inode.c	2008-03-31 14:50:42.000000000 +0200
@@ -59,7 +59,11 @@ static struct inode *fuse_alloc_inode(st
	fi->nodeid = 0;
	fi->nlookup = 0;
	fi->attr_version = 0;
+	fi->writectr = 0;
	INIT_LIST_HEAD(&fi->write_files);
+	INIT_LIST_HEAD(&fi->queued_writes);
+	INIT_LIST_HEAD(&fi->writepages);
+	init_waitqueue_head(&fi->page_waitq);
	fi->forget_req = fuse_request_alloc();
	if (!fi->forget_req) {
		kmem_cache_free(fuse_inode_cachep, inode);
@@ -73,6 +77,7 @@ static void fuse_destroy_inode(struct in
 {
	struct fuse_inode *fi = get_fuse_inode(inode);
	BUG_ON(!list_empty(&fi->write_files));
+	BUG_ON(!list_empty(&fi->queued_writes));
	if (fi->forget_req)
		fuse_request_free(fi->forget_req);
	kmem_cache_free(fuse_inode_cachep, inode);
@@ -109,7 +114,7 @@ static int fuse_remount_fs(struct super_
	return 0;
 }
 
-static void fuse_truncate(struct address_space *mapping, loff_t offset)
+void fuse_truncate(struct address_space *mapping, loff_t offset)
 {
	/* See vmtruncate() */
	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
@@ -117,19 +122,12 @@ static void fuse_truncate(struct address
	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 }
 
-
-void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
-			    u64 attr_valid, u64 attr_version)
+void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
+				   u64 attr_valid)
 {
	struct fuse_conn *fc = get_fuse_conn(inode);
	struct fuse_inode *fi = get_fuse_inode(inode);
-	loff_t oldsize;
 
-	spin_lock(&fc->lock);
-	if (attr_version != 0 && fi->attr_version > attr_version) {
-		spin_unlock(&fc->lock);
-		return;
-	}
	fi->attr_version = ++fc->attr_version;
	fi->i_time = attr_valid;
 
@@ -159,6 +157,22 @@ void fuse_change_attributes(struct inode
	fi->orig_i_mode = inode->i_mode;
	if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS))
		inode->i_mode &= ~S_ISVTX;
+}
+
+void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
+			    u64 attr_valid, u64 attr_version)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	loff_t oldsize;
+
+	spin_lock(&fc->lock);
+	if (attr_version != 0 && fi->attr_version > attr_version) {
+		spin_unlock(&fc->lock);
+		return;
+	}
+
+	fuse_change_attributes_common(inode, attr, attr_valid);
 
	oldsize = inode->i_size;
	i_size_write(inode, attr->size);
@@ -448,7 +462,7 @@ static int fuse_show_options(struct seq_
	return 0;
 }
 
-static struct fuse_conn *new_conn(void)
+static struct fuse_conn *new_conn(struct super_block *sb)
 {
	struct fuse_conn *fc;
	int err;
@@ -469,19 +483,41 @@ static struct fuse_conn *new_conn(void)
	atomic_set(&fc->num_waiting, 0);
	fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
	fc->bdi.unplug_io_fn = default_unplug_io_fn;
+	/* fuse does its own writeback accounting */
+	fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB;
+	fc->dev = sb->s_dev;
	err = bdi_init(&fc->bdi);
-	if (err) {
-		kfree(fc);
-		fc = NULL;
-		goto out;
-	}
+	if (err)
+		goto error_kfree;
+	err = bdi_register_dev(&fc->bdi, fc->dev);
+	if (err)
+		goto error_bdi_destroy;
+	/*
+	 * For a single fuse filesystem use max 1% of dirty +
+	 * writeback threshold.
+	 *
+	 * This gives about 1M of write buffer for memory maps on a
+	 * machine with 1G and 10% dirty_ratio, which should be more
+	 * than enough.
+ * + * Privileged users can raise it by writing to + * + * /sys/class/bdi/<bdi>/max_ratio + */ + bdi_set_max_ratio(&fc->bdi, 1); fc->reqctr = 0; fc->blocked = 1; fc->attr_version = 1; get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); } -out: return fc; + +error_bdi_destroy: + bdi_destroy(&fc->bdi); +error_kfree: + mutex_destroy(&fc->inst_mutex); + kfree(fc); + return NULL; } void fuse_conn_put(struct fuse_conn *fc) @@ -579,12 +615,6 @@ static void fuse_send_init(struct fuse_c request_send_background(fc, req); } -static u64 conn_id(void) -{ - static u64 ctr = 1; - return ctr++; -} - static int fuse_fill_super(struct super_block *sb, void *data, int silent) { struct fuse_conn *fc; @@ -622,7 +652,7 @@ static int fuse_fill_super(struct super_ if (file->f_op != &fuse_dev_operations) return -EINVAL; - fc = new_conn(); + fc = new_conn(sb); if (!fc) return -ENOMEM; @@ -660,7 +690,6 @@ static int fuse_fill_super(struct super_ if (file->private_data) goto err_unlock; - fc->id = conn_id(); err = fuse_ctl_add_conn(fc); if (err) goto err_unlock; Index: linux-2.6.25-rc7/include/linux/proportions.h =================================================================== --- linux-2.6.25-rc7.orig/include/linux/proportions.h 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/include/linux/proportions.h 2008-03-31 14:50:42.000000000 +0200 @@ -78,6 +78,19 @@ void prop_inc_percpu(struct prop_descrip } /* + * Limit the time part in order to ensure there are some bits left for the + * cycle counter and fraction multiply. 
+ */ +#define PROP_MAX_SHIFT (3*BITS_PER_LONG/4) + +#define PROP_FRAC_SHIFT (BITS_PER_LONG - PROP_MAX_SHIFT - 1) +#define PROP_FRAC_BASE (1UL << PROP_FRAC_SHIFT) + +void __prop_inc_percpu_max(struct prop_descriptor *pd, + struct prop_local_percpu *pl, long frac); + + +/* * ----- SINGLE ------ */ Index: linux-2.6.25-rc7/lib/proportions.c =================================================================== --- linux-2.6.25-rc7.orig/lib/proportions.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/lib/proportions.c 2008-03-31 14:50:42.000000000 +0200 @@ -73,12 +73,6 @@ #include <linux/proportions.h> #include <linux/rcupdate.h> -/* - * Limit the time part in order to ensure there are some bits left for the - * cycle counter. - */ -#define PROP_MAX_SHIFT (3*BITS_PER_LONG/4) - int prop_descriptor_init(struct prop_descriptor *pd, int shift) { int err; @@ -268,6 +262,38 @@ void __prop_inc_percpu(struct prop_descr } /* + * identical to __prop_inc_percpu, except that it limits this pl's fraction to + * @frac/PROP_FRAC_BASE by ignoring events when this limit has been exceeded. 
+ */ +void __prop_inc_percpu_max(struct prop_descriptor *pd, + struct prop_local_percpu *pl, long frac) +{ + struct prop_global *pg = prop_get_global(pd); + + prop_norm_percpu(pg, pl); + + if (unlikely(frac != PROP_FRAC_BASE)) { + unsigned long period_2 = 1UL << (pg->shift - 1); + unsigned long counter_mask = period_2 - 1; + unsigned long global_count; + long numerator, denominator; + + numerator = percpu_counter_read_positive(&pl->events); + global_count = percpu_counter_read(&pg->events); + denominator = period_2 + (global_count & counter_mask); + + if (numerator > ((denominator * frac) >> PROP_FRAC_SHIFT)) + goto out_put; + } + + percpu_counter_add(&pl->events, 1); + percpu_counter_add(&pg->events, 1); + +out_put: + prop_put_global(pd, pg); +} + +/* * Obtain a fraction of this proportion * * p_{j} = x_{j} / (period/2 + t % period/2) Index: linux-2.6.25-rc7/fs/configfs/inode.c =================================================================== --- linux-2.6.25-rc7.orig/fs/configfs/inode.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/fs/configfs/inode.c 2008-03-31 14:50:42.000000000 +0200 @@ -47,7 +47,7 @@ static const struct address_space_operat static struct backing_dev_info configfs_backing_dev_info = { .ra_pages = 0, /* No readahead */ - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, }; static const struct inode_operations configfs_inode_operations ={ Index: linux-2.6.25-rc7/fs/hugetlbfs/inode.c =================================================================== --- linux-2.6.25-rc7.orig/fs/hugetlbfs/inode.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/fs/hugetlbfs/inode.c 2008-03-31 14:50:42.000000000 +0200 @@ -45,7 +45,7 @@ static const struct inode_operations hug static struct backing_dev_info hugetlbfs_backing_dev_info = { .ra_pages = 0, /* No readahead */ - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, }; 
int sysctl_hugetlb_shm_group; Index: linux-2.6.25-rc7/fs/ocfs2/dlm/dlmfs.c =================================================================== --- linux-2.6.25-rc7.orig/fs/ocfs2/dlm/dlmfs.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/fs/ocfs2/dlm/dlmfs.c 2008-03-31 14:50:42.000000000 +0200 @@ -327,7 +327,7 @@ clear_fields: static struct backing_dev_info dlmfs_backing_dev_info = { .ra_pages = 0, /* No readahead */ - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, }; static struct inode *dlmfs_get_root_inode(struct super_block *sb) Index: linux-2.6.25-rc7/fs/ramfs/inode.c =================================================================== --- linux-2.6.25-rc7.orig/fs/ramfs/inode.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/fs/ramfs/inode.c 2008-03-31 14:50:42.000000000 +0200 @@ -44,7 +44,7 @@ static const struct inode_operations ram static struct backing_dev_info ramfs_backing_dev_info = { .ra_pages = 0, /* No readahead */ - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK | + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY | BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP, }; Index: linux-2.6.25-rc7/fs/sysfs/inode.c =================================================================== --- linux-2.6.25-rc7.orig/fs/sysfs/inode.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/fs/sysfs/inode.c 2008-03-31 14:50:42.000000000 +0200 @@ -30,7 +30,7 @@ static const struct address_space_operat static struct backing_dev_info sysfs_backing_dev_info = { .ra_pages = 0, /* No readahead */ - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, }; static const struct inode_operations sysfs_inode_operations ={ Index: linux-2.6.25-rc7/kernel/cgroup.c =================================================================== --- linux-2.6.25-rc7.orig/kernel/cgroup.c 2008-03-31 
14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/kernel/cgroup.c 2008-03-31 14:50:42.000000000 +0200 @@ -562,7 +562,7 @@ static struct inode_operations cgroup_di static struct file_operations proc_cgroupstats_operations; static struct backing_dev_info cgroup_backing_dev_info = { - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, }; static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb) Index: linux-2.6.25-rc7/mm/shmem.c =================================================================== --- linux-2.6.25-rc7.orig/mm/shmem.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/mm/shmem.c 2008-03-31 14:50:42.000000000 +0200 @@ -201,7 +201,7 @@ static struct vm_operations_struct shmem static struct backing_dev_info shmem_backing_dev_info __read_mostly = { .ra_pages = 0, /* No readahead */ - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, .unplug_io_fn = default_unplug_io_fn, }; Index: linux-2.6.25-rc7/mm/swap_state.c =================================================================== --- linux-2.6.25-rc7.orig/mm/swap_state.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/mm/swap_state.c 2008-03-31 14:50:42.000000000 +0200 @@ -33,7 +33,7 @@ static const struct address_space_operat }; static struct backing_dev_info swap_backing_dev_info = { - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, .unplug_io_fn = swap_unplug_io_fn, }; Index: linux-2.6.25-rc7/drivers/base/node.c =================================================================== --- linux-2.6.25-rc7.orig/drivers/base/node.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/drivers/base/node.c 2008-03-31 14:50:42.000000000 +0200 @@ -64,6 +64,7 @@ static ssize_t node_read_meminfo(struct "Node %d PageTables: %8lu kB\n" "Node %d NFS_Unstable: %8lu kB\n" "Node %d Bounce: %8lu kB\n" + "Node %d 
WritebackTmp: %8lu kB\n" "Node %d Slab: %8lu kB\n" "Node %d SReclaimable: %8lu kB\n" "Node %d SUnreclaim: %8lu kB\n", @@ -86,6 +87,7 @@ static ssize_t node_read_meminfo(struct nid, K(node_page_state(nid, NR_PAGETABLE)), nid, K(node_page_state(nid, NR_UNSTABLE_NFS)), nid, K(node_page_state(nid, NR_BOUNCE)), + nid, K(node_page_state(nid, NR_WRITEBACK_TEMP)), nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) + node_page_state(nid, NR_SLAB_UNRECLAIMABLE)), nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)), Index: linux-2.6.25-rc7/fs/proc/proc_misc.c =================================================================== --- linux-2.6.25-rc7.orig/fs/proc/proc_misc.c 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/fs/proc/proc_misc.c 2008-03-31 14:50:42.000000000 +0200 @@ -179,6 +179,7 @@ static int meminfo_read_proc(char *page, "PageTables: %8lu kB\n" "NFS_Unstable: %8lu kB\n" "Bounce: %8lu kB\n" + "WritebackTmp: %8lu kB\n" "CommitLimit: %8lu kB\n" "Committed_AS: %8lu kB\n" "VmallocTotal: %8lu kB\n" @@ -210,6 +211,7 @@ static int meminfo_read_proc(char *page, K(global_page_state(NR_PAGETABLE)), K(global_page_state(NR_UNSTABLE_NFS)), K(global_page_state(NR_BOUNCE)), + K(global_page_state(NR_WRITEBACK_TEMP)), K(allowed), K(committed), (unsigned long)VMALLOC_TOTAL >> 10, Index: linux-2.6.25-rc7/include/linux/mmzone.h =================================================================== --- linux-2.6.25-rc7.orig/include/linux/mmzone.h 2008-03-31 14:50:38.000000000 +0200 +++ linux-2.6.25-rc7/include/linux/mmzone.h 2008-03-31 14:50:42.000000000 +0200 @@ -95,6 +95,7 @@ enum zone_stat_item { NR_UNSTABLE_NFS, /* NFS unstable pages */ NR_BOUNCE, NR_VMSCAN_WRITE, + NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */ #ifdef CONFIG_NUMA NUMA_HIT, /* allocated in intended node */ NUMA_MISS, /* allocated in non intended node */ Index: linux-2.6.25-rc7/Documentation/filesystems/proc.txt =================================================================== --- 
linux-2.6.25-rc7.orig/Documentation/filesystems/proc.txt 2008-03-31 14:50:37.000000000 +0200 +++ linux-2.6.25-rc7/Documentation/filesystems/proc.txt 2008-03-31 14:50:42.000000000 +0200 @@ -462,11 +462,17 @@ SwapTotal: 0 kB SwapFree: 0 kB Dirty: 968 kB Writeback: 0 kB +AnonPages: 861800 kB Mapped: 280372 kB -Slab: 684068 kB +Slab: 284364 kB +SReclaimable: 159856 kB +SUnreclaim: 124508 kB +PageTables: 24448 kB +NFS_Unstable: 0 kB +Bounce: 0 kB +WritebackTmp: 0 kB CommitLimit: 7669796 kB Committed_AS: 100056 kB -PageTables: 24448 kB VmallocTotal: 112216 kB VmallocUsed: 428 kB VmallocChunk: 111088 kB @@ -502,8 +508,17 @@ VmallocChunk: 111088 kB on the disk Dirty: Memory which is waiting to get written back to the disk Writeback: Memory which is actively being written back to the disk + AnonPages: Non-file backed pages mapped into userspace page tables Mapped: files which have been mmaped, such as libraries Slab: in-kernel data structures cache +SReclaimable: Part of Slab, that might be reclaimed, such as caches + SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure + PageTables: amount of memory dedicated to the lowest level of page + tables. +NFS_Unstable: NFS pages sent to the server, but not yet committed to stable + storage + Bounce: Memory used for block device "bounce buffers" +WritebackTmp: Memory used by FUSE for temporary writeback buffers CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to @@ -530,8 +545,6 @@ Committed_AS: The amount of memory prese above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated. - PageTables: amount of memory dedicated to the lowest level of page - tables. 
VmallocTotal: total size of vmalloc memory area VmallocUsed: amount of vmalloc area which is used VmallocChunk: largest contigious block of vmalloc area which is free Index: linux-2.6.25-rc7/fs/fuse/dev.c =================================================================== --- linux-2.6.25-rc7.orig/fs/fuse/dev.c 2008-03-31 14:50:37.000000000 +0200 +++ linux-2.6.25-rc7/fs/fuse/dev.c 2008-03-31 14:50:42.000000000 +0200 @@ -47,6 +47,14 @@ struct fuse_req *fuse_request_alloc(void return req; } +struct fuse_req *fuse_request_alloc_nofs(void) +{ + struct fuse_req *req = kmem_cache_alloc(fuse_req_cachep, GFP_NOFS); + if (req) + fuse_request_init(req); + return req; +} + void fuse_request_free(struct fuse_req *req) { kmem_cache_free(fuse_req_cachep, req); @@ -430,6 +438,17 @@ void request_send_background(struct fuse } /* + * Called under fc->lock + * + * fc->connected must have been checked previously + */ +void request_send_background_locked(struct fuse_conn *fc, struct fuse_req *req) +{ + req->isreply = 1; + request_send_nowait_locked(fc, req); +} + +/* * Lock the request. Up to the next unlock_request() there mustn't be * anything that could cause a page-fault. If the request was already * aborted bail out. Index: linux-2.6.25-rc7/fs/fuse/dir.c =================================================================== --- linux-2.6.25-rc7.orig/fs/fuse/dir.c 2008-03-31 14:50:37.000000000 +0200 +++ linux-2.6.25-rc7/fs/fuse/dir.c 2008-03-31 14:50:42.000000000 +0200 @@ -1107,6 +1107,50 @@ static void iattr_to_fattr(struct iattr } /* + * Prevent concurrent writepages on inode + * + * This is done by adding a negative bias to the inode write counter + * and waiting for all pending writes to finish. 
+ */ +void fuse_set_nowrite(struct inode *inode) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + + BUG_ON(!mutex_is_locked(&inode->i_mutex)); + + spin_lock(&fc->lock); + BUG_ON(fi->writectr < 0); + fi->writectr += FUSE_NOWRITE; + spin_unlock(&fc->lock); + wait_event(fi->page_waitq, fi->writectr == FUSE_NOWRITE); +} + +/* + * Allow writepages on inode + * + * Remove the bias from the writecounter and send any queued + * writepages. + */ +static void __fuse_release_nowrite(struct inode *inode) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + + BUG_ON(fi->writectr != FUSE_NOWRITE); + fi->writectr = 0; + fuse_flush_writepages(inode); +} + +void fuse_release_nowrite(struct inode *inode) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + + spin_lock(&fc->lock); + __fuse_release_nowrite(inode); + spin_unlock(&fc->lock); +} + +/* * Set attributes, and at the same time refresh them. * * Truncation is slightly complicated, because the 'truncate' request @@ -1122,6 +1166,8 @@ static int fuse_do_setattr(struct dentry struct fuse_req *req; struct fuse_setattr_in inarg; struct fuse_attr_out outarg; + bool is_truncate = false; + loff_t oldsize; int err; if (!fuse_allow_task(fc, current)) @@ -1145,12 +1191,16 @@ static int fuse_do_setattr(struct dentry send_sig(SIGXFSZ, current, 0); return -EFBIG; } + is_truncate = true; } req = fuse_get_req(fc); if (IS_ERR(req)) return PTR_ERR(req); + if (is_truncate) + fuse_set_nowrite(inode); + memset(&inarg, 0, sizeof(inarg)); memset(&outarg, 0, sizeof(outarg)); iattr_to_fattr(attr, &inarg); @@ -1181,16 +1231,44 @@ static int fuse_do_setattr(struct dentry if (err) { if (err == -EINTR) fuse_invalidate_attr(inode); - return err; + goto error; } if ((inode->i_mode ^ outarg.attr.mode) & S_IFMT) { make_bad_inode(inode); - return -EIO; + err = -EIO; + goto error; + } + + spin_lock(&fc->lock); + fuse_change_attributes_common(inode, &outarg.attr, + attr_timeout(&outarg)); + oldsize = 
inode->i_size; + i_size_write(inode, outarg.attr.size); + + if (is_truncate) { + /* NOTE: this may release/reacquire fc->lock */ + __fuse_release_nowrite(inode); + } + spin_unlock(&fc->lock); + + /* + * Only call invalidate_inode_pages2() after removing + * FUSE_NOWRITE, otherwise fuse_launder_page() would deadlock. + */ + if (S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { + if (outarg.attr.size < oldsize) + fuse_truncate(inode->i_mapping, outarg.attr.size); + invalidate_inode_pages2(inode->i_mapping); } - fuse_change_attributes(inode, &outarg.attr, attr_timeout(&outarg), 0); return 0; + +error: + if (is_truncate) + fuse_release_nowrite(inode); + + return err; } static int fuse_setattr(struct dentry *entry, struct iattr *attr) Index: linux-2.6.25-rc7/fs/fuse/file.c =================================================================== --- linux-2.6.25-rc7.orig/fs/fuse/file.c 2008-03-31 14:50:37.000000000 +0200 +++ linux-2.6.25-rc7/fs/fuse/file.c 2008-03-31 14:50:43.000000000 +0200 @@ -210,6 +210,49 @@ u64 fuse_lock_owner_id(struct fuse_conn return (u64) v0 + ((u64) v1 << 32); } +/* + * Check if page is under writeback + * + * This is currently done by walking the list of writepage requests + * for the inode, which can be pretty inefficient. + */ +static bool fuse_page_is_writeback(struct inode *inode, pgoff_t index) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_req *req; + bool found = false; + + spin_lock(&fc->lock); + list_for_each_entry(req, &fi->writepages, writepages_entry) { + pgoff_t curr_index; + + BUG_ON(req->inode != inode); + curr_index = req->misc.write.in.offset >> PAGE_CACHE_SHIFT; + if (curr_index == index) { + found = true; + break; + } + } + spin_unlock(&fc->lock); + + return found; +} + +/* + * Wait for page writeback to be completed. + * + * Since fuse doesn't rely on the VM writeback tracking, this has to + * use some other means. 
+ */ +static int fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + + wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index)); + return 0; +} + static int fuse_flush(struct file *file, fl_owner_t id) { struct inode *inode = file->f_path.dentry->d_inode; @@ -245,6 +288,21 @@ static int fuse_flush(struct file *file, return err; } +/* + * Wait for all pending writepages on the inode to finish. + * + * This is currently done by blocking further writes with FUSE_NOWRITE + * and waiting for all sent writes to complete. + * + * This must be called under i_mutex, otherwise the FUSE_NOWRITE usage + * could conflict with truncation. + */ +static void fuse_sync_writes(struct inode *inode) +{ + fuse_set_nowrite(inode); + fuse_release_nowrite(inode); +} + int fuse_fsync_common(struct file *file, struct dentry *de, int datasync, int isdir) { @@ -261,6 +319,17 @@ int fuse_fsync_common(struct file *file, if ((!isdir && fc->no_fsync) || (isdir && fc->no_fsyncdir)) return 0; + /* + * Start writeback against all dirty pages of the inode, then + * wait for all outstanding writes, before sending the FSYNC + * request. 
+ */ + err = write_inode_now(inode, 0); + if (err) + return err; + + fuse_sync_writes(inode); + req = fuse_get_req(fc); if (IS_ERR(req)) return PTR_ERR(req); @@ -329,17 +398,39 @@ static size_t fuse_send_read(struct fuse return req->out.args[0].size; } +static void fuse_read_update_size(struct inode *inode, loff_t size) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + + spin_lock(&fc->lock); + fi->attr_version = ++fc->attr_version; + if (size < inode->i_size) + i_size_write(inode, size); + spin_unlock(&fc->lock); +} + static int fuse_readpage(struct file *file, struct page *page) { struct inode *inode = page->mapping->host; struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_req *req; + size_t num_read; + loff_t pos = page_offset(page); + size_t count = PAGE_CACHE_SIZE; int err; err = -EIO; if (is_bad_inode(inode)) goto out; + /* + * Page writeback can extend beyond the liftime of the + * page-cache page, so make sure we read a properly synced + * page. + */ + fuse_wait_on_page_writeback(inode, page->index); + req = fuse_get_req(fc); err = PTR_ERR(req); if (IS_ERR(req)) @@ -348,12 +439,20 @@ static int fuse_readpage(struct file *fi req->out.page_zeroing = 1; req->num_pages = 1; req->pages[0] = page; - fuse_send_read(req, file, inode, page_offset(page), PAGE_CACHE_SIZE, - NULL); + num_read = fuse_send_read(req, file, inode, pos, count, NULL); err = req->out.h.error; fuse_put_request(fc, req); - if (!err) + + if (!err) { + /* + * Short read means EOF. 
If file size is larger, truncate it + */ + if (num_read < count) + fuse_read_update_size(inode, pos + num_read); + SetPageUptodate(page); + } + fuse_invalidate_attr(inode); /* atime changed */ out: unlock_page(page); @@ -363,8 +462,19 @@ static int fuse_readpage(struct file *fi static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req) { int i; + size_t count = req->misc.read_in.size; + size_t num_read = req->out.args[0].size; + struct inode *inode = req->pages[0]->mapping->host; + + /* + * Short read means EOF. If file size is larger, truncate it + */ + if (!req->out.h.error && num_read < count) { + loff_t pos = page_offset(req->pages[0]) + num_read; + fuse_read_update_size(inode, pos); + } - fuse_invalidate_attr(req->pages[0]->mapping->host); /* atime changed */ + fuse_invalidate_attr(inode); /* atime changed */ for (i = 0; i < req->num_pages; i++) { struct page *page = req->pages[i]; @@ -411,6 +521,8 @@ static int fuse_readpages_fill(void *_da struct inode *inode = data->inode; struct fuse_conn *fc = get_fuse_conn(inode); + fuse_wait_on_page_writeback(inode, page->index); + if (req->num_pages && (req->num_pages == FUSE_MAX_PAGES_PER_REQ || (req->num_pages + 1) * PAGE_CACHE_SIZE > fc->max_read || @@ -477,11 +589,10 @@ static ssize_t fuse_file_aio_read(struct } static void fuse_write_fill(struct fuse_req *req, struct file *file, - struct inode *inode, loff_t pos, size_t count, - int writepage) + struct fuse_file *ff, struct inode *inode, + loff_t pos, size_t count, int writepage) { struct fuse_conn *fc = get_fuse_conn(inode); - struct fuse_file *ff = file->private_data; struct fuse_write_in *inarg = &req->misc.write.in; struct fuse_write_out *outarg = &req->misc.write.out; @@ -490,7 +601,7 @@ static void fuse_write_fill(struct fuse_ inarg->offset = pos; inarg->size = count; inarg->write_flags = writepage ? FUSE_WRITE_CACHE : 0; - inarg->flags = file->f_flags; + inarg->flags = file ? 
file->f_flags : 0; req->in.h.opcode = FUSE_WRITE; req->in.h.nodeid = get_node_id(inode); req->in.argpages = 1; @@ -511,7 +622,7 @@ static size_t fuse_send_write(struct fus fl_owner_t owner) { struct fuse_conn *fc = get_fuse_conn(inode); - fuse_write_fill(req, file, inode, pos, count, 0); + fuse_write_fill(req, file, file->private_data, inode, pos, count, 0); if (owner != NULL) { struct fuse_write_in *inarg = &req->misc.write.in; inarg->write_flags |= FUSE_WRITE_LOCKOWNER; @@ -533,19 +644,36 @@ static int fuse_write_begin(struct file return 0; } +static void fuse_write_update_size(struct inode *inode, loff_t pos) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + + spin_lock(&fc->lock); + fi->attr_version = ++fc->attr_version; + if (pos > inode->i_size) + i_size_write(inode, pos); + spin_unlock(&fc->lock); +} + static int fuse_buffered_write(struct file *file, struct inode *inode, loff_t pos, unsigned count, struct page *page) { int err; size_t nres; struct fuse_conn *fc = get_fuse_conn(inode); - struct fuse_inode *fi = get_fuse_inode(inode); unsigned offset = pos & (PAGE_CACHE_SIZE - 1); struct fuse_req *req; if (is_bad_inode(inode)) return -EIO; + /* + * Make sure writepages on the same page are not mixed up with + * plain writes. 
+ */ + fuse_wait_on_page_writeback(inode, page->index); + req = fuse_get_req(fc); if (IS_ERR(req)) return PTR_ERR(req); @@ -560,12 +688,7 @@ static int fuse_buffered_write(struct fi err = -EIO; if (!err) { pos += nres; - spin_lock(&fc->lock); - fi->attr_version = ++fc->attr_version; - if (pos > inode->i_size) - i_size_write(inode, pos); - spin_unlock(&fc->lock); - + fuse_write_update_size(inode, pos); if (count == PAGE_CACHE_SIZE) SetPageUptodate(page); } @@ -588,6 +711,198 @@ static int fuse_write_end(struct file *f return res; } +static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file, + struct inode *inode, loff_t pos, + size_t count) +{ + size_t res; + unsigned offset; + unsigned i; + + for (i = 0; i < req->num_pages; i++) + fuse_wait_on_page_writeback(inode, req->pages[i]->index); + + res = fuse_send_write(req, file, inode, pos, count, NULL); + + offset = req->page_offset; + count = res; + for (i = 0; i < req->num_pages; i++) { + struct page *page = req->pages[i]; + + if (!req->out.h.error && !offset && count >= PAGE_CACHE_SIZE) + SetPageUptodate(page); + + if (count > PAGE_CACHE_SIZE - offset) + count -= PAGE_CACHE_SIZE - offset; + else + count = 0; + offset = 0; + + unlock_page(page); + page_cache_release(page); + } + + return res; +} + +static ssize_t fuse_fill_write_pages(struct fuse_req *req, + struct address_space *mapping, + struct iov_iter *ii, loff_t pos) +{ + struct fuse_conn *fc = get_fuse_conn(mapping->host); + unsigned offset = pos & (PAGE_CACHE_SIZE - 1); + size_t count = 0; + int err; + + req->page_offset = offset; + + do { + size_t tmp; + struct page *page; + pgoff_t index = pos >> PAGE_CACHE_SHIFT; + size_t bytes = min_t(size_t, PAGE_CACHE_SIZE - offset, + iov_iter_count(ii)); + + bytes = min_t(size_t, bytes, fc->max_write - count); + + again: + err = -EFAULT; + if (iov_iter_fault_in_readable(ii, bytes)) + break; + + err = -ENOMEM; + page = __grab_cache_page(mapping, index); + if (!page) + break; + + pagefault_disable(); + 
tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes); + pagefault_enable(); + flush_dcache_page(page); + + if (!tmp) { + unlock_page(page); + page_cache_release(page); + bytes = min(bytes, iov_iter_single_seg_count(ii)); + goto again; + } + + err = 0; + req->pages[req->num_pages] = page; + req->num_pages++; + + iov_iter_advance(ii, tmp); + count += tmp; + pos += tmp; + offset += tmp; + if (offset == PAGE_CACHE_SIZE) + offset = 0; + + } while (iov_iter_count(ii) && count < fc->max_write && + req->num_pages < FUSE_MAX_PAGES_PER_REQ && offset == 0); + + return count > 0 ? count : err; +} + +static ssize_t fuse_perform_write(struct file *file, + struct address_space *mapping, + struct iov_iter *ii, loff_t pos) +{ + struct inode *inode = mapping->host; + struct fuse_conn *fc = get_fuse_conn(inode); + int err = 0; + ssize_t res = 0; + + if (is_bad_inode(inode)) + return -EIO; + + do { + struct fuse_req *req; + ssize_t count; + + req = fuse_get_req(fc); + if (IS_ERR(req)) { + err = PTR_ERR(req); + break; + } + + count = fuse_fill_write_pages(req, mapping, ii, pos); + if (count <= 0) { + err = count; + } else { + size_t num_written; + + num_written = fuse_send_write_pages(req, file, inode, + pos, count); + err = req->out.h.error; + if (!err) { + res += num_written; + pos += num_written; + + /* break out of the loop on short write */ + if (num_written != count) + err = -EIO; + } + } + fuse_put_request(fc, req); + } while (!err && iov_iter_count(ii)); + + if (res > 0) + fuse_write_update_size(inode, pos); + + fuse_invalidate_attr(inode); + + return res > 0 ? res : err; +} + +static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov, + unsigned long nr_segs, loff_t pos) +{ + struct file *file = iocb->ki_filp; + struct address_space *mapping = file->f_mapping; + size_t co... [truncated message content] |
From: Randy R. <rmr...@us...> - 2008-04-22 07:41:03
Thank you Miklos. I've finally gotten around to applying this patch (been busy with the day job, etc.). I applied it to the latest Linus git tree, where it applied cleanly, and rebooted into this kernel (which I know was successful, since I had to manually hack the nvidia proprietary drivers and vmware proprietary modules to work again with 2.6.25), but I am still seeing 4096-byte writes, even when I do dd if=/dev/zero of=ZERO bs=1M count=100 on my fuse filesystem. Do I need to be running the latest fuse for this to work? It looks like I am using the 2.7.0 release of fuse. On Mon, Mar 31, 2008 at 6:18 AM, Miklos Szeredi <mi...@sz...> wrote: > > One problem I am facing is that VFS seems to send down all writes in > > 4096byte chunks which kills performance, it looks like this issue has > > already been discussed some (http://kerneltrap.org/node/6739) was this > > ever put into mainline? > > It's now in -mm (not released yet). Will hopefully be in > linux-2.6.26, but that's some months away. > > > > Or are there patches I can make to either fuse or linux itself to > > allow larger writes? > > Here's a big patch against 2.6.25-rc7, that supports >4k writes and > writable mmaps. It's very well tested, and shouldn't cause problems. > In the unlikely case, that it does cause problems, I'd very much like > to hear about it :) > > > > In reality, though, since all I'm really doing is resolving paths, I'd > > rather not have the data path flow through userspace at all. Is there > > a mechanism by which one can intercept file access requests > > (presumably through fuse) handle the resolving of these requests in > > userspace, and then return the kernel with a path to the "real" file, > > and have further access happen directly? > > Have you thought of using symlinks? Those would do *exactly* what you > want: the fuse filesystem just tells the kernel where to find the real > file, and the kernel does the rest. 
> > Or is it a requirement that userspace must not see where the files are > actually coming from? We could introduce some sort of "hidden > symlink" into fuse that acts just the same way as a real symlink, but > is always followed, regardless of whether userspace asks for that or > not. > > Miklos > > > > Index: linux-2.6.25-rc7/Documentation/ABI/testing/sysfs-class-bdi > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-2.6.25-rc7/Documentation/ABI/testing/sysfs-class-bdi 2008-03-31 14:50:42.000000000 +0200 > @@ -0,0 +1,46 @@ > +What: /sys/class/bdi/<bdi>/ > +Date: January 2008 > +Contact: Peter Zijlstra <a.p...@ch...> > +Description: > + > +Provide a place in sysfs for the backing_dev_info object. This allows > +setting and retrieving various BDI specific variables. > + > +The <bdi> identifier can be either of the following: > + > +MAJOR:MINOR > + > + Device number for block devices, or value of st_dev on > + non-block filesystems which provide their own BDI, such as NFS > + and FUSE. > + > +default > + > + The default backing dev, used for non-block device backed > + filesystems which do not provide their own BDI. > + > +Files under /sys/class/bdi/<bdi>/ > +--------------------------------- > + > +read_ahead_kb (read-write) > + > + Size of the read-ahead window in kilobytes > + > +min_ratio (read-write) > + > + Under normal circumstances each device is given a part of the > + total write-back cache that relates to its current average > + writeout speed in relation to the other devices. > + > + The 'min_ratio' parameter allows assigning a minimum > + percentage of the write-back cache to a particular device. > + For example, this is useful for providing a minimum QoS. > + > +max_ratio (read-write) > + > + Allows limiting a particular device to use not more than the > + given percentage of the write-back cache. 
This is useful in > + situations where we want to avoid one device taking all or > + most of the write-back cache. For example in case of an NFS > + mount that is prone to get stuck, or a FUSE mount which cannot > + be trusted to play fair. > Index: linux-2.6.25-rc7/block/genhd.c > =================================================================== > --- linux-2.6.25-rc7.orig/block/genhd.c 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/block/genhd.c 2008-03-31 14:50:42.000000000 +0200 > @@ -182,11 +182,17 @@ static int exact_lock(dev_t devt, void * > */ > void add_disk(struct gendisk *disk) > { > + struct backing_dev_info *bdi; > + > disk->flags |= GENHD_FL_UP; > blk_register_region(MKDEV(disk->major, disk->first_minor), > disk->minors, NULL, exact_match, exact_lock, disk); > register_disk(disk); > blk_register_queue(disk); > + > + bdi = &disk->queue->backing_dev_info; > + bdi_register_dev(bdi, MKDEV(disk->major, disk->first_minor)); > + sysfs_create_link(&disk->dev.kobj, &bdi->dev->kobj, "bdi"); > } > > EXPORT_SYMBOL(add_disk); > @@ -194,6 +200,8 @@ EXPORT_SYMBOL(del_gendisk); /* in partit > > void unlink_gendisk(struct gendisk *disk) > { > + sysfs_remove_link(&disk->dev.kobj, "bdi"); > + bdi_unregister(&disk->queue->backing_dev_info); > blk_unregister_queue(disk); > blk_unregister_region(MKDEV(disk->major, disk->first_minor), > disk->minors); > Index: linux-2.6.25-rc7/include/linux/backing-dev.h > =================================================================== > --- linux-2.6.25-rc7.orig/include/linux/backing-dev.h 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/include/linux/backing-dev.h 2008-03-31 14:50:42.000000000 +0200 > @@ -11,9 +11,13 @@ > #include <linux/percpu_counter.h> > #include <linux/log2.h> > #include <linux/proportions.h> > +#include <linux/kernel.h> > +#include <linux/fs.h> > #include <asm/atomic.h> > > struct page; > +struct device; > +struct dentry; > > /* > * Bits in backing_dev_info.state > @@ -48,11 +52,26 @@ 
struct backing_dev_info { > > struct prop_local_percpu completions; > int dirty_exceeded; > + > + unsigned int min_ratio; > + unsigned int max_ratio, max_prop_frac; > + > + struct device *dev; > + > +#ifdef CONFIG_DEBUG_FS > + struct dentry *debug_dir; > + struct dentry *debug_stats; > +#endif > }; > > int bdi_init(struct backing_dev_info *bdi); > void bdi_destroy(struct backing_dev_info *bdi); > > +int bdi_register(struct backing_dev_info *bdi, struct device *parent, > + const char *fmt, ...); > +int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev); > +void bdi_unregister(struct backing_dev_info *bdi); > + > static inline void __add_bdi_stat(struct backing_dev_info *bdi, > enum bdi_stat_item item, s64 amount) > { > @@ -116,6 +135,8 @@ static inline s64 bdi_stat_sum(struct ba > return sum; > } > > +extern void bdi_writeout_inc(struct backing_dev_info *bdi); > + > /* > * maximal error of a stat counter. > */ > @@ -128,24 +149,48 @@ static inline unsigned long bdi_stat_err > #endif > } > > +int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio); > +int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio); > + > /* > * Flags in backing_dev_info::capability > - * - The first two flags control whether dirty pages will contribute to the > - * VM's accounting and whether writepages() should be called for dirty pages > - * (something that would not, for example, be appropriate for ramfs) > - * - These flags let !MMU mmap() govern direct device mapping vs immediate > - * copying more easily for MAP_PRIVATE, especially for ROM filesystems > + * > + * The first three flags control whether dirty pages will contribute to the > + * VM's accounting and whether writepages() should be called for dirty pages > + * (something that would not, for example, be appropriate for ramfs) > + * > + * WARNING: these flags are closely related and should not normally be > + * used separately. 
The BDI_CAP_NO_ACCT_AND_WRITEBACK combines these > + * three flags into a single convenience macro. > + * > + * BDI_CAP_NO_ACCT_DIRTY: Dirty pages shouldn't contribute to accounting > + * BDI_CAP_NO_WRITEBACK: Don't write pages back > + * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages > + * > + * These flags let !MMU mmap() govern direct device mapping vs immediate > + * copying more easily for MAP_PRIVATE, especially for ROM filesystems. > + * > + * BDI_CAP_MAP_COPY: Copy can be mapped (MAP_PRIVATE) > + * BDI_CAP_MAP_DIRECT: Can be mapped directly (MAP_SHARED) > + * BDI_CAP_READ_MAP: Can be mapped for reading > + * BDI_CAP_WRITE_MAP: Can be mapped for writing > + * BDI_CAP_EXEC_MAP: Can be mapped for execution > */ > -#define BDI_CAP_NO_ACCT_DIRTY 0x00000001 /* Dirty pages shouldn't contribute to accounting */ > -#define BDI_CAP_NO_WRITEBACK 0x00000002 /* Don't write pages back */ > -#define BDI_CAP_MAP_COPY 0x00000004 /* Copy can be mapped (MAP_PRIVATE) */ > -#define BDI_CAP_MAP_DIRECT 0x00000008 /* Can be mapped directly (MAP_SHARED) */ > -#define BDI_CAP_READ_MAP 0x00000010 /* Can be mapped for reading */ > -#define BDI_CAP_WRITE_MAP 0x00000020 /* Can be mapped for writing */ > -#define BDI_CAP_EXEC_MAP 0x00000040 /* Can be mapped for execution */ > +#define BDI_CAP_NO_ACCT_DIRTY 0x00000001 > +#define BDI_CAP_NO_WRITEBACK 0x00000002 > +#define BDI_CAP_MAP_COPY 0x00000004 > +#define BDI_CAP_MAP_DIRECT 0x00000008 > +#define BDI_CAP_READ_MAP 0x00000010 > +#define BDI_CAP_WRITE_MAP 0x00000020 > +#define BDI_CAP_EXEC_MAP 0x00000040 > +#define BDI_CAP_NO_ACCT_WB 0x00000080 > + > #define BDI_CAP_VMFLAGS \ > (BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP) > > +#define BDI_CAP_NO_ACCT_AND_WRITEBACK \ > + (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB) > + > #if defined(VM_MAYREAD) && \ > (BDI_CAP_READ_MAP != VM_MAYREAD || \ > BDI_CAP_WRITE_MAP != VM_MAYWRITE || \ > @@ -187,17 +232,32 @@ void clear_bdi_congested(struct 
backing_ > void set_bdi_congested(struct backing_dev_info *bdi, int rw); > long congestion_wait(int rw, long timeout); > > -#define bdi_cap_writeback_dirty(bdi) \ > - (!((bdi)->capabilities & BDI_CAP_NO_WRITEBACK)) > > -#define bdi_cap_account_dirty(bdi) \ > - (!((bdi)->capabilities & BDI_CAP_NO_ACCT_DIRTY)) > +static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi) > +{ > + return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK); > +} > + > +static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi) > +{ > + return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY); > +} > > -#define mapping_cap_writeback_dirty(mapping) \ > - bdi_cap_writeback_dirty((mapping)->backing_dev_info) > +static inline bool bdi_cap_account_writeback(struct backing_dev_info *bdi) > +{ > + /* Paranoia: BDI_CAP_NO_WRITEBACK implies BDI_CAP_NO_ACCT_WB */ > + return !(bdi->capabilities & (BDI_CAP_NO_ACCT_WB | > + BDI_CAP_NO_WRITEBACK)); > +} > > -#define mapping_cap_account_dirty(mapping) \ > - bdi_cap_account_dirty((mapping)->backing_dev_info) > +static inline bool mapping_cap_writeback_dirty(struct address_space *mapping) > +{ > + return bdi_cap_writeback_dirty(mapping->backing_dev_info); > +} > > +static inline bool mapping_cap_account_dirty(struct address_space *mapping) > +{ > + return bdi_cap_account_dirty(mapping->backing_dev_info); > +} > > #endif /* _LINUX_BACKING_DEV_H */ > Index: linux-2.6.25-rc7/include/linux/writeback.h > =================================================================== > --- linux-2.6.25-rc7.orig/include/linux/writeback.h 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/include/linux/writeback.h 2008-03-31 14:50:42.000000000 +0200 > @@ -114,6 +114,9 @@ struct file; > int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, > void __user *, size_t *, loff_t *); > > +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, > + struct backing_dev_info *bdi); > + > void page_writeback_init(void); > 
void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, > unsigned long nr_pages_dirtied); > Index: linux-2.6.25-rc7/lib/percpu_counter.c > =================================================================== > --- linux-2.6.25-rc7.orig/lib/percpu_counter.c 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/lib/percpu_counter.c 2008-03-31 14:50:42.000000000 +0200 > @@ -102,6 +102,7 @@ void percpu_counter_destroy(struct percp > return; > > free_percpu(fbc->counters); > + fbc->counters = NULL; > #ifdef CONFIG_HOTPLUG_CPU > mutex_lock(&percpu_counters_lock); > list_del(&fbc->list); > Index: linux-2.6.25-rc7/mm/backing-dev.c > =================================================================== > --- linux-2.6.25-rc7.orig/mm/backing-dev.c 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/mm/backing-dev.c 2008-03-31 14:50:42.000000000 +0200 > @@ -4,12 +4,229 @@ > #include <linux/fs.h> > #include <linux/sched.h> > #include <linux/module.h> > +#include <linux/writeback.h> > +#include <linux/device.h> > + > + > +static struct class *bdi_class; > + > +#ifdef CONFIG_DEBUG_FS > +#include <linux/debugfs.h> > +#include <linux/seq_file.h> > + > +static struct dentry *bdi_debug_root; > + > +static void bdi_debug_init(void) > +{ > + bdi_debug_root = debugfs_create_dir("bdi", NULL); > +} > + > +static int bdi_debug_stats_show(struct seq_file *m, void *v) > +{ > + struct backing_dev_info *bdi = m->private; > + long background_thresh; > + long dirty_thresh; > + long bdi_thresh; > + > + get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi); > + > +#define K(x) ((x) << (PAGE_SHIFT - 10)) > + seq_printf(m, > + "BdiWriteback: %8lu kB\n" > + "BdiReclaimable: %8lu kB\n" > + "BdiDirtyThresh: %8lu kB\n" > + "DirtyThresh: %8lu kB\n" > + "BackgroundThresh: %8lu kB\n", > + (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)), > + (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)), > + K(bdi_thresh), > + K(dirty_thresh), > + K(background_thresh)); > 
+#undef K > + > + return 0; > +} > + > +static int bdi_debug_stats_open(struct inode *inode, struct file *file) > +{ > + return single_open(file, bdi_debug_stats_show, inode->i_private); > +} > + > +static const struct file_operations bdi_debug_stats_fops = { > + .open = bdi_debug_stats_open, > + .read = seq_read, > + .llseek = seq_lseek, > + .release = single_release, > +}; > + > +static void bdi_debug_register(struct backing_dev_info *bdi, const char *name) > +{ > + bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root); > + bdi->debug_stats = debugfs_create_file("stats", 0444, bdi->debug_dir, > + bdi, &bdi_debug_stats_fops); > +} > + > +static void bdi_debug_unregister(struct backing_dev_info *bdi) > +{ > + debugfs_remove(bdi->debug_stats); > + debugfs_remove(bdi->debug_dir); > +} > +#else > +static inline void bdi_debug_init(void) > +{ > +} > +static inline void bdi_debug_register(struct backing_dev_info *bdi, > + const char *name) > +{ > +} > +static inline void bdi_debug_unregister(struct backing_dev_info *bdi) > +{ > +} > +#endif > + > +static ssize_t read_ahead_kb_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t count) > +{ > + struct backing_dev_info *bdi = dev_get_drvdata(dev); > + char *end; > + unsigned long read_ahead_kb; > + ssize_t ret = -EINVAL; > + > + read_ahead_kb = simple_strtoul(buf, &end, 10); > + if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) { > + bdi->ra_pages = read_ahead_kb >> (PAGE_SHIFT - 10); > + ret = count; > + } > + return ret; > +} > + > +#define K(pages) ((pages) << (PAGE_SHIFT - 10)) > + > +#define BDI_SHOW(name, expr) \ > +static ssize_t name##_show(struct device *dev, \ > + struct device_attribute *attr, char *page) \ > +{ \ > + struct backing_dev_info *bdi = dev_get_drvdata(dev); \ > + \ > + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ > +} > + > +BDI_SHOW(read_ahead_kb, K(bdi->ra_pages)) > + > +static ssize_t min_ratio_store(struct device 
*dev, > + struct device_attribute *attr, const char *buf, size_t count) > +{ > + struct backing_dev_info *bdi = dev_get_drvdata(dev); > + char *end; > + unsigned int ratio; > + ssize_t ret = -EINVAL; > + > + ratio = simple_strtoul(buf, &end, 10); > + if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) { > + ret = bdi_set_min_ratio(bdi, ratio); > + if (!ret) > + ret = count; > + } > + return ret; > +} > +BDI_SHOW(min_ratio, bdi->min_ratio) > + > +static ssize_t max_ratio_store(struct device *dev, > + struct device_attribute *attr, const char *buf, size_t count) > +{ > + struct backing_dev_info *bdi = dev_get_drvdata(dev); > + char *end; > + unsigned int ratio; > + ssize_t ret = -EINVAL; > + > + ratio = simple_strtoul(buf, &end, 10); > + if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) { > + ret = bdi_set_max_ratio(bdi, ratio); > + if (!ret) > + ret = count; > + } > + return ret; > +} > +BDI_SHOW(max_ratio, bdi->max_ratio) > + > +#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store) > + > +static struct device_attribute bdi_dev_attrs[] = { > + __ATTR_RW(read_ahead_kb), > + __ATTR_RW(min_ratio), > + __ATTR_RW(max_ratio), > + __ATTR_NULL, > +}; > + > +static __init int bdi_class_init(void) > +{ > + bdi_class = class_create(THIS_MODULE, "bdi"); > + bdi_class->dev_attrs = bdi_dev_attrs; > + bdi_debug_init(); > + return 0; > +} > + > +postcore_initcall(bdi_class_init); > + > +int bdi_register(struct backing_dev_info *bdi, struct device *parent, > + const char *fmt, ...) 
> +{ > + char *name; > + va_list args; > + int ret = 0; > + struct device *dev; > + > + va_start(args, fmt); > + name = kvasprintf(GFP_KERNEL, fmt, args); > + va_end(args); > + > + if (!name) > + return -ENOMEM; > + > + dev = device_create(bdi_class, parent, MKDEV(0, 0), name); > + if (IS_ERR(dev)) { > + ret = PTR_ERR(dev); > + goto exit; > + } > + > + bdi->dev = dev; > + dev_set_drvdata(bdi->dev, bdi); > + bdi_debug_register(bdi, name); > + > +exit: > + kfree(name); > + return ret; > +} > +EXPORT_SYMBOL(bdi_register); > + > +int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev) > +{ > + return bdi_register(bdi, NULL, "%u:%u", MAJOR(dev), MINOR(dev)); > +} > +EXPORT_SYMBOL(bdi_register_dev); > + > +void bdi_unregister(struct backing_dev_info *bdi) > +{ > + if (bdi->dev) { > + bdi_debug_unregister(bdi); > + device_unregister(bdi->dev); > + bdi->dev = NULL; > + } > +} > +EXPORT_SYMBOL(bdi_unregister); > > int bdi_init(struct backing_dev_info *bdi) > { > int i; > int err; > > + bdi->dev = NULL; > + > + bdi->min_ratio = 0; > + bdi->max_ratio = 100; > + bdi->max_prop_frac = PROP_FRAC_BASE; > + > for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { > err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); > if (err) > @@ -33,6 +250,8 @@ void bdi_destroy(struct backing_dev_info > { > int i; > > + bdi_unregister(bdi); > + > for (i = 0; i < NR_BDI_STAT_ITEMS; i++) > percpu_counter_destroy(&bdi->bdi_stat[i]); > > Index: linux-2.6.25-rc7/mm/page-writeback.c > =================================================================== > --- linux-2.6.25-rc7.orig/mm/page-writeback.c 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/mm/page-writeback.c 2008-03-31 14:50:42.000000000 +0200 > @@ -164,9 +164,20 @@ int dirty_ratio_handler(struct ctl_table > */ > static inline void __bdi_writeout_inc(struct backing_dev_info *bdi) > { > - __prop_inc_percpu(&vm_completions, &bdi->completions); > + __prop_inc_percpu_max(&vm_completions, &bdi->completions, > + bdi->max_prop_frac); > } > > 
+void bdi_writeout_inc(struct backing_dev_info *bdi) > +{ > + unsigned long flags; > + > + local_irq_save(flags); > + __bdi_writeout_inc(bdi); > + local_irq_restore(flags); > +} > +EXPORT_SYMBOL_GPL(bdi_writeout_inc); > + > static inline void task_dirty_inc(struct task_struct *tsk) > { > prop_inc_single(&vm_dirties, &tsk->dirties); > @@ -200,7 +211,8 @@ clip_bdi_dirty_limit(struct backing_dev_ > avail_dirty = dirty - > (global_page_state(NR_FILE_DIRTY) + > global_page_state(NR_WRITEBACK) + > - global_page_state(NR_UNSTABLE_NFS)); > + global_page_state(NR_UNSTABLE_NFS) + > + global_page_state(NR_WRITEBACK_TEMP)); > > if (avail_dirty < 0) > avail_dirty = 0; > @@ -243,6 +255,55 @@ static void task_dirty_limit(struct task > } > > /* > + * > + */ > +static DEFINE_SPINLOCK(bdi_lock); > +static unsigned int bdi_min_ratio; > + > +int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio) > +{ > + int ret = 0; > + unsigned long flags; > + > + spin_lock_irqsave(&bdi_lock, flags); > + if (min_ratio > bdi->max_ratio) { > + ret = -EINVAL; > + } else { > + min_ratio -= bdi->min_ratio; > + if (bdi_min_ratio + min_ratio < 100) { > + bdi_min_ratio += min_ratio; > + bdi->min_ratio += min_ratio; > + } else { > + ret = -EINVAL; > + } > + } > + spin_unlock_irqrestore(&bdi_lock, flags); > + > + return ret; > +} > + > +int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio) > +{ > + unsigned long flags; > + int ret = 0; > + > + if (max_ratio > 100) > + return -EINVAL; > + > + spin_lock_irqsave(&bdi_lock, flags); > + if (bdi->min_ratio > max_ratio) { > + ret = -EINVAL; > + } else { > + bdi->max_ratio = max_ratio; > + bdi->max_prop_frac = (PROP_FRAC_BASE * max_ratio) / 100; > + } > + spin_unlock_irqrestore(&bdi_lock, flags); > + > + return ret; > +} > +EXPORT_SYMBOL(bdi_set_max_ratio); > + > +/* > * Work out the current dirty-memory clamping and background writeout > * thresholds. 
> * > @@ -300,7 +361,7 @@ static unsigned long determine_dirtyable > return x + 1; /* Ensure that we never return 0 */ > } > > -static void > +void > get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, > struct backing_dev_info *bdi) > { > @@ -330,7 +391,7 @@ get_dirty_limits(long *pbackground, long > *pdirty = dirty; > > if (bdi) { > - u64 bdi_dirty = dirty; > + u64 bdi_dirty; > long numerator, denominator; > > /* > @@ -338,8 +399,12 @@ get_dirty_limits(long *pbackground, long > */ > bdi_writeout_fraction(bdi, &numerator, &denominator); > > + bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100; > bdi_dirty *= numerator; > do_div(bdi_dirty, denominator); > + bdi_dirty += (dirty * bdi->min_ratio) / 100; > + if (bdi_dirty > (dirty * bdi->max_ratio) / 100) > + bdi_dirty = dirty * bdi->max_ratio / 100; > > *pbdi_dirty = bdi_dirty; > clip_bdi_dirty_limit(bdi, dirty, pbdi_dirty); > @@ -1192,7 +1257,7 @@ int test_clear_page_writeback(struct pag > radix_tree_tag_clear(&mapping->page_tree, > page_index(page), > PAGECACHE_TAG_WRITEBACK); > - if (bdi_cap_writeback_dirty(bdi)) { > + if (bdi_cap_account_writeback(bdi)) { > __dec_bdi_stat(bdi, BDI_WRITEBACK); > __bdi_writeout_inc(bdi); > } > @@ -1221,7 +1286,7 @@ int test_set_page_writeback(struct page > radix_tree_tag_set(&mapping->page_tree, > page_index(page), > PAGECACHE_TAG_WRITEBACK); > - if (bdi_cap_writeback_dirty(bdi)) > + if (bdi_cap_account_writeback(bdi)) > __inc_bdi_stat(bdi, BDI_WRITEBACK); > } > if (!PageDirty(page)) > Index: linux-2.6.25-rc7/mm/readahead.c > =================================================================== > --- linux-2.6.25-rc7.orig/mm/readahead.c 2008-03-31 14:50:39.000000000 +0200 > +++ linux-2.6.25-rc7/mm/readahead.c 2008-03-31 14:50:42.000000000 +0200 > @@ -235,7 +235,13 @@ unsigned long max_sane_readahead(unsigne > > static int __init readahead_init(void) > { > - return bdi_init(&default_backing_dev_info); > + int err; > + > + err = bdi_init(&default_backing_dev_info); > + 
if (!err) > + bdi_register(&default_backing_dev_info, NULL, "default"); > + > + return err; > } > subsys_initcall(readahead_init); > > Index: linux-2.6.25-rc7/fs/nfs/super.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/nfs/super.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/nfs/super.c 2008-03-31 14:50:42.000000000 +0200 > @@ -1507,6 +1507,11 @@ static int nfs_compare_super(struct supe > return nfs_compare_mount_options(sb, server, mntflags); > } > > +static int nfs_bdi_register(struct nfs_server *server) > +{ > + return bdi_register_dev(&server->backing_dev_info, server->s_dev); > +} > + > static int nfs_get_sb(struct file_system_type *fs_type, > int flags, const char *dev_name, void *raw_data, struct vfsmount *mnt) > { > @@ -1549,6 +1554,10 @@ static int nfs_get_sb(struct file_system > if (s->s_fs_info != server) { > nfs_free_server(server); > server = NULL; > + } else { > + error = nfs_bdi_register(server); > + if (error) > + goto error_splat_super; > } > > if (!s->s_root) { > @@ -1596,6 +1605,7 @@ static void nfs_kill_super(struct super_ > { > struct nfs_server *server = NFS_SB(s); > > + bdi_unregister(&server->backing_dev_info); > kill_anon_super(s); > nfs_free_server(server); > } > @@ -1640,6 +1650,10 @@ static int nfs_xdev_get_sb(struct file_s > if (s->s_fs_info != server) { > nfs_free_server(server); > server = NULL; > + } else { > + error = nfs_bdi_register(server); > + if (error) > + goto error_splat_super; > } > > if (!s->s_root) { > @@ -1935,6 +1949,10 @@ static int nfs4_get_sb(struct file_syste > if (s->s_fs_info != server) { > nfs_free_server(server); > server = NULL; > + } else { > + error = nfs_bdi_register(server); > + if (error) > + goto error_splat_super; > } > > if (!s->s_root) { > @@ -2021,6 +2039,10 @@ static int nfs4_xdev_get_sb(struct file_ > if (s->s_fs_info != server) { > nfs_free_server(server); > server = NULL; > + } else { > + error = nfs_bdi_register(server); > + 
if (error) > + goto error_splat_super; > } > > if (!s->s_root) { > @@ -2100,6 +2122,10 @@ static int nfs4_referral_get_sb(struct f > if (s->s_fs_info != server) { > nfs_free_server(server); > server = NULL; > + } else { > + error = nfs_bdi_register(server); > + if (error) > + goto error_splat_super; > } > > if (!s->s_root) { > Index: linux-2.6.25-rc7/fs/fuse/control.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/fuse/control.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/fuse/control.c 2008-03-31 14:50:42.000000000 +0200 > @@ -117,7 +117,7 @@ int fuse_ctl_add_conn(struct fuse_conn * > > parent = fuse_control_sb->s_root; > inc_nlink(parent->d_inode); > - sprintf(name, "%llu", (unsigned long long) fc->id); > + sprintf(name, "%u", fc->dev); > parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2, > &simple_dir_inode_operations, > &simple_dir_operations); > Index: linux-2.6.25-rc7/fs/fuse/fuse_i.h > =================================================================== > --- linux-2.6.25-rc7.orig/fs/fuse/fuse_i.h 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/fuse/fuse_i.h 2008-03-31 14:50:42.000000000 +0200 > @@ -15,6 +15,7 @@ > #include <linux/mm.h> > #include <linux/backing-dev.h> > #include <linux/mutex.h> > +#include <linux/rwsem.h> > > /** Max number of pages that can be used in a single read request */ > #define FUSE_MAX_PAGES_PER_REQ 32 > @@ -25,6 +26,9 @@ > /** Congestion starts at 75% of maximum */ > #define FUSE_CONGESTION_THRESHOLD (FUSE_MAX_BACKGROUND * 75 / 100) > > +/** Bias for fi->writectr, meaning new writepages must not be sent */ > +#define FUSE_NOWRITE INT_MIN > + > /** It could be as large as PATH_MAX, but would that have any uses? */ > #define FUSE_NAME_MAX 1024 > > @@ -73,6 +77,19 @@ struct fuse_inode { > > /** Files usable in writepage. 
Protected by fc->lock */ > struct list_head write_files; > + > + /** Writepages pending on truncate or fsync */ > + struct list_head queued_writes; > + > + /** Number of sent writes, a negative bias (FUSE_NOWRITE) > + * means more writes are blocked */ > + int writectr; > + > + /** Waitq for writepage completion */ > + wait_queue_head_t page_waitq; > + > + /** List of writepage requestst (pending or sent) */ > + struct list_head writepages; > }; > > /** FUSE specific file data */ > @@ -242,6 +259,12 @@ struct fuse_req { > /** File used in the request (or NULL) */ > struct fuse_file *ff; > > + /** Inode used in the request or NULL */ > + struct inode *inode; > + > + /** Link on fi->writepages */ > + struct list_head writepages_entry; > + > /** Request completion callback */ > void (*end)(struct fuse_conn *, struct fuse_req *); > > @@ -390,8 +413,8 @@ struct fuse_conn { > /** Entry on the fuse_conn_list */ > struct list_head entry; > > - /** Unique ID */ > - u64 id; > + /** Device ID from super block */ > + dev_t dev; > > /** Dentries in the control filesystem */ > struct dentry *ctl_dentry[FUSE_CTL_NUM_DENTRIES]; > @@ -504,6 +527,11 @@ void fuse_init_symlink(struct inode *ino > void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, > u64 attr_valid, u64 attr_version); > > +void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, > + u64 attr_valid); > + > +void fuse_truncate(struct address_space *mapping, loff_t offset); > + > /** > * Initialize the client device > */ > @@ -522,6 +550,8 @@ void fuse_ctl_cleanup(void); > */ > struct fuse_req *fuse_request_alloc(void); > > +struct fuse_req *fuse_request_alloc_nofs(void); > + > /** > * Free a request > */ > @@ -558,6 +588,8 @@ void request_send_noreply(struct fuse_co > */ > void request_send_background(struct fuse_conn *fc, struct fuse_req *req); > > +void request_send_background_locked(struct fuse_conn *fc, struct fuse_req *req); > + > /* Abort all requests */ > void 
fuse_abort_conn(struct fuse_conn *fc); > > @@ -600,3 +632,8 @@ u64 fuse_lock_owner_id(struct fuse_conn > > int fuse_update_attributes(struct inode *inode, struct kstat *stat, > struct file *file, bool *refreshed); > + > +void fuse_flush_writepages(struct inode *inode); > + > +void fuse_set_nowrite(struct inode *inode); > +void fuse_release_nowrite(struct inode *inode); > Index: linux-2.6.25-rc7/fs/fuse/inode.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/fuse/inode.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/fuse/inode.c 2008-03-31 14:50:42.000000000 +0200 > @@ -59,7 +59,11 @@ static struct inode *fuse_alloc_inode(st > fi->nodeid = 0; > fi->nlookup = 0; > fi->attr_version = 0; > + fi->writectr = 0; > INIT_LIST_HEAD(&fi->write_files); > + INIT_LIST_HEAD(&fi->queued_writes); > + INIT_LIST_HEAD(&fi->writepages); > + init_waitqueue_head(&fi->page_waitq); > fi->forget_req = fuse_request_alloc(); > if (!fi->forget_req) { > kmem_cache_free(fuse_inode_cachep, inode); > @@ -73,6 +77,7 @@ static void fuse_destroy_inode(struct in > { > struct fuse_inode *fi = get_fuse_inode(inode); > BUG_ON(!list_empty(&fi->write_files)); > + BUG_ON(!list_empty(&fi->queued_writes)); > if (fi->forget_req) > fuse_request_free(fi->forget_req); > kmem_cache_free(fuse_inode_cachep, inode); > @@ -109,7 +114,7 @@ static int fuse_remount_fs(struct super_ > return 0; > } > > -static void fuse_truncate(struct address_space *mapping, loff_t offset) > +void fuse_truncate(struct address_space *mapping, loff_t offset) > { > /* See vmtruncate() */ > unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1); > @@ -117,19 +122,12 @@ static void fuse_truncate(struct address > unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1); > } > > - > -void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, > - u64 attr_valid, u64 attr_version) > +void fuse_change_attributes_common(struct inode *inode, struct fuse_attr 
*attr, > + u64 attr_valid) > { > struct fuse_conn *fc = get_fuse_conn(inode); > struct fuse_inode *fi = get_fuse_inode(inode); > - loff_t oldsize; > > - spin_lock(&fc->lock); > - if (attr_version != 0 && fi->attr_version > attr_version) { > - spin_unlock(&fc->lock); > - return; > - } > fi->attr_version = ++fc->attr_version; > fi->i_time = attr_valid; > > @@ -159,6 +157,22 @@ void fuse_change_attributes(struct inode > fi->orig_i_mode = inode->i_mode; > if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS)) > inode->i_mode &= ~S_ISVTX; > +} > + > +void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, > + u64 attr_valid, u64 attr_version) > +{ > + struct fuse_conn *fc = get_fuse_conn(inode); > + struct fuse_inode *fi = get_fuse_inode(inode); > + loff_t oldsize; > + > + spin_lock(&fc->lock); > + if (attr_version != 0 && fi->attr_version > attr_version) { > + spin_unlock(&fc->lock); > + return; > + } > + > + fuse_change_attributes_common(inode, attr, attr_valid); > > oldsize = inode->i_size; > i_size_write(inode, attr->size); > @@ -448,7 +462,7 @@ static int fuse_show_options(struct seq_ > return 0; > } > > -static struct fuse_conn *new_conn(void) > +static struct fuse_conn *new_conn(struct super_block *sb) > { > struct fuse_conn *fc; > int err; > @@ -469,19 +483,41 @@ static struct fuse_conn *new_conn(void) > atomic_set(&fc->num_waiting, 0); > fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; > fc->bdi.unplug_io_fn = default_unplug_io_fn; > + /* fuse does it's own writeback accounting */ > + fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB; > + fc->dev = sb->s_dev; > err = bdi_init(&fc->bdi); > - if (err) { > - kfree(fc); > - fc = NULL; > - goto out; > - } > + if (err) > + goto error_kfree; > + err = bdi_register_dev(&fc->bdi, fc->dev); > + if (err) > + goto error_bdi_destroy; > + /* > + * For a single fuse filesystem use max 1% of dirty + > + * writeback threshold. 
> + * > + * This gives about 1M of write buffer for memory maps on a > + * machine with 1G and 10% dirty_ratio, which should be more > + * than enough. > + * > + * Privileged users can raise it by writing to > + * > + * /sys/class/bdi/<bdi>/max_ratio > + */ > + bdi_set_max_ratio(&fc->bdi, 1); > fc->reqctr = 0; > fc->blocked = 1; > fc->attr_version = 1; > get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); > } > -out: > return fc; > + > +error_bdi_destroy: > + bdi_destroy(&fc->bdi); > +error_kfree: > + mutex_destroy(&fc->inst_mutex); > + kfree(fc); > + return NULL; > } > > void fuse_conn_put(struct fuse_conn *fc) > @@ -579,12 +615,6 @@ static void fuse_send_init(struct fuse_c > request_send_background(fc, req); > } > > -static u64 conn_id(void) > -{ > - static u64 ctr = 1; > - return ctr++; > -} > - > static int fuse_fill_super(struct super_block *sb, void *data, int silent) > { > struct fuse_conn *fc; > @@ -622,7 +652,7 @@ static int fuse_fill_super(struct super_ > if (file->f_op != &fuse_dev_operations) > return -EINVAL; > > - fc = new_conn(); > + fc = new_conn(sb); > if (!fc) > return -ENOMEM; > > @@ -660,7 +690,6 @@ static int fuse_fill_super(struct super_ > if (file->private_data) > goto err_unlock; > > - fc->id = conn_id(); > err = fuse_ctl_add_conn(fc); > if (err) > goto err_unlock; > Index: linux-2.6.25-rc7/include/linux/proportions.h > =================================================================== > --- linux-2.6.25-rc7.orig/include/linux/proportions.h 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/include/linux/proportions.h 2008-03-31 14:50:42.000000000 +0200 > @@ -78,6 +78,19 @@ void prop_inc_percpu(struct prop_descrip > } > > /* > + * Limit the time part in order to ensure there are some bits left for the > + * cycle counter and fraction multiply. 
> + */ > +#define PROP_MAX_SHIFT (3*BITS_PER_LONG/4) > + > +#define PROP_FRAC_SHIFT (BITS_PER_LONG - PROP_MAX_SHIFT - 1) > +#define PROP_FRAC_BASE (1UL << PROP_FRAC_SHIFT) > + > +void __prop_inc_percpu_max(struct prop_descriptor *pd, > + struct prop_local_percpu *pl, long frac); > + > + > +/* > * ----- SINGLE ------ > */ > > Index: linux-2.6.25-rc7/lib/proportions.c > =================================================================== > --- linux-2.6.25-rc7.orig/lib/proportions.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/lib/proportions.c 2008-03-31 14:50:42.000000000 +0200 > @@ -73,12 +73,6 @@ > #include <linux/proportions.h> > #include <linux/rcupdate.h> > > -/* > - * Limit the time part in order to ensure there are some bits left for the > - * cycle counter. > - */ > -#define PROP_MAX_SHIFT (3*BITS_PER_LONG/4) > - > int prop_descriptor_init(struct prop_descriptor *pd, int shift) > { > int err; > @@ -268,6 +262,38 @@ void __prop_inc_percpu(struct prop_descr > } > > /* > + * identical to __prop_inc_percpu, except that it limits this pl's fraction to > + * @frac/PROP_FRAC_BASE by ignoring events when this limit has been exceeded. 
> + */ > +void __prop_inc_percpu_max(struct prop_descriptor *pd, > + struct prop_local_percpu *pl, long frac) > +{ > + struct prop_global *pg = prop_get_global(pd); > + > + prop_norm_percpu(pg, pl); > + > + if (unlikely(frac != PROP_FRAC_BASE)) { > + unsigned long period_2 = 1UL << (pg->shift - 1); > + unsigned long counter_mask = period_2 - 1; > + unsigned long global_count; > + long numerator, denominator; > + > + numerator = percpu_counter_read_positive(&pl->events); > + global_count = percpu_counter_read(&pg->events); > + denominator = period_2 + (global_count & counter_mask); > + > + if (numerator > ((denominator * frac) >> PROP_FRAC_SHIFT)) > + goto out_put; > + } > + > + percpu_counter_add(&pl->events, 1); > + percpu_counter_add(&pg->events, 1); > + > +out_put: > + prop_put_global(pd, pg); > +} > + > +/* > * Obtain a fraction of this proportion > * > * p_{j} = x_{j} / (period/2 + t % period/2) > Index: linux-2.6.25-rc7/fs/configfs/inode.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/configfs/inode.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/configfs/inode.c 2008-03-31 14:50:42.000000000 +0200 > @@ -47,7 +47,7 @@ static const struct address_space_operat > > static struct backing_dev_info configfs_backing_dev_info = { > .ra_pages = 0, /* No readahead */ > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > }; > > static const struct inode_operations configfs_inode_operations ={ > Index: linux-2.6.25-rc7/fs/hugetlbfs/inode.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/hugetlbfs/inode.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/hugetlbfs/inode.c 2008-03-31 14:50:42.000000000 +0200 > @@ -45,7 +45,7 @@ static const struct inode_operations hug > > static struct backing_dev_info hugetlbfs_backing_dev_info = { > .ra_pages = 0, /* No readahead */ > - 
.capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > }; > > int sysctl_hugetlb_shm_group; > Index: linux-2.6.25-rc7/fs/ocfs2/dlm/dlmfs.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/ocfs2/dlm/dlmfs.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/ocfs2/dlm/dlmfs.c 2008-03-31 14:50:42.000000000 +0200 > @@ -327,7 +327,7 @@ clear_fields: > > static struct backing_dev_info dlmfs_backing_dev_info = { > .ra_pages = 0, /* No readahead */ > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > }; > > static struct inode *dlmfs_get_root_inode(struct super_block *sb) > Index: linux-2.6.25-rc7/fs/ramfs/inode.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/ramfs/inode.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/ramfs/inode.c 2008-03-31 14:50:42.000000000 +0200 > @@ -44,7 +44,7 @@ static const struct inode_operations ram > > static struct backing_dev_info ramfs_backing_dev_info = { > .ra_pages = 0, /* No readahead */ > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK | > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | > BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY | > BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP, > }; > Index: linux-2.6.25-rc7/fs/sysfs/inode.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/sysfs/inode.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/sysfs/inode.c 2008-03-31 14:50:42.000000000 +0200 > @@ -30,7 +30,7 @@ static const struct address_space_operat > > static struct backing_dev_info sysfs_backing_dev_info = { > .ra_pages = 0, /* No readahead */ > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > }; > > static const struct 
inode_operations sysfs_inode_operations ={ > Index: linux-2.6.25-rc7/kernel/cgroup.c > =================================================================== > --- linux-2.6.25-rc7.orig/kernel/cgroup.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/kernel/cgroup.c 2008-03-31 14:50:42.000000000 +0200 > @@ -562,7 +562,7 @@ static struct inode_operations cgroup_di > static struct file_operations proc_cgroupstats_operations; > > static struct backing_dev_info cgroup_backing_dev_info = { > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > }; > > static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb) > Index: linux-2.6.25-rc7/mm/shmem.c > =================================================================== > --- linux-2.6.25-rc7.orig/mm/shmem.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/mm/shmem.c 2008-03-31 14:50:42.000000000 +0200 > @@ -201,7 +201,7 @@ static struct vm_operations_struct shmem > > static struct backing_dev_info shmem_backing_dev_info __read_mostly = { > .ra_pages = 0, /* No readahead */ > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > .unplug_io_fn = default_unplug_io_fn, > }; > > Index: linux-2.6.25-rc7/mm/swap_state.c > =================================================================== > --- linux-2.6.25-rc7.orig/mm/swap_state.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/mm/swap_state.c 2008-03-31 14:50:42.000000000 +0200 > @@ -33,7 +33,7 @@ static const struct address_space_operat > }; > > static struct backing_dev_info swap_backing_dev_info = { > - .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK, > .unplug_io_fn = swap_unplug_io_fn, > }; > > Index: linux-2.6.25-rc7/drivers/base/node.c > =================================================================== > --- 
linux-2.6.25-rc7.orig/drivers/base/node.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/drivers/base/node.c 2008-03-31 14:50:42.000000000 +0200 > @@ -64,6 +64,7 @@ static ssize_t node_read_meminfo(struct > "Node %d PageTables: %8lu kB\n" > "Node %d NFS_Unstable: %8lu kB\n" > "Node %d Bounce: %8lu kB\n" > + "Node %d WritebackTmp: %8lu kB\n" > "Node %d Slab: %8lu kB\n" > "Node %d SReclaimable: %8lu kB\n" > "Node %d SUnreclaim: %8lu kB\n", > @@ -86,6 +87,7 @@ static ssize_t node_read_meminfo(struct > nid, K(node_page_state(nid, NR_PAGETABLE)), > nid, K(node_page_state(nid, NR_UNSTABLE_NFS)), > nid, K(node_page_state(nid, NR_BOUNCE)), > + nid, K(node_page_state(nid, NR_WRITEBACK_TEMP)), > nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) + > node_page_state(nid, NR_SLAB_UNRECLAIMABLE)), > nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)), > Index: linux-2.6.25-rc7/fs/proc/proc_misc.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/proc/proc_misc.c 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/fs/proc/proc_misc.c 2008-03-31 14:50:42.000000000 +0200 > @@ -179,6 +179,7 @@ static int meminfo_read_proc(char *page, > "PageTables: %8lu kB\n" > "NFS_Unstable: %8lu kB\n" > "Bounce: %8lu kB\n" > + "WritebackTmp: %8lu kB\n" > "CommitLimit: %8lu kB\n" > "Committed_AS: %8lu kB\n" > "VmallocTotal: %8lu kB\n" > @@ -210,6 +211,7 @@ static int meminfo_read_proc(char *page, > K(global_page_state(NR_PAGETABLE)), > K(global_page_state(NR_UNSTABLE_NFS)), > K(global_page_state(NR_BOUNCE)), > + K(global_page_state(NR_WRITEBACK_TEMP)), > K(allowed), > K(committed), > (unsigned long)VMALLOC_TOTAL >> 10, > Index: linux-2.6.25-rc7/include/linux/mmzone.h > =================================================================== > --- linux-2.6.25-rc7.orig/include/linux/mmzone.h 2008-03-31 14:50:38.000000000 +0200 > +++ linux-2.6.25-rc7/include/linux/mmzone.h 2008-03-31 14:50:42.000000000 +0200 > @@ -95,6 +95,7 @@ enum 
zone_stat_item { > NR_UNSTABLE_NFS, /* NFS unstable pages */ > NR_BOUNCE, > NR_VMSCAN_WRITE, > + NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */ > #ifdef CONFIG_NUMA > NUMA_HIT, /* allocated in intended node */ > NUMA_MISS, /* allocated in non intended node */ > Index: linux-2.6.25-rc7/Documentation/filesystems/proc.txt > =================================================================== > --- linux-2.6.25-rc7.orig/Documentation/filesystems/proc.txt 2008-03-31 14:50:37.000000000 +0200 > +++ linux-2.6.25-rc7/Documentation/filesystems/proc.txt 2008-03-31 14:50:42.000000000 +0200 > @@ -462,11 +462,17 @@ SwapTotal: 0 kB > SwapFree: 0 kB > Dirty: 968 kB > Writeback: 0 kB > +AnonPages: 861800 kB > Mapped: 280372 kB > -Slab: 684068 kB > +Slab: 284364 kB > +SReclaimable: 159856 kB > +SUnreclaim: 124508 kB > +PageTables: 24448 kB > +NFS_Unstable: 0 kB > +Bounce: 0 kB > +WritebackTmp: 0 kB > CommitLimit: 7669796 kB > Committed_AS: 100056 kB > -PageTables: 24448 kB > VmallocTotal: 112216 kB > VmallocUsed: 428 kB > VmallocChunk: 111088 kB > @@ -502,8 +508,17 @@ VmallocChunk: 111088 kB > on the disk > Dirty: Memory which is waiting to get written back to the disk > Writeback: Memory which is actively being written back to the disk > + AnonPages: Non-file backed pages mapped into userspace page tables > Mapped: files which have been mmaped, such as libraries > Slab: in-kernel data structures cache > +SReclaimable: Part of Slab, that might be reclaimed, such as caches > + SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure > + PageTables: amount of memory dedicated to the lowest level of page > + tables. 
> +NFS_Unstable: NFS pages sent to the server, but not yet committed to stable > + storage > + Bounce: Memory used for block device "bounce buffers" > +WritebackTmp: Memory used by FUSE for temporary writeback buffers > CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), > this is the total amount of memory currently available to > be allocated on the system. This limit is only adhered to > @@ -530,8 +545,6 @@ Committed_AS: The amount of memory prese > above) will not be permitted. This is useful if one needs > to guarantee that processes will not fail due to lack of > memory once that memory has been successfully allocated. > - PageTables: amount of memory dedicated to the lowest level of page > - tables. > VmallocTotal: total size of vmalloc memory area > VmallocUsed: amount of vmalloc area which is used > VmallocChunk: largest contigious block of vmalloc area which is free > Index: linux-2.6.25-rc7/fs/fuse/dev.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/fuse/dev.c 2008-03-31 14:50:37.000000000 +0200 > +++ linux-2.6.25-rc7/fs/fuse/dev.c 2008-03-31 14:50:42.000000000 +0200 > @@ -47,6 +47,14 @@ struct fuse_req *fuse_request_alloc(void > return req; > } > > +struct fuse_req *fuse_request_alloc_nofs(void) > +{ > + struct fuse_req *req = kmem_cache_alloc(fuse_req_cachep, GFP_NOFS); > + if (req) > + fuse_request_init(req); > + return req; > +} > + > void fuse_request_free(struct fuse_req *req) > { > kmem_cache_free(fuse_req_cachep, req); > @@ -430,6 +438,17 @@ void request_send_background(struct fuse > } > > /* > + * Called under fc->lock > + * > + * fc->connected must have been checked previously > + */ > +void request_send_background_locked(struct fuse_conn *fc, struct fuse_req *req) > +{ > + req->isreply = 1; > + request_send_nowait_locked(fc, req); > +} > + > +/* > * Lock the request. Up to the next unlock_request() there mustn't be > * anything that could cause a page-fault. 
If the request was already > * aborted bail out. > Index: linux-2.6.25-rc7/fs/fuse/dir.c > =================================================================== > --- linux-2.6.25-rc7.orig/fs/fuse/dir.c 2008-03-31 14:50:37.000000000 +0200 > +++ linux-2.6.25-rc7/fs/fuse/dir.c 2008-03-31 14:50:42.000000000 +0200 > @@ -1107,6 +1107,50 @@ static void iattr_to_fattr(struct iattr > } > > /* > + * Prevent concurrent writepages on inode > + * > + * This is done by adding a negative bias to the inode write counter > + * and waiting for all pending writes to finish. > + */ > +void fuse_set_nowrite(struct inode *inode) > +{ > + struct fuse_conn *fc = get_fuse_conn(inode); > + struct fuse_inode *fi = get_fuse_inode(inode); > + > + BUG_ON(!mutex_is_locked(&inode->i_mutex)); > + > + spin_lock(&fc->lock); > + BUG_ON(fi->writectr < 0); > + fi->writectr += FUSE_NOWRITE; > + spin_unlock(&fc->lock); > + wait_event(fi->page_waitq, fi->writectr == FUSE_NOWRITE); > +} > + > +/* > + * Allow writepages on inode > + * > + * Remove the bias from the writecounter and send any queued > + * writepages. > + */ > +static void __fuse_release_nowrite(struct inode *inode) > +{ > + struct fuse_inode *fi = get_fuse_inode(inode); > + > + BUG_ON(fi->writectr != FUSE_NOWRITE); > + fi->writectr = 0; > + fuse_flush_writepages(inode); > +} > + > +void fuse_release_nowrite(struct inode *inode) > +{ > + struct fuse_conn *fc = get_fuse_conn(inode); > + > + spin_lock(&fc->lock); > + __fuse_release_nowrite(inode); > + spin_unlock(&fc->lock); > +} > + > +/* > * Set attributes, and at the same time refresh them. 
> * > * Truncation is slightly complicated, because the 'truncate' request > @@ -1122,6 +1166,8 @@ static int fuse_do_setattr(struct dentry > struct fuse_req *req; > struct fuse_setattr_in inarg; > struct fuse_attr_out outarg; > + bool is_truncate = false; > + loff_t oldsize; > int err; > > if (!fuse_allow_task(fc, current)) > @@ -1145,12 +1191,16 @@ static int fuse_do_setattr(struct dentry > send_sig(SIGXFSZ, current, 0); > return -EFBIG; > } > + is_truncate = true; > } > > req = fuse_get_req(fc); > if (IS_ERR(req)) > return PTR_ERR(req); > > + if (is_truncate) > + fuse_set_nowrite(inode); > + > memset(&inarg, 0, sizeof(inarg)); > memset(&outarg, 0, sizeof(outarg)); > iattr_to_fattr(attr, &inarg); > @@ -1181,16 +1231,44 @@ static int fuse_do_setattr(struct dentry > if (err) { > if (err == -EINTR) > fuse_invalidate_attr(inode); > - return err; > + goto error; > } > > if ((inode->i_mode ^ outarg.attr.mode) & S_IFMT) { > make_bad_inode(inode); > - return -EIO; > + err = -EIO; > + goto error; > + } > + > + spin_lock(&fc->lock); > + fuse_change_attributes_common(inode, &outarg.attr, > + attr_timeout(&outarg)); > + oldsize = inode->i_size; > + i_size_write(inode, outarg.attr.size); > + > + if (is_truncate) { > + /* NOTE: this may release/reacquire fc->lock */ > + __fuse_release_nowrite(inode); > + } > + spin_unlock(&fc->lock); > + > + /* > + * Only call invalidate_inode_pages2() after removing > + * FUSE_NOWRITE, otherwise fuse_launder_page() would deadlock. 
> + */ > + if (S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { > + if (outarg.attr.size < oldsize) > + fuse_truncate(inode->i_mapping, outarg.attr.size); > + invalidate_inode_pages2(inode->i_mapping); > } > > - fuse_change_attributes(inode, &outarg.attr, attr_timeout(&outarg), 0); > return 0; > + > +error: > + if (is_truncate) > + fuse_release_nowrite(inode); > + > + return err; > } > > static int fuse_setattr(struct dentry *entry, struct iattr *attr) > Index: linux-2.6.25-rc7/fs/fuse/file.c > ================================================================... [truncated message content] |
From: Miklos S. <mi...@sz...> - 2008-04-22 08:58:40
> Thank you Miklos. I've finally got around to applying this patch
> (been busy with day job, etc). I applied it to the latest linus git
> tree, and it applied cleanly, and rebooted into this kernel (which I
> know was successful, since I had to manually hack the nvidia proprietary
> drivers and vmware proprietary modules to work again with 2.6.25),
> but I am still seeing 4096-byte writes, even when I do dd
> if=/dev/zero of=ZERO bs=1M count=100 on my fuse filesystem.
>
> Do I need to be running the latest fuse for this to work? It looks
> like I am using the 2.7.0 release fuse.

Umm, strange. The fuse release shouldn't matter. Do you see a
directory named /sys/class/bdi/ ?

If not, then something went wrong with the patching process.

Thanks,
Miklos
From: Amar S. T. <am...@zr...> - 2008-05-16 02:00:33
Hi,

With the following patch to the fuse kernel source, we were able to get
1M reads.

----------------------------
diff -aurN fuse-2.7.3/kernel/dev.c fuse-2.7.3glfs10/kernel/dev.c
--- fuse-2.7.3/kernel/dev.c	2008-02-19 11:51:24.000000000 -0800
+++ fuse-2.7.3glfs10/kernel/dev.c	2008-04-21 16:54:50.000000000 -0700
@@ -83,7 +83,7 @@
 {
 	req->in.h.uid = current->fsuid;
 	req->in.h.gid = current->fsgid;
-	req->in.h.pid = current->pid;
+	req->in.h.pid = current->tgid;
 }
 
 struct fuse_req *fuse_get_req(struct fuse_conn *fc)
diff -aurN fuse-2.7.3/kernel/dir.c fuse-2.7.3glfs10/kernel/dir.c
--- fuse-2.7.3/kernel/dir.c	2008-02-19 11:51:24.000000000 -0800
+++ fuse-2.7.3glfs10/kernel/dir.c	2008-04-21 16:54:50.000000000 -0700
@@ -380,6 +380,10 @@
 	d_instantiate(entry, inode);
 	fuse_invalidate_attr(dir);
 	fuse_change_timeout(entry, &outentry);
+	if (flags & O_DIRECT) {
+		outopen.open_flags |= FOPEN_DIRECT_IO;
+		nd->intent.open.flags &= ~O_DIRECT;
+	}
 	file = lookup_instantiate_filp(nd, entry, generic_file_open);
 	if (IS_ERR(file)) {
 		ff->fh = outopen.fh;
@@ -1135,11 +1139,12 @@
 {
 	struct inode *inode = entry->d_inode;
 	int err = fuse_revalidate(entry);
-	if (!err)
+	if (!err) {
 		/* FIXME: may want specialized function because of
 		   st_blksize on block devices on 2.6.19+ */
 		generic_fillattr(inode, stat);
-
+		stat->blksize = GLUSTER_BLKSIZE;
+	}
 	return err;
 }
diff -aurN fuse-2.7.3/kernel/file.c fuse-2.7.3glfs10/kernel/file.c
--- fuse-2.7.3/kernel/file.c	2008-02-19 11:51:24.000000000 -0800
+++ fuse-2.7.3glfs10/kernel/file.c	2008-04-21 16:54:50.000000000 -0700
@@ -91,10 +91,6 @@
 	struct fuse_file *ff;
 	int err;
 
-	/* VFS checks this, but only _after_ ->open() */
-	if (file->f_flags & O_DIRECT)
-		return -EINVAL;
-
 	err = generic_file_open(inode, file);
 	if (err)
 		return err;
@@ -115,6 +111,19 @@
 	if (err)
 		fuse_file_free(ff);
 	else {
+		if (file->f_flags & O_DIRECT) {
+			/* set fuse_direct_io_file_operations as fops
+			   in fuse_finish_open as though the FS
+			   enforced direct_io
+			*/
+			outarg.open_flags |= FOPEN_DIRECT_IO;
+			/* make VFS think this is a regular open
+			   to make it not check for aops->direct_IO.
+			   the same effect is acheived by setting fops
+			   to fuse_direct_io_file_operations
+			*/
+			file->f_flags &= ~O_DIRECT;
+		}
 		if (isdir)
 			outarg.open_flags &= ~FOPEN_DIRECT_IO;
 		fuse_finish_open(inode, file, ff, &outarg);
@@ -578,6 +587,8 @@
 		}
 		nbytes = (req->num_pages << PAGE_SHIFT) - req->page_offset;
 		nbytes = min(count, nbytes);
+		nbytes = min(nmax, nbytes);
+
 		if (write)
 			nres = fuse_send_write(req, file, inode, pos, nbytes);
 		else
@@ -865,6 +876,7 @@
 	.release = fuse_release,
 	.fsync = fuse_fsync,
 	.lock = fuse_file_lock,
+	.flock = fuse_file_lock,
 #ifdef KERNEL_2_6_23_PLUS
 	.splice_read = generic_file_splice_read,
 #else
@@ -881,6 +893,7 @@
 	.release = fuse_release,
 	.fsync = fuse_fsync,
 	.lock = fuse_file_lock,
+	.flock = fuse_file_lock,
 	/* no mmap and sendfile */
 };
diff -aurN fuse-2.7.3/kernel/fuse_i.h fuse-2.7.3glfs10/kernel/fuse_i.h
--- fuse-2.7.3/kernel/fuse_i.h	2008-02-19 11:51:24.000000000 -0800
+++ fuse-2.7.3glfs10/kernel/fuse_i.h	2008-04-21 16:54:50.000000000 -0700
@@ -102,7 +102,8 @@
 #endif
 
 /** Max number of pages that can be used in a single read request */
-#define FUSE_MAX_PAGES_PER_REQ 32
+/** Default 32, changed to 257 for GlusterFS */
+#define FUSE_MAX_PAGES_PER_REQ 257
 
 /** Maximum number of outstanding background requests */
 #define FUSE_MAX_BACKGROUND 10
@@ -122,6 +123,11 @@
    doing the mount will be allowed to access the filesystem */
 #define FUSE_ALLOW_OTHER (1 << 1)
 
+/** GlusterFS options */
+#define GLUSTER_BLKSIZE 1048576
+#define GLUSTER_BLKSIZE_BITS 20
+#define GLUSTER_RA_PAGES 256
+
 /** List of active connections */
 extern struct list_head fuse_conn_list;
diff -aurN fuse-2.7.3/kernel/inode.c fuse-2.7.3glfs10/kernel/inode.c
--- fuse-2.7.3/kernel/inode.c	2008-02-19 11:51:24.000000000 -0800
+++ fuse-2.7.3glfs10/kernel/inode.c	2008-04-21 16:54:50.000000000 -0700
@@ -140,7 +140,7 @@
 	i_size_write(inode, attr->size);
 	spin_unlock(&fc->lock);
 #ifdef HAVE_I_BLKSIZE
-	inode->i_blksize = PAGE_CACHE_SIZE;
+	inode->i_blksize = GLUSTER_BLKSIZE;
 #endif
 	inode->i_blocks = attr->blocks;
 	inode->i_atime.tv_sec = attr->atime;
@@ -350,7 +350,7 @@
 	char *p;
 	memset(d, 0, sizeof(struct fuse_mount_data));
 	d->max_read = ~0;
-	d->blksize = 512;
+	d->blksize = GLUSTER_BLKSIZE;
 
 	/*
 	 * For unprivileged mounts use current uid/gid.  Still allow
@@ -488,7 +488,7 @@
 	INIT_LIST_HEAD(&fc->io);
 	INIT_LIST_HEAD(&fc->interrupts);
 	atomic_set(&fc->num_waiting, 0);
-	fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
+	fc->bdi.ra_pages = GLUSTER_RA_PAGES;
 	fc->bdi.unplug_io_fn = default_unplug_io_fn;
 	fc->reqctr = 0;
 	fc->blocked = 1;
@@ -786,8 +786,8 @@
 			return -EINVAL;
 #endif
 	} else {
-		sb->s_blocksize = PAGE_CACHE_SIZE;
-		sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+		sb->s_blocksize = GLUSTER_BLKSIZE;
+		sb->s_blocksize_bits = GLUSTER_BLKSIZE_BITS;
 	}
 	sb->s_magic = FUSE_SUPER_MAGIC;
 	sb->s_op = &fuse_super_operations;
diff -aurN fuse-2.7.3/lib/fuse_kern_chan.c fuse-2.7.3glfs10/lib/fuse_kern_chan.c
--- fuse-2.7.3/lib/fuse_kern_chan.c	2008-02-19 11:51:25.000000000 -0800
+++ fuse-2.7.3glfs10/lib/fuse_kern_chan.c	2008-04-21 16:54:50.000000000 -0700
@@ -80,7 +80,7 @@
 	close(fuse_chan_fd(ch));
 }
 
-#define MIN_BUFSIZE 0x21000
+#define MIN_BUFSIZE (1048576 + 0x1000)
 
 struct fuse_chan *fuse_kern_chan_new(int fd)
 {
-------------------------------------

The complete patch to the 2.7.3 tarball is at
http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/
http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse2.7.3glfs10.patch.txt

This patch has been working fine for us since the 2.7.2 release.

Regards,
Amar

On Tue, Apr 22, 2008 at 1:58 AM, Miklos Szeredi <mi...@sz...> wrote:
> > Thank you Miklos. I've finally got around to applying this patch
> > (been busy with day job, etc). I applied it to the latest linus git
> > tree, and it applied cleanly, and rebooted into this kernel (which I
> > know was successful, since I had to manually hack the nvidia proprietary
> > drivers and vmware proprietary modules to work again with 2.6.25),
> > but I am still seeing 4096-byte writes, even when I do dd
> > if=/dev/zero of=ZERO bs=1M count=100 on my fuse filesystem.
> >
> > Do I need to be running the latest fuse for this to work? It looks
> > like I am using the 2.7.0 release fuse.
>
> Umm, strange. The fuse release shouldn't matter. Do you see a
> directory named /sys/class/bdi/ ?
>
> If not, then something went wrong with the patching process.
>
> Thanks,
> Miklos
>
> _______________________________________________
> fuse-devel mailing list
> fus...@li...
> https://lists.sourceforge.net/lists/listinfo/fuse-devel

--
Amar Tumballi
Gluster/GlusterFS Hacker [bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!
From: Gordon W. <gor...@gm...> - 2008-07-06 08:13:00
On Mon, Mar 31, 2008 at 11:18 PM, Miklos Szeredi <mi...@sz...> wrote:
>
> Or is it a requirement that userspace must not see where the files are
> actually coming from? We could introduce some sort of "hidden
> symlink" into fuse that acts just the same way as a real symlink, but
> is always followed, regardless of whether userspace asks for that or
> not.

that reads like a variant on the direct access theme

On Mon, Mar 31, 2008 at 10:46 PM, Franco Broi <fr...@bo...> wrote:
>
> If you find a way to do the path dereferencing, I'd love to know; the
> best I could come up with is a call to getxattr to return the true
> pathname. Works great for our own software but obviously doesn't do a
> thing for other utilities and applications.
>
> On Mon, 2008-03-31 at 02:44 -0700, Randy Robertson wrote:
> > In reality, though, since all I'm really doing is resolving paths, I'd
> > rather not have the data path flow through userspace at all. Is there
> > a mechanism by which one can intercept file access requests
> > (presumably through fuse), handle the resolving of these requests in
> > userspace, and then return the kernel with a path to the "real" file,
> > and have further access happen directly?

and that looks like two more requests for the same functionality

and I have a huge backlog of mailing list messages to catch up on
From: pyro <py...@li...> - 2008-07-06 18:28:46
Greetings,

This is more thinking out loud than an actual proposal at this point,
but: what if a fuse filesystem had the option to open the file and then
tell the kernel module to hand it off to the requesting app, with
notifications? That is, the fuse module grabs the inode out of the
filesystem's file handle and from then on directly calls the inode's
methods in the calling process's context, rather than downcalling to
the userspace filesystem. Optionally, the filesystem could pass a mask
of desired notifications (including none at all), or even specify which
operations to pass directly to the backing filesystem and which the
fuse filesystem wants to handle itself. Notifications would just be
queued for delivery 'soon' rather than making the calling process wait
on a context switch.

G'day,
sjames

On Sun, 6 Jul 2008, Gordon Wrigley wrote:

> On Mon, Mar 31, 2008 at 11:18 PM, Miklos Szeredi <mi...@sz...> wrote:
> >
> > Or is it a requirement that userspace must not see where the files are
> > actually coming from? We could introduce some sort of "hidden
> > symlink" into fuse that acts just the same way as a real symlink, but
> > is always followed, regardless of whether userspace asks for that or
> > not.
>
> that reads like a variant on the direct access theme
>
> On Mon, Mar 31, 2008 at 10:46 PM, Franco Broi <fr...@bo...> wrote:
> >
> > If you find a way to do the path dereferencing, I'd love to know; the
> > best I could come up with is a call to getxattr to return the true
> > pathname. Works great for our own software but obviously doesn't do a
> > thing for other utilities and applications.
> >
> > On Mon, 2008-03-31 at 02:44 -0700, Randy Robertson wrote:
> > > In reality, though, since all I'm really doing is resolving paths, I'd
> > > rather not have the data path flow through userspace at all. Is there
> > > a mechanism by which one can intercept file access requests
> > > (presumably through fuse), handle the resolving of these requests in
> > > userspace, and then return the kernel with a path to the "real" file,
> > > and have further access happen directly?
>
> and that looks like two more requests for the same functionality
>
> and I have a huge backlog of mailing list messages to catch up on

by Linux Labs International, Inc.
Steven James, CTO
866 824 9737 support
From: Miklos S. <mi...@sz...> - 2008-07-07 13:34:43
On Sun, 6 Jul 2008, Gordon Wrigley wrote:
[...]
> and that looks like two more requests for the same functionality

Yes, fuse performance issues seem to be a hot topic these days. Which I
think means that fuse is sufficiently feature-complete that people are
starting to worry about performance. Which is a good thing ;)

Miklos
From: Manuel A. (Rudd-O) <ru...@ru...> - 2008-07-07 21:43:12
> Yes, fuse performance issues seem to be a hot topic these days. Which I
> think means that fuse is sufficiently feature-complete that people are
> starting to worry about performance. Which is a good thing ;)

You could not be more right. Thanks for the amazing work and the
outstanding fruits of it. :-)

--
Manuel Amador (Rudd-O) <ru...@ru...>
Rudd-O.com - http://rudd-o.com/
GPG key ID 0xC8D28B92 at http://wwwkeys.pgp.net/

Your reasoning is excellent -- it's only your basic assumptions that
are wrong.