From: Miklos S. <mi...@sz...> - 2008-06-27 11:09:33

> I'm an intern at VMware looking into alternatives to our blocking
> kernel module/filesystem which has been released as part of
> open-vm-tools [1]. One option is to reimplement it using FUSE. I'm
> trying to understand what's involved in having it run on as many
> systems as possible, including some older ones, and how the
> versioning works.
>
> I've done a bit of searching through the mailing list archives and
> site and have some questions:
> Is there a general method that's recommended for supporting a wide
> variety of systems?
> It looks like the userspace library's API keeps changing, but an
> application can request a specific version by defining
> FUSE_USE_VERSION. Is that right?

Right.

> If I try to build against a library which isn't new enough to
> support the requested API, will the compile fail?

Yes.

> Is the ABI backwards compatible in that a filesystem compiled
> against an older version of the library will run on newer libraries?

Yes.

> What if I try to run an FS compiled with a newer version of the API
> against an older library?

It will emit a run time error about a missing library dependency.

> Is statically linking against the fuse library a bad idea?

Not necessarily. Szaka (of ntfs-3g fame) made a stripped down
fuse-lite package specifically for static linking, without all the ABI
compatibility stuff.

> Do all versions of the library since 2.4.0 still support all kernel
> modules?

Yes.

Miklos
From: Miklos S. <mi...@sz...> - 2008-06-27 11:02:08

> I am trying to write an virtual chroot filesystem, which creates chroot
> environments for each user logged into my system. This project is written
> using python-fuse.
>
> Project works, but when trying to mount it using /etc/fstab, my option
> "dev" is always ignored and my filesystem is mounted with "nodev" option.
> I can mount it using ./chrootfs.py -o allow_other,default_permissions,dev ,
> but can't using mount /mnt/chroot -o dev .
> Is it a bug of fuse? Why this parameter is forced and can't be overwritten
> when using fstab entry?

Are you running the filesystem as root? If not, then '-odev' is
ignored on purpose: it would be a security hole to allow it.

Thanks,
Miklos
From: Daniel B. <db...@vm...> - 2008-06-26 20:00:25

Hi all,

I'm an intern at VMware looking into alternatives to our blocking
kernel module/filesystem which has been released as part of
open-vm-tools [1]. One option is to reimplement it using FUSE. I'm
trying to understand what's involved in having it run on as many
systems as possible, including some older ones, and how the
versioning works.

I've done a bit of searching through the mailing list archives and
site and have some questions:

Is there a general method that's recommended for supporting a wide
variety of systems?

It looks like the userspace library's API keeps changing, but an
application can request a specific version by defining
FUSE_USE_VERSION. Is that right?

If I try to build against a library which isn't new enough to
support the requested API, will the compile fail?

Is the ABI backwards compatible in that a filesystem compiled
against an older version of the library will run on newer libraries?

What if I try to run an FS compiled with a newer version of the API
against an older library?

Is statically linking against the fuse library a bad idea?

Do all versions of the library since 2.4.0 still support all kernel
modules?

Thanks a lot!
Dan

[1] http://open-vm-tools.sourceforge.net/
From: Ing. J. O. <on...@up...> - 2008-06-26 16:08:15

Hello,

I am trying to write a virtual chroot filesystem, which creates chroot
environments for each user logged into my system. This project is written
using python-fuse.

Project works, but when trying to mount it using /etc/fstab, my option
"dev" is always ignored and my filesystem is mounted with "nodev" option.
I can mount it using ./chrootfs.py -o allow_other,default_permissions,dev ,
but can't using mount /mnt/chroot -o dev .
Is it a bug of fuse? Why is this parameter forced, and why can't it be
overridden when using an fstab entry?

Another solution could be a /dev/null simulation in my code. Maybe it is a
better solution than mounting my filesystem with the dev option. I am trying
to write a file handler which will not write data to this file when the
/dev/null path is detected. But it does not work, I always get this error:

LOOKUP /salstar.sk/www.salstar.sk/dev/null
NODEID: 5
unique: 10, error: 0 (Success), outsize: 136
unique: 11, opcode: SETATTR (4), nodeid: 5, insize: 128
unique: 11, error: -2 (No such file or directory), outsize: 16

Why is this -2 error returned? What is this magic SETATTR? How can I handle
it? This always occurs before open, so I am unable to handle it and I think
it has nothing to do with my code.

When trying to open /dev/null2, which is a regular device, it works, but
opening /dev/null, which is an empty file, does not work. :-(

SAL
From: bargav y. <by...@ya...> - 2008-06-26 07:09:20

----- Original Message ----
> From: Miklos Szeredi <mi...@sz...>
> To: by...@ya...
> Cc: fus...@li...
> Sent: Friday, June 20, 2008 10:16:10 AM
> Subject: Re: [fuse-devel] tree_lock causes starvation
>
> > we've noticed that when there is steady readdir traffic, for example
> > as generated by multiple find commands, delete operations (rmdir,
> > unlink, rename) seem to get starved out. We're assuming this is due
> > to the rwlock being in a reader-favoring mode. This is the default
> > behaviour of the pthread mutex initialization value chosen by fuse
> > for the lock called "tree_lock".
> > Is there a specific reason for this decision? What would be the
> > effect of changing the rwlock to prefer writers?
>
> Could you try out the CVS version of libfuse? The "tree_lock" logic
> has been completely rewritten to have much less contention, and better
> fairness on contention. I'm not sure it's completely fair or not, but
> in practice you shouldn't see any such problems.
>
> Thanks,
> Miklos

When will this version be released?
From: Patrick E. <pat...@gm...> - 2008-06-25 19:02:11

Here are the notes I have about running Fuse with Valgrind...

* Problem

By default, Fuse and Valgrind do not play nice together, making
debugging memory errors in <myfs> a bit of a hassle. See the links
below for a little bit of background.

From http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/4095/focus=4114.

>> [*] I had been having problems with valgrind and permission problems.
>> What works is to run valgrind as root, and chmod 755
>> /usr/bin/fusermount.
>
>Yes, stupid valgrind. My workaround is to replace /usr/bin/fusermount
>with a shell script which just execs /usr/bin/fusermount.real.
>Miklos

Also, from
http://osdir.com/ml/file-systems.fuse.devel/2006-01/msg00093.html, it
appears the problem is related to setuid programs. The Fuse mount
utility, fusermount, is a setuid program.

Finally, a link in the previous message
http://bugs.kde.org/show_bug.cgi?id=119404 further explains that
setuid programs can be run by valgrind, but they cannot be traced.

* Solution

Here is one possible workaround for the problem. As root,

  $ cd /usr/bin
  $ if [[ -e fusermount ]]; then mv fusermount fusermount.real; fi
  $ touch fusermount
  $ chown root.fuse fusermount
  $ chmod ug+x fusermount
  $ echo '#!/bin/sh' >> fusermount
  $ echo 'exec /usr/bin/fusermount.real "$@"' >> fusermount

or depending on the path of fusermount

  $ echo 'exec /bin/fusermount "$@"' >> fusermount

The other key point is to ensure that the setuid fusermount is not
traced. Thus valgrind should be called as

  $ valgrind --tool=memcheck --trace-children=no ... myfs ...

Finally, to work with Valgrind, <myfs> should be run in the
foreground. To keep <myfs> from daemonizing, include the -d (debug)
option.

Putting it all together, one can run <myfs> under Valgrind with a
command like

  $ valgrind --tool=memcheck --trace-children=no --leak-check=full --show-reachable=yes --max-stackframe=3000000 -v myfs -d /mnt/myfs ...

On Wed, Jun 25, 2008 at 2:12 PM, kuba <kub...@gm...> wrote:
> Hi all,
>
> Is it possible to determine memory leaks using valgrind on fuse filesystem? Or
> maybe you have any other suggestions ? I tried to use Garbage Collector for C,
> because I read that it can produce statistics on program termination but it
> didn't work out.
> I ran my filesystem with valgrind but it didn't produce proper stats at the end?
> i.e.
> ==18349== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
>
> wasn't true because I ran in my program at least two allocs.
>
> _______________________________________________
> fuse-devel mailing list
> fus...@li...
> https://lists.sourceforge.net/lists/listinfo/fuse-devel
>
From: kuba <kub...@gm...> - 2008-06-25 18:12:26

Hi all,

Is it possible to determine memory leaks using valgrind on fuse filesystem? Or
maybe you have any other suggestions ? I tried to use Garbage Collector for C,
because I read that it can produce statistics on program termination but it
didn't work out.
I ran my filesystem with valgrind but it didn't produce proper stats at the end?
i.e.
==18349== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.

wasn't true because I ran in my program at least two allocs.
From: Gerard J. C. <gj...@ci...> - 2008-06-24 19:58:04

Marc Andre Tanner wrote:
> Hi,
>
> I would like to know what is considered to be the right way to handle
> generic errors like malloc failures? As far as i understand the kernel
> is normally overcommitting memory, so malloc shouldn't fail and if it
> does then there is really no more memory available. Also since the file
> system is just a normal process it may be killed by the OOM Killer.
>
> With help of google i found a lwn.net article[1] from 2004 which mentions
> a patch which introduces a new ioctl to prevent processes from being
> killed. Do you know whether something like this hit mainline? Or is the
> setting in /proc/<pid>/oom_adj as mentioned in [2] the correct way to
> tweak the OOM Killer?
>
> Given these things, does it even make sense to test for malloc failures
> and if so what should be done when an error occurs? Simply aborting is
> probably not an option since there might be other file system operations
> active which could be completed without new memory.
>
> Is returning ENOMEM the preferred solution?
>
> Some comments/advice would be appreciated.
>
> Thanks,
> Marc
>
> [1] http://lwn.net/Articles/104179/
> [2] http://linux-mm.org/OOM_Killer
>
>

I return ENOMEM
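
A minimal sketch of the "return ENOMEM" approach, assuming a handler that reports failures as negative errno values the way high-level libfuse operations do. The names `entry_set_name`, `alloc_fn` and `failing_alloc` are hypothetical and exist only for this illustration; the allocator is passed in so that the failure path can be exercised without actually exhausting memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator type, injectable so the failure path can be tested. */
typedef void *(*alloc_fn)(size_t);

/* Replace *slot with a heap copy of name.
 * Returns 0 on success or -ENOMEM if the allocation fails.
 * On failure *slot is left untouched, so only this request fails
 * and other in-flight operations can still complete. */
static int entry_set_name(char **slot, const char *name, alloc_fn alloc)
{
    assert(slot != NULL && name != NULL && alloc != NULL);

    size_t len = strlen(name) + 1;
    char *copy = alloc(len);
    if (copy == NULL)
        return -ENOMEM;

    memcpy(copy, name, len);
    free(*slot);            /* free(NULL) is a no-op */
    *slot = copy;
    return 0;
}

/* Hypothetical allocator that always fails, used to simulate exhaustion. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

The point of the design is that a failed allocation is reported for the one request that needed it, rather than aborting the whole filesystem process.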
From: Marc A. T. <ma...@br...> - 2008-06-24 19:48:04

Hi,

I would like to know what is considered to be the right way to handle
generic errors like malloc failures? As far as i understand the kernel
is normally overcommitting memory, so malloc shouldn't fail and if it
does then there is really no more memory available. Also since the file
system is just a normal process it may be killed by the OOM Killer.

With help of google i found a lwn.net article[1] from 2004 which mentions
a patch which introduces a new ioctl to prevent processes from being
killed. Do you know whether something like this hit mainline? Or is the
setting in /proc/<pid>/oom_adj as mentioned in [2] the correct way to
tweak the OOM Killer?

Given these things, does it even make sense to test for malloc failures
and if so what should be done when an error occurs? Simply aborting is
probably not an option since there might be other file system operations
active which could be completed without new memory.

Is returning ENOMEM the preferred solution?

Some comments/advice would be appreciated.

Thanks,
Marc

[1] http://lwn.net/Articles/104179/
[2] http://linux-mm.org/OOM_Killer

--
Marc Andre Tanner >< http://www.brain-dump.org/ >< GPG key: CF7D56C0
From: Miklos S. <mi...@sz...> - 2008-06-24 18:04:28

> > You should handle read beyond EOF, and return zero in that case. I'll
> > have a look at the kernel code, why it is doing this. It's superfluous
> > at best.
>
> Wasn't it this one?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=325141#c63

Ah, yes. Indeed, I had a feeling that I've already seen this problem,
but couldn't quite remember when.

But my conclusion was right: the filesystem should be fixed to handle
this situation.

Miklos
From: Szabolcs S. <sz...@nt...> - 2008-06-24 17:48:56

On Tue, 24 Jun 2008, Miklos Szeredi wrote:

> You should handle read beyond EOF, and return zero in that case. I'll
> have a look at the kernel code, why it is doing this. It's superfluous
> at best.

Wasn't it this one?

https://bugzilla.redhat.com/show_bug.cgi?id=325141#c63

Szaka

--
NTFS-3G: http://ntfs-3g.org
From: Miklos S. <mi...@sz...> - 2008-06-24 17:18:13

> I recently started testing fuse with big_writes support because of the
> performance benefits for the data deduplication filesystem that I am working
> on.
> Strangely enough 'dedupfs' exited with the warning that I was trying to read
> a block of data beyond the end of a file.
>
> I am able to replicate this behavior with fusexmp.
> After adding some logging to fusexmp I got this output:
> Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : offset 5234688, size 8192
> Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : offset 5226496, size 8192
> Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : offset 5242880, size 4096
> Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : read returns 0
>
> The file that I am reading here has a size of 5242880 bytes.
> Before big_writes xmp_read would read offset 5234688 + size 8192 = 5242880 and
> then stop.
> With big_writes the extra read for offset 5242880, size 4096 is introduced.

I can reproduce this on 2.6.25 without big writes. So while it's
strange, this is not something that got introduced recently.

> I mounted fusexmp with the arguments:
> ./fusexmp /fuse -o
> use_ino,readdir_ino,default_permissions,allow_other,big_writes,max_read=8192,max_write=8192
> Kernel: 2.6.26-rc7-smp
> libfuse pulled from cvs (yesterday).
>
> Should I change my code so that it allows reads beyond the end of a file or
> is this a fuse bug?

You should handle read beyond EOF, and return zero in that case. I'll
have a look at the kernel code, why it is doing this. It's superfluous
at best.

Thanks,
Miklos
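
As an illustration of "handle read beyond EOF, and return zero", here is a small sketch over an in-memory buffer. `struct mem_file` and `mem_read` are hypothetical names invented for this example; a real read handler would apply the same clamping to whatever backing store it uses, with the offset and size taken from its own callback arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory file, used only for this sketch. */
struct mem_file {
    const char *data;
    size_t size;
};

/* Copy up to `size` bytes starting at `offset` into `buf`.
 * Returns the number of bytes copied, 0 when the request starts at or
 * beyond EOF, or -EINVAL for a negative offset. A request that straddles
 * EOF is shortened rather than rejected. */
static ssize_t mem_read(const struct mem_file *f, char *buf,
                        size_t size, off_t offset)
{
    assert(f != NULL && buf != NULL);

    if (offset < 0)
        return -EINVAL;
    if ((size_t)offset >= f->size)
        return 0;                       /* at or past EOF: not an error */

    size_t avail = f->size - (size_t)offset;
    if (size > avail)
        size = avail;                   /* short read up to EOF */

    memcpy(buf, f->data + offset, size);
    return (ssize_t)size;
}
```

With the numbers from the log above, a 4096-byte request at offset 5242880 on a 5242880-byte file takes the "at or past EOF" branch and returns 0.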
From: Miklos S. <mi...@sz...> - 2008-06-24 15:47:58

(untangling top-posting mess)

> >> Here's what I see in my debug log:
> >>
> >> GetAttr called.
> >> /one
> >> GetAttr called.
> >> /one/foo
> >> GetAttr called.
> >> /one/foo/1
> >>
> >> So, it seems that getattr() is called recursively for every element of
> >> the path!!!

> > Does anybody know what NFS does in this case? Does it take the full
> > round-trip hit times the directory depth, every time a file is accessed
> > without its parent directories stats cached?

Yes. NFSv4 has provision for gang-lookup, but that needs tweaking the
VFS as well, so currently the Linux implementation doesn't use it.

If such a feature is added to the VFS, then fuse could be extended as
well to handle lookup of multiple path components in one go.

Miklos
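
To make the per-component behaviour concrete, the sketch below enumerates the successive prefixes of an absolute path, which is the sequence of getattr() targets seen in the debug log above. `path_prefix` is a hypothetical helper written for this illustration, and it assumes a normalized path (no empty components, no trailing slash).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the n-th (0-based) prefix of an absolute path into `out`.
 * For "/one/foo/1": n=0 gives "/one", n=1 gives "/one/foo",
 * n=2 gives "/one/foo/1".
 * Returns 0 on success, -1 if there is no such component, the path is
 * not absolute, or `out` is too small. */
static int path_prefix(const char *path, size_t n, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL);

    if (path[0] != '/')
        return -1;

    const char *p = path + 1;           /* start of the current component */
    for (;;) {
        const char *slash = strchr(p, '/');
        const char *end = slash ? slash : p + strlen(p);

        if (end == p)
            return -1;                  /* empty component or end of path */

        if (n == 0) {
            size_t len = (size_t)(end - path);
            if (len + 1 > outlen)
                return -1;
            memcpy(out, path, len);
            out[len] = '\0';
            return 0;
        }

        if (slash == NULL)
            return -1;                  /* ran out of components */
        n--;
        p = slash + 1;
    }
}
```

Each prefix corresponds to one lookup, so with nothing cached a path of depth three over a link with a one-second round trip costs roughly three seconds before the file itself can be touched.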
From: Mark R. <mru...@gm...> - 2008-06-24 14:43:02

I recently started testing fuse with big_writes support because of the
performance benefits for the data deduplication filesystem that I am working
on.
Strangely enough 'dedupfs' exited with the warning that I was trying to read
a block of data beyond the end of a file.

I am able to replicate this behavior with fusexmp.
After adding some logging to fusexmp I got this output:
Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : offset 5234688, size 8192
Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : offset 5226496, size 8192
Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : offset 5242880, size 4096
Jun 24 16:11:47 snapper lt-fusexmp: xmp_read : read returns 0

The file that I am reading here has a size of 5242880 bytes.
Before big_writes xmp_read would read offset 5234688 + size 8192 = 5242880 and
then stop.
With big_writes the extra read for offset 5242880, size 4096 is introduced.

I mounted fusexmp with the arguments:
./fusexmp /fuse -o
use_ino,readdir_ino,default_permissions,allow_other,big_writes,max_read=8192,max_write=8192
Kernel: 2.6.26-rc7-smp
libfuse pulled from cvs (yesterday).

Should I change my code so that it allows reads beyond the end of a file or
is this a fuse bug?

Thanks in advance,

Mark Ruijter
From: Fi D. <fi....@gm...> - 2008-06-23 21:40:16

> Does anybody know what NFS does in this case? Does it take the full
> round-trip hit times the directory depth, every time a file is accessed
> without its parent directories stats cached?

So no NFS experts around here? :)

Fi.

On Thu, Jun 19, 2008 at 5:59 AM, Rev Lebaredian <RLe...@nv...> wrote:
>
> Caching helps, unless your access pattern is random access to files
> spread throughout different parts of the filesystem. The deeper the
> files are in the directory tree, the worse it gets with high round-trip
> times.
>
> Does anybody know what NFS does in this case? Does it take the full
> round-trip hit times the directory depth, every time a file is accessed
> without its parent directories stats cached?
>
> Rev
>
> -----Original Message-----
> From: Jonathan Pryor [mailto:jon...@vt...]
> Sent: Thursday, June 19, 2008 09:39
> To: Fi Dot
> Cc: fus...@li...; Fedor Fomichev; Sergey Nozhkin;
> Rev Lebaredian
> Subject: Re: [fuse-devel] getattr() called recursively when doing stat()
> on a file?
>
> On Thu, 2008-06-19 at 01:33 +0400, Fi Dot wrote:
>> I am trying to understand the behaviour of FUSE / Linux VFS when doing
>> getattr() calls to my filesystem....
>
> <snip/>
>
>> and i'm calling stat() on my file that's being exposed by this
>> filesystem. Path to this file is /one/foo/1.
>
> <snip/>
>
>> Here's what I see in my debug log:
>>
>> GetAttr called.
>> /one
>> GetAttr called.
>> /one/foo
>> GetAttr called.
>> /one/foo/1
>>
>> So, it seems that getattr() is called recursively for every element of
>> the path!!!
>
> I wouldn't call this "recursive" (getattr isn't calling getattr), but
> yes, getattr is being invoked for each piece of the path.
>
> This is By (Unix) Design.
>
> getattr() needs to be called to:
>
> 1. check that the path component exists (if getattr() returns an error,
> it doesn't exist), and
>
> 2. check that the caller can access the contents of that directory.
> (Consider a directory /tmp/foo with 000 permissions -- no non-root
> user should be able to access the contents of that directory, and
> a getattr() call is needed in order to make this check.)
>
>> The problem is that this filesystem is designed to expose remote
>> files, possibly via links with high latency, so every getattr() call
>> costs me the roundtrip time to remote host (which can be up to a
>> second!!!), effectively making highly nested directory structures very
>> inefficient.
>
> The answer: local caching -- every getattr() call shouldn't be a round
> trip, you should cache the results of previous trips (and/or better,
> when you make your remote call get the results for several files at
> once, as you can be reasonably sure more getattr() calls will be coming
> soon...).
>
> - Jon
>
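
The local caching suggested above could look roughly like the following sketch: a small fixed-size table keyed by path, with a time-to-live per entry. Everything here (`attr_cache`, `cache_put`, `cache_get`, the single `size` field standing in for a full `struct stat`, the slot count) is a hypothetical simplification for illustration, not part of libfuse. The current time is passed in explicitly so the expiry logic is easy to follow and test.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS  64
#define PATH_MAX_LEN 256

/* One cached result; `size` stands in for a full struct stat. */
struct attr_entry {
    char   path[PATH_MAX_LEN];
    long   size;
    time_t expires;
    int    used;
};

struct attr_cache {
    struct attr_entry slots[CACHE_SLOTS];
    time_t ttl;                          /* seconds an entry stays valid */
};

/* Simple string hash (djb2 style) reduced to a slot index. */
static unsigned hash_path(const char *s)
{
    unsigned h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % CACHE_SLOTS;
}

static void cache_init(struct attr_cache *c, time_t ttl)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->ttl = ttl;
}

/* Store a result fetched from the remote side. Colliding paths simply
 * overwrite each other. Returns 0, or -1 if the path is too long. */
static int cache_put(struct attr_cache *c, const char *path,
                     long size, time_t now)
{
    assert(c != NULL && path != NULL);
    if (strlen(path) >= PATH_MAX_LEN)
        return -1;

    struct attr_entry *e = &c->slots[hash_path(path)];
    strcpy(e->path, path);
    e->size = size;
    e->expires = now + c->ttl;
    e->used = 1;
    return 0;
}

/* Returns 1 and fills *size on a fresh hit, 0 on a miss or expired entry. */
static int cache_get(const struct attr_cache *c, const char *path,
                     time_t now, long *size)
{
    assert(c != NULL && path != NULL && size != NULL);

    const struct attr_entry *e = &c->slots[hash_path(path)];
    if (!e->used || now >= e->expires || strcmp(e->path, path) != 0)
        return 0;
    *size = e->size;
    return 1;
}
```

A getattr handler would call `cache_get` first and only go to the remote host on a miss, then `cache_put` the answer. The choice of TTL trades staleness against round trips; libfuse's `attr_timeout` and `entry_timeout` mount options control a similar trade-off on the kernel side.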
From: Miklos S. <mi...@sz...> - 2008-06-20 17:34:50

> we've noticed that when there is steady readdir traffic, for example
> as generated by multiple find commands, delete operations (rmdir,
> unlink, rename) seem to get starved out. We're assuming this is due
> to the rwlock being in a reader-favoring mode. This is the default
> behaviour of the pthread mutex initialization value chosen by fuse
> for the lock called "tree_lock".
> Is there a specific reason for this decision? What would be the
> effect of changing the rwlock to prefer writers?

Could you try out the CVS version of libfuse? The "tree_lock" logic
has been completely rewritten to have much less contention, and better
fairness on contention. I'm not sure it's completely fair or not, but
in practice you shouldn't see any such problems.

Thanks,
Miklos
From: Goswin v. B. <gos...@we...> - 2008-06-20 15:18:47

Bra...@sc... writes:

> C. General I/Os (not just block devices):
>> If this is done with normal FDs then the libfuse can just transform
>> them into read/write calls for the actual data and ignore the cache
>> fills (or also emulate that).
> Yes, true - however - I have been focusing on readpage() and writepage() -
> mostly because this seems to be the underlying mechanism to implement read()
> and write(), (in almost all cases) as well as sendfile(). I am not sure about
> splice() - I will have to look into it. read() and write() themselves however
> would copy the data, which is what I am trying to avoid. Perhaps directly
> calling readpage() writepage() of the target device would do the trick in my
> situation, but read() or write() in others (like a network socket?) would be
> correct. The readpage() writepage() only works for things like block devices,
> but a general read() or write() would work in a broader sense.
> Perhaps splice() would easily work the same way!

The read/write suggestion was for when libfuse detects a kernel that
does not support zero-copy I/O, i.e. a fuse module without the
feature. Certainly that would copy buffers but that is the price of
not having zero-copy. The goal was that a zero-copy FS would work with
old fuse kernel support.

MfG
Goswin
From: Miklos S. <mi...@sz...> - 2008-06-20 13:38:44

CC-ing linux-fsdevel, because this issue might be interesting to other
filesystems which allow NFS exporting, and do page cache invalidation.

Brian Wang wrote:
>> Thanks for the quick fix. But I may have a hard one for you.
>>
>> 1. big_writes definitely works now. it also fixed the performance problem
>> I reported. I think it is related to the 4k reads the patch fixed.
>>
>> 2. The problem is definitely NFS related. If you write some big files via
>> NFS and read them back right away, it works. Then you leave it alone for
>> a few hours, try to read them again, you will get Input/Output error. I
>> used "-o big_writes, noforget" options.

> More info on this.
>
> Even read from NFS returns IO error, read from local fuse works fine.
> After waiting for a few hours (you got io errors reading the files you wrote
> before), if you write a new file, then try to read it back, it takes high
> CPU and won't finish. looks like it sits in a deadloop.
>

OK, I found the reason for the I/O errors and slowdowns.

Short story: try the 'kernel_cache' option, it fixed both issues for
me.

Long story: NFSv2/3 don't have the concept of an open file, so for
each read, nfsd basically does:

  open file
  read from file
  close file

When opening the file fuse will flush the pages associated with the
inode, unless the 'kernel_cache' option is used. This in itself
shouldn't be a problem, since the invalidated pages will just be read
again.

The problem comes from the way nfsd does the reading: it uses splice
to reference pages from the filesystem, instead of copying data to a
temporary buffer. The following can happen:

 - one nfsd thread is doing the read, and is inside the splice code

 - another nfsd thread is starting the read and calls open on the
   same inode

The open will invalidate the current page cache for the inode, which
will result in splice returning a short read count. In an extreme case,
it could return a zero read count.

All this still doesn't result in any errors in most cases, since the
linux read code is built in a way to first do readahead
asynchronously, and only do single page synchronous reads if the page
wasn't read-in on the previous readahead pass.

So mostly in the above case the short read count is ignored, and the
read is retried, but now a separate 4k read request for each page.
This is the cause of the slowdown.

However in the rare case that the splice returns zero even for the
single page read, then the linux read logic will take that as a read
error and return -EIO.

While 'kernel_cache' is a good workaround for this issue, it might not
be ideal for all filesystems, because cache invalidation is an
important issue in some cases. So I'm going to think about how to
solve this properly. Probably splice should detect that pages have
been invalidated, and retry the operation.

Miklos
From: bargav y. <by...@ya...> - 2008-06-20 05:44:53

we've noticed that when there is steady readdir traffic, for example
as generated by multiple find commands, delete operations (rmdir,
unlink, rename) seem to get starved out. We're assuming this is due
to the rwlock being in a reader-favoring mode. This is the default
behaviour of the pthread mutex initialization value chosen by fuse
for the lock called "tree_lock".
Is there a specific reason for this decision? What would be the
effect of changing the rwlock to prefer writers?
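
For readers unfamiliar with what "changing the rwlock to prefer writers" involves, here is a sketch using glibc's non-portable rwlock attribute. It is an illustration only: it assumes Linux with glibc, `init_writer_preferring_rwlock` is a hypothetical helper, and nothing here is taken from the libfuse source or describes how "tree_lock" is actually set up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Initialise `lock` so that waiting writers are preferred over newly
 * arriving readers. Uses the glibc-specific (_np) attribute; a lock
 * initialised with default attributes prefers readers on glibc.
 * Returns 0 on success or an errno-style error number. */
static int init_writer_preferring_rwlock(pthread_rwlock_t *lock)
{
    assert(lock != NULL);

    pthread_rwlockattr_t attr;
    int err = pthread_rwlockattr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_rwlockattr_setkind_np(
        &attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    if (err == 0)
        err = pthread_rwlock_init(lock, &attr);

    pthread_rwlockattr_destroy(&attr);
    return err;
}
```

With this kind, a thread must not take the read lock recursively, since a queued writer would block the second acquisition. That restriction is one reason such a change needs care in code that was written against the reader-preferring default.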
From: Miklos S. <mi...@sz...> - 2008-06-19 15:57:35

> but it did not create any fuse device as expected.
>
> As my /dev/ is read only I defined the flag -DFUSE_DEV_NEW=/var/dev/fuse.
> and then recompiled fuse but again no device was created.
> I created a character device in /var/device -:
> mknod /var/dev/fuse c 10 229
>
> But when I tried to mount the ntfs volume by using the ntfs-3g then the
> mount() library function it uses returns
> ENODEV.(This i found by debugging ntfs which uses mount and this mount is of
> BusyBox version--1.1.3).

Can you please send strace output from

  strace -o /tmp/strace ntfs-3g ...

Thanks,
Miklos
From: Miklos S. <mi...@sz...> - 2008-06-19 15:49:33

> Attached is the output of ./configure and make
>
> Make fails with an error on the code.
> gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
> Linux 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686 GNU/Linux
>
> I hope you can tell me how to fix it.

Don't use '--enable-kernel-module'. The ubuntu kernel should already
contain a fuse module.

Miklos
From: <Bra...@sc...> - 2008-06-19 14:09:46
|
I think I specifically touched-on (and agreed with) most of your points in later comments (in a later message) A. Kernel XOR/Crypto/checksum/ECC w/ or wo/ hardware assist B. The concept of ext2 having the first "x" blocks cached - and saving those context switches. Some new points you mentioned: A. Preallocation - I didn't think of this - indeed, it is filesystem-specific. I believe that the current "cache" mechanism I am proposing would work well with this - if you populate the cache with those "pre-allocated" blocks - writes will be able to use them. Perhaps the only addition now would be for the kernel to be able to "cache" the update to the filesize, if in-fact you have written within those (preallocated/cached) blocks, but beyond the current EOF. BTW - This also fits logically into the schema of my filsystem - and I am sure most. (i.e. I allocate blocks, but maybe don't immediatley write all the way to EOF). B. >> Doesn't it suffice to do that on flush() and fsync() but not on fdatasync()? Yes, you are correct. C. General I/Os (not just block devices): > If this is done with normal FDs then the libfuse can just transform > then into read/write calls for the actual data and ignore the cache > fills (or also emulate that). Yes, true - however - I have been focusing on readpage() and writepage() - mostly because this seems to be the underlying mechanism to implement read() and write(), (in almost all cases) as well as sendfile(). I am not sure about splice() - I will have to look into it. read() and write() themselves however would copy the data, which is what I am trying to avoid. Perhaps directly calling readpage() writepage() of the target device would do the trick in my situation, but read() or write() in others (like a network socket?) would be correct. The readpage() writepage() only works for things like block devices, but a general read() or write() would work in a broader sense. Perhaps splice() would easily work the same way! 
I don't know about anyone else - but I am really liking the possibilities here - i.e. the ability for very high-performance user-space filesystem I/O!

-BKG

gos...@we... wrote on 06/19/2008 09:43:02 AM:

> Miklos Szeredi <mi...@sz...> writes:
>
> >> FUSE [seems to have been] written primarily with the concept of user-space
> >> "translation" filesystems - i.e. things like gmailfs and sshfs - i.e.
> >> doing translation of data, involving user-space processes, and other
> >> things that would be cumbersome to implement in kernel-space - and even if
> >> they were - would have to communicate with a whole bunch of user-space
> >> code anyway (like SSH, etc.)
> >>
> >> It seems as though a lot of people (like myself, NTFS-3g, ext2inuserspace,
> >> etc.) are now trying to use it for "traditional" filesystems which are
> >> "traditional" in the sense that they are filesystems which are backed by
> >> block devices. Examples would be [user-space implementations of] ext[234],
> >> NTFS (3g) or ZFS. Examples would NOT include things like gmailfs or sshfs.
> >
> > ZFS does checksumming of blocks I think, so even though it's backed by
> > a block device (or block devices) it has to process data that passes
> > through it. This applies to compressed files on NTFS as well.
>
> And I would very much like to pass the checksumming of blocks to the
> kernel's async crypto engine. If no hardware support is there then the
> generic_* driver will do it in software and not much is gained. But
> if you have hardware this frees a lot of CPU time.
>
> This also applies to the striping that uses XOR for parity.
>
> >> However, FUSE is not really intended to be optimized as normal in-kernel
> >> filesystems in this respect - i.e. user<->kernel translation, context
> >> switching, copying, etc.
> >>
> >> 1. This implementation is geared primarily for "traditional" type
> >> filesystems - i.e. ones backed by block devices.
> >>
> >> 2. The patch implementation aims to accomplish the following: Let
> >> user-space code do the "heavy-lifting" - i.e. the "logic" of the
> >> filesystem is implemented in normal FUSE code - handling of the VFS
> >> functions, etc. Just like FUSE does today. User-space code implements
> >> almost the entire filesystem. HOWEVER when it comes to the actual I/O -
> >> get the user-space code out of the way and let the kernel code take over.
> >> This provides the following optimizations:
> >>
> >> a. Reducing context switching - user-space code may be avoided in
> >> most cases during normal I/O (read/write/readpage, etc)
> >
> > This helps only with I/O on large files which are read infrequently
> > and thus do not get cached (or are too large to be cached). I know
> > that this does apply in your case, I'm just noting that this is not a
> > universal solution to all problems :)
>
> Why? On the first read for an ext2 filesystem the cache gets set up
> for the first 20 or so blocks. Even a 64k file will save context
> switches.
>
> > Also when write is growing a file (which is by far the most common
> > mode of operation), the userspace code has to do the block allocation.
> > So unless some trickery (fallocate()) is used, this won't get rid of
> > interaction with userspace.
>
> The kernel ext2/3 code can do preallocation. Doing that in userspace
> would still require some feedback when a block is then used but that
> could run in parallel with the kernel writing to the cache block
> address.
>
> > With writes there's also the question of st_mtime update. Currently
> > updating the timestamp is the responsibility of the userspace part, so
> > if writing is moved to the kernel, this issue needs to be addressed as
> > well.
>
> Doesn't it suffice to do that on flush() and fsync() but not on
> fdatasync()?
>
> >> b. Allow the filesystem to "redirect" their I/Os to the underlying
> >> block devices. (I have re-included some code snippets below of how they
> >> do this). This would be optimum for zero-copy I/O.
> >
> > Why just block devices? One common use of fuse is to do some
> > transformation on a normal filesystem, and sometimes the actual data
> > is not involved in the transformation. So having fuse perform I/O
> > directly on the underlying file is also a feature that is often asked
> > for.
>
> ACK. I would prefer if this would work with any file descriptor. In my
> mind splice() should be used somehow.
>
> > If we are doing some sort of zero-copy thing, I'd really not like to
> > limit it to just block devices.
> >
> > Also it would be important to have an API that can easily be emulated
> > with legacy kernel support, so filesystems using the new zero-copy
> > interface are not forced to implement two kinds of APIs for backward
> > compatibility (and compatibility with other OSes than Linux).
>
> If this is done with normal FDs then the libfuse can just transform
> them into read/write calls for the actual data and ignore the cache
> fills (or also emulate that).
>
> > Miklos
>
>
> One thing I'm missing is a feedback method for errors. Let's stick with
> the ZFS example from above. Say you have a raidX chunk and one disk
> fails. Then the userspace should be told about read errors, fetch the
> parity block, reconstruct the missing block, do any repair work on the
> FS and return the proper data to the reader.
>
> For the checksumming it would also be nice to do that on read. There
> would have to be a callback for whenever a block is read to fire off
> the in-kernel checksumming and to OK the read block.
>
> MfG
> Goswin
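To make the context-switch argument in this exchange concrete, here is a minimal sketch in Python. It is a toy model, not FUSE or kernel code: `BlockMapCache`, `make_daemon` and `read_file` are hypothetical names, the 4 KiB block size is an assumption, and the "daemon" simply maps a file to contiguous device blocks. It only counts how many times the kernel side would have to ask user space for a block mapping.

```python
from typing import Callable, Dict, List, Tuple


class BlockMapCache:
    """Toy model of a kernel-side cache of file-block -> device-block mappings."""

    def __init__(self, ask_userspace: Callable[[int], List[Tuple[int, int]]]):
        # ask_userspace(file_block) stands in for one round trip to the
        # user-space daemon; it returns a batch of (file_block, device_block).
        self._ask_userspace = ask_userspace
        self._map: Dict[int, int] = {}
        self.round_trips = 0

    def lookup(self, file_block: int) -> int:
        """Return the device block, asking user space only on a cache miss."""
        if file_block not in self._map:
            self.round_trips += 1
            self._map.update(self._ask_userspace(file_block))
        return self._map[file_block]


def make_daemon(batch: int, first_device_block: int = 1000):
    """Hypothetical daemon: contiguous layout, `batch` mappings per reply."""
    def reply(file_block: int) -> List[Tuple[int, int]]:
        return [(b, first_device_block + b)
                for b in range(file_block, file_block + batch)]
    return reply


def read_file(cache: BlockMapCache, n_blocks: int) -> List[int]:
    """Resolve every block of a file in order, as a sequential read would."""
    return [cache.lookup(b) for b in range(n_blocks)]


if __name__ == "__main__":
    blocks_in_64k = 64 * 1024 // 4096  # 16 blocks, assuming 4 KiB blocks
    for batch in (1, 20):
        cache = BlockMapCache(make_daemon(batch))
        read_file(cache, blocks_in_64k)
        print(f"batch={batch}: {cache.round_trips} round trip(s)")
```

Under these assumptions a 64k file needs one round trip when the first 20 mappings are handed over at once, and sixteen when each block is resolved separately. Preallocated blocks could be pushed into the same map ahead of time, which is the case discussed for growing writes.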
From: Goswin v. B. <gos...@we...> - 2008-06-19 13:43:02
|
Miklos Szeredi <mi...@sz...> writes:

>> FUSE [seems to have been] written primarily with the concept of user-space
>> "translation" filesystems - i.e. things like gmailfs and sshfs - i.e.
>> doing translation of data, involving user-space processes, and other
>> things that would be cumbersome to implement in kernel-space - and even if
>> they were - would have to communicate with a whole bunch of user-space
>> code anyway (like SSH, etc.)
>>
>> It seems as though a lot of people (like myself, NTFS-3g, ext2inuserspace,
>> etc.) are now trying to use it for "traditional" filesystems which are
>> "traditional" in the sense that they are filesystems which are backed by
>> block devices. Examples would be [user-space implementations of] ext[234],
>> NTFS (3g) or ZFS. Examples would NOT include things like gmailfs or sshfs.
>
> ZFS does checksumming of blocks I think, so even though it's backed by
> a block device (or block devices) it has to process data that passes
> through it. This applies to compressed files on NTFS as well.

And I would very much like to pass the checksumming of blocks to the kernel's async crypto engine. If no hardware support is there then the generic_* driver will do it in software and not much is gained. But if you have hardware this frees a lot of CPU time.

This also applies to the striping that uses XOR for parity.

>> However, FUSE is not really intended to be optimized as normal in-kernel
>> filesystems in this respect - i.e. user<->kernel translation, context
>> switching, copying, etc.
>>
>> 1. This implementation is geared primarily for "traditional" type
>> filesystems - i.e. ones backed by block devices.
>>
>> 2. The patch implementation aims to accomplish the following: Let
>> user-space code do the "heavy-lifting" - i.e. the "logic" of the
>> filesystem is implemented in normal FUSE code - handling of the VFS
>> functions, etc. Just like FUSE does today. User-space code implements
>> almost the entire filesystem. HOWEVER when it comes to the actual I/O -
>> get the user-space code out of the way and let the kernel code take over.
>> This provides the following optimizations:
>>
>> a. Reducing context switching - user-space code may be avoided in
>> most cases during normal I/O (read/write/readpage, etc)
>
> This helps only with I/O on large files which are read infrequently
> and thus do not get cached (or are too large to be cached). I know
> that this does apply in your case, I'm just noting that this is not a
> universal solution to all problems :)

Why? On the first read for an ext2 filesystem the cache gets set up for the first 20 or so blocks. Even a 64k file will save context switches.

> Also when write is growing a file (which is by far the most common
> mode of operation), the userspace code has to do the block allocation.
> So unless some trickery (fallocate()) is used, this won't get rid of
> interaction with userspace.

The kernel ext2/3 code can do preallocation. Doing that in userspace would still require some feedback when a block is then used, but that could run in parallel with the kernel writing to the cache block address.

> With writes there's also the question of st_mtime update. Currently
> updating the timestamp is the responsibility of the userspace part, so
> if writing is moved to the kernel, this issue needs to be addressed as
> well.

Doesn't it suffice to do that on flush() and fsync() but not on fdatasync()?

>> b. Allow the filesystem to "redirect" their I/Os to the underlying
>> block devices. (I have re-included some code snippets below of how they
>> do this). This would be optimum for zero-copy I/O.
>
> Why just block devices? One common use of fuse is to do some
> transformation on a normal filesystem, and sometimes the actual data
> is not involved in the transformation. So having fuse perform I/O
> directly on the underlying file is also a feature that is often asked
> for.

ACK. I would prefer if this would work with any file descriptor. In my mind splice() should be used somehow.

> If we are doing some sort of zero-copy thing, I'd really not like to
> limit it to just block devices.
>
> Also it would be important to have an API that can easily be emulated
> with legacy kernel support, so filesystems using the new zero-copy
> interface are not forced to implement two kinds of APIs for backward
> compatibility (and compatibility with other OSes than Linux).

If this is done with normal FDs then the libfuse can just transform them into read/write calls for the actual data and ignore the cache fills (or also emulate that).

> Miklos

One thing I'm missing is a feedback method for errors. Let's stick with the ZFS example from above. Say you have a raidX chunk and one disk fails. Then the userspace should be told about read errors, fetch the parity block, reconstruct the missing block, do any repair work on the FS and return the proper data to the reader.

For the checksumming it would also be nice to do that on read. There would have to be a callback for whenever a block is read to fire off the in-kernel checksumming and to OK the read block.

MfG
Goswin
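As an illustration of the "fetch the parity block, reconstruct the missing block" step in the message above, here is a minimal sketch of single-parity XOR reconstruction in Python. It assumes one parity block per stripe (the simplest XOR scheme); it is not how ZFS or any particular RAID level lays out data, and `xor_blocks` and `reconstruct` are hypothetical helper names.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Fill in at most one missing (None) data block of a stripe from parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one missing block")
    repaired = list(stripe)
    if missing:
        survivors = [b for b in stripe if b is not None]
        # XOR of all surviving data blocks and the parity is the lost block.
        repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(data)
    # Simulate a read error on the second disk.
    print(reconstruct([data[0], None, data[2]], parity))
```

In the design being discussed, the read error would first have to reach user space so that it can decide to run a step like this; that feedback path is the part the message says is still missing.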
From: rajat m. <raj...@gm...> - 2008-06-19 11:03:31
|
Hi,

I have cross-compiled fuse-2.5.3 for my MIPS platform. My kernel version is 2.6.8.1. I am using the ntfs-3g user-space driver to write to the "ntfs" file system.

I successfully compiled the fuse module and loaded it on my platform with:

    # modprobe fuse
    fuse init (API version 7.5)
    fuse distribution version: 2.5.3

but it did not create any fuse device as expected. As my /dev/ is read-only, I defined the flag -DFUSE_DEV_NEW=/var/dev/fuse and then recompiled fuse, but again no device was created. I created a character device in /var/dev:

    mknod /var/dev/fuse c 10 229

But when I tried to mount the NTFS volume using ntfs-3g, the mount() library function it uses returned ENODEV. (I found this by debugging ntfs-3g, which uses mount; this mount is from BusyBox version 1.1.3.) This means either the kernel module is not loaded properly or mount is somehow unable to find it.

The strange thing is that I can see it in /proc/filesystems:

    # cat /proc/filesystems
    nodev sysfs
    nodev rootfs
    nodev bdev
    nodev proc
    nodev sockfs
    nodev usbfs
    nodev usbdevfs
    nodev tmpfs
    nodev pipefs
    nodev devpts
          ext3
          squashfs
    nodev ramfs
          msdos
          vfat
          ntfs
          romfs
    nodev ffs
    nodev fuse

If I repeat the same procedure with the same packages on a desktop Linux machine with kernel version 2.6.11, it works. I can't find out whether the problem is with the fuse module or with mount.

I would be highly obliged if you could throw some light on it. Please do get back to me if you have any queries.

Regards,
Rajat
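The two checks described in this message (is "fuse" registered in /proc/filesystems, and is the device node a character device with numbers 10:229) can be scripted. The following is a small sketch in Python; `parse_proc_filesystems` and `describe_device_node` are hypothetical helper names, and the script only reports what it finds. It does not explain why mount() returns ENODEV.

```python
import os
import stat
from typing import Dict


def parse_proc_filesystems(text: str) -> Dict[str, bool]:
    """Map filesystem name -> True if the kernel lists it as 'nodev'."""
    result: Dict[str, bool] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Lines are either "nodev <name>" or just "<name>".
        result[fields[-1]] = len(fields) == 2 and fields[0] == "nodev"
    return result


def describe_device_node(path: str) -> str:
    """Report whether `path` is a character device and its major:minor."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return f"{path}: cannot stat ({exc.strerror})"
    if not stat.S_ISCHR(st.st_mode):
        return f"{path}: exists but is not a character device"
    return f"{path}: char device {os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"


if __name__ == "__main__":
    try:
        with open("/proc/filesystems") as handle:
            registered = parse_proc_filesystems(handle.read())
        print("fuse registered:", "fuse" in registered)
    except OSError as exc:
        print("cannot read /proc/filesystems:", exc.strerror)
    # /var/dev/fuse is the path used in the message; /dev/fuse is the usual one.
    for node in ("/dev/fuse", "/var/dev/fuse"):
        print(describe_device_node(node))
```

On the setup described, the expected report for the hand-made node would be a character device with numbers 10:229, matching the mknod command in the message.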
From: Rev L. <RLe...@nv...> - 2008-06-19 02:00:36
|
Caching helps, unless your access pattern is random access to files spread throughout different parts of the filesystem. The deeper the files are in the directory tree, the worse it gets with high round-trip times.

Does anybody know what NFS does in this case? Does it take the full round-trip hit times the directory depth every time a file is accessed without its parent directories' stats cached?

Rev

-----Original Message-----
From: Jonathan Pryor [mailto:jon...@vt...]
Sent: Thursday, June 19, 2008 09:39
To: Fi Dot
Cc: fus...@li...; Fedor Fomichev; Sergey Nozhkin; Rev Lebaredian
Subject: Re: [fuse-devel] getattr() called recursively when doing stat() on afile?

On Thu, 2008-06-19 at 01:33 +0400, Fi Dot wrote:
> I am trying to understand the behaviour of FUSE / Linux VFS when doing
> getattr() calls to my filesystem....
<snip/>
> and i'm calling stat() on my file that's being exposed by this
> filesystem. Path to this file is /one/foo/1.
<snip/>
> Here's what I see in my debug log:
>
> GetAttr called.
> /one
> GetAttr called.
> /one/foo
> GetAttr called.
> /one/foo/1
>
> So, it seems that getattr() is called recursively for every element of
> the path!!!

I wouldn't call this "recursive" (getattr isn't calling getattr), but yes, getattr is being invoked for each piece of the path. This is By (Unix) Design.

getattr() needs to be called to:

1. check that the path component exists (if getattr() returns an error, it doesn't exist), and

2. check that the caller can access the contents of that directory. (Consider a directory /tmp/foo with 000 permissions -- no non-root user should be able to access the contents of that directory, and a getattr() call is needed in order to make this check.)

> The problem is that this filesystem is designed to expose remote
> files, possibly via links with high latency, so every getattr() call
> costs me the roundtrip time to remote host (which can be up to a
> second!!!), effectively making highly nested directory structures very
> inefficient.

The answer: local caching -- every getattr() call shouldn't be a round trip; you should cache the results of previous trips (and/or, better, when you make your remote call, get the results for several files at once, as you can be reasonably sure more getattr() calls will be coming soon...).

- Jon
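The per-component getattr() behaviour and the local-caching advice in this thread can be sketched as follows. This is a toy Python model, not a FUSE binding: `CachedGetattr`, its `ttl` parameter and the `remote_getattr` callback are hypothetical, and the model ignores permission checks and negative (ENOENT) caching. It counts how many remote round trips a series of stat() calls would cost.

```python
import time
from typing import Callable, Dict, List, Tuple


def path_components(path: str) -> List[str]:
    """'/one/foo/1' -> ['/one', '/one/foo', '/one/foo/1']."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


class CachedGetattr:
    """Cache attribute lookups for `ttl` seconds to avoid remote round trips."""

    def __init__(self, remote_getattr: Callable[[str], dict],
                 ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._remote = remote_getattr  # stands in for the slow remote call
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}
        self.remote_calls = 0

    def getattr(self, path: str) -> dict:
        now = self._clock()
        hit = self._cache.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no round trip
        self.remote_calls += 1
        attrs = self._remote(path)
        self._cache[path] = (now, attrs)
        return attrs

    def stat(self, path: str) -> dict:
        """Model stat(): look up every path component, return the last one."""
        attrs: dict = {}
        for component in path_components(path):
            attrs = self.getattr(component)
        return attrs


if __name__ == "__main__":
    fs = CachedGetattr(lambda p: {"path": p})
    fs.stat("/one/foo/1")
    fs.stat("/one/foo/2")
    print("remote calls:", fs.remote_calls)
```

In this model the first stat() costs three remote calls and a sibling file costs only one more, because /one and /one/foo are already cached. With random access across distant directories the parents are rarely cached, which is the weakness raised in the reply. FUSE itself can also keep lookups and attributes in the kernel for a while via the entry_timeout and attr_timeout mount options, which is a separate layer from a cache inside the filesystem daemon.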