From: Ryan J M. <rjm...@us...> - 2011-10-18 15:08:24
Hello Hans,

Thank you for your input on this matter. I think our situation is different, as it does not sound consistent with the behavior you mentioned. In our case the problem is intermittent. Miklos recommended that we use direct_io, but when we do, we get a read error on every file we try to read over NFS, as mentioned in my previous post. Is there any information about this? It sounds like we should use direct_io, but it is not working for us. Is there something we need to do to get it to work, or is there another route we can take to avoid the problem we are facing?

It would also be nice to get some explanation of what is happening that causes this problem, just for my own edification (i.e., what happens in the code that decides whether or not to call our read() callback function).

Thanks!
Ryan

From: Hans Beckérus <han...@gm...>
To: fus...@li...
Cc: Ryan J Minniear/Sacramento/IBM@IBMUS
Date: 10/14/2011 03:04 AM
Subject: Re: [fuse-devel] Apparent caching problem

Hi, I do not know if I am going to provide you any help, but I really got caught by the NFS keyword ;) I assume that you have your file system running remotely and then mount it across NFSv3? In that case there are things that will make this more tricky. There is *nothing* wrong in FUSE when it comes to NFS, but your file system implementation might need some extra thought. I myself bumped into this problem ages ago.

The problem is that NFSv3 is stateless, which means that your file system is going to receive bursts of open->read->close cycles. In my case that made everything break, since I was depending on a byte stream that could not be interrupted.
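One way a filesystem could in principle tolerate such bursts is to keep the stream context alive across close, so the next open in a burst resumes mid-stream instead of restarting. This is a purely hypothetical sketch (the `stream_ctx`/`ctx_acquire` names and the fixed-size table are inventions for illustration, not from any real filesystem, and error handling is omitted):

```c
/* Hypothetical context cache: a path-keyed table of stream contexts
 * that survive release, so the bursts of open->read->close cycles
 * generated by NFSv3 can reattach to an in-progress stream. */
#include <stdlib.h>
#include <string.h>

struct stream_ctx {
    char path[256];
    long offset;   /* how far the stream has been decoded so far */
    int  refs;     /* opens currently holding this context */
};

#define MAX_CTX 16
static struct stream_ctx *table[MAX_CTX];

/* Called from open(): reuse an existing context for this path if one
 * is cached; otherwise allocate a fresh one. */
struct stream_ctx *ctx_acquire(const char *path)
{
    for (int i = 0; i < MAX_CTX; i++)
        if (table[i] && strcmp(table[i]->path, path) == 0) {
            table[i]->refs++;
            return table[i];   /* resume mid-stream */
        }
    for (int i = 0; i < MAX_CTX; i++)
        if (!table[i]) {
            struct stream_ctx *c = calloc(1, sizeof *c);
            if (!c)
                return NULL;
            strncpy(c->path, path, sizeof c->path - 1);
            c->refs = 1;
            table[i] = c;
            return c;
        }
    return NULL;               /* table full */
}

/* Called from release(): drop the reference but keep the context
 * cached, rather than tearing the stream down, so the next burst
 * picks up at the same offset. A real implementation would also
 * need an eviction policy (e.g. a timeout). */
void ctx_release(struct stream_ctx *c)
{
    if (c && c->refs > 0)
        c->refs--;
}
```

In a real filesystem the hard part is deciding when a cached context is truly dead (NFSv3 never tells you), which is presumably why Hans describes the problem as unsolved.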
Since the open call started the stream extraction, the close call had to make it stop by design. On the next open call it all started over again with all context information lost, and so my stream became out-of-sync and corrupt from the reading end's perspective, that is, the reader located locally on the NFS client side. If you connect the two peers over NFSv3 but run your file system locally, the problem no longer applies, since your NFS client takes care of the bursts internally and all you see is one consecutive stream, with one open and one close call.

So, maybe this does not apply in your case, but if it does, you should consider the "NFS effect". I actually never solved this problem, but maybe there is a solution to it. I sort of left it behind since e.g. CIFS is working fine in any scenario and I was fine with that. NFSv4 is supposed to be state-aware; I never tried it though.

Hans

On Thu, Oct 13, 2011 at 11:22 PM, Ryan J Minniear <rjm...@us...> wrote:
>
> Hello Miklos,
>
> Thanks for the quick response. We have tried using the direct_io flag, but
> for us NFS is critical, and we run into read errors for every file over NFS
> when we use the direct_io flag. Things appeared to still work locally, but
> the code we have which exercises our FUSE filesystem is run via NFS, so we
> have been unable to run it with direct_io on.
>
> Here is NFS tracing showing what happened with direct_io enabled. Note that
> the nfs_readpage_result status is -1.
>
> Oct 12 10:55:50 blade4-9 kernel: NFS call access
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_update_inode(0:18/9193249 ct=1 info=0x6)
> Oct 12 10:55:50 blade4-9 kernel: NFS reply access: 0
> Oct 12 10:55:50 blade4-9 kernel: NFS: permission(0:18/9193249), mask=0x1, res=0
> Oct 12 10:55:50 blade4-9 kernel: NFS: revalidating (0:18/9235820)
> Oct 12 10:55:50 blade4-9 kernel: NFS call getattr
> Oct 12 10:55:50 blade4-9 kernel: NFS reply getattr: 0
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_update_inode(0:18/9235820 ct=1 info=0x6)
> Oct 12 10:55:50 blade4-9 kernel: NFS: (0:18/9235820) revalidation complete
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_lookup_revalidate (test0/f999.blt) is valid
> Oct 12 10:55:50 blade4-9 kernel: NFS call access
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_update_inode(0:18/9235820 ct=1 info=0x6)
> Oct 12 10:55:50 blade4-9 kernel: NFS reply access: 0
> Oct 12 10:55:50 blade4-9 kernel: NFS: permission(0:18/9235820), mask=0x24, res=0
> Oct 12 10:55:50 blade4-9 kernel: NFS: open file(test0/f999.blt)
> Oct 12 10:55:50 blade4-9 kernel: NFS: read(test0/f999.blt, 32768@0)
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_readpage (ffffe2000423b1c0 4096@0)
> Oct 12 10:55:50 blade4-9 kernel: NFS: 0 initiated read call (req 0:18/9235820, 62 bytes @ offset 0)
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_readpage_result: 4279, (status -1)
> Oct 12 10:55:50 blade4-9 kernel: NFS: nfs_update_inode(0:18/9235820 ct=1 info=0x6)
> Oct 12 10:55:50 blade4-9 kernel: NFS: read done (0:18/9235820 62@0)
> Oct 12 10:55:50 blade4-9 kernel: NFS: flush(test0/f999.blt)
> Oct 12 10:55:50 blade4-9 kernel: NFS: release(test0/f999.blt)
> Oct 12 10:55:50 blade4-9 kernel: NFS: dentry_delete(test0/f999.blt, 8)
>
> Is direct_io the normal way to get this problem to go away, or are there
> other avenues? What can we do to get direct_io to work? Also, I am wondering
> if you could explain a little about what is going on behind the scenes
> (i.e. about the code that decides whether or not to call our read()
> callback function and how it makes that decision, etc.). We are perplexed.
>
> Thanks again for your help!
> Ryan
>
> From: Miklos Szeredi <mi...@sz...>
> To: Ryan J Minniear/Sacramento/IBM@IBMUS
> Cc: fus...@li...
> Date: 10/13/2011 03:06 AM
> Subject: Re: [fuse-devel] Apparent caching problem
>
> Ryan J Minniear <rjm...@us...> writes:
>
>> Hello,
>>
>> I work on a product where we use FUSE to implement a filesystem which
>> delivers metafiles for our data files. Currently we have been trying to
>> figure out an issue where sometimes a file is read and the data is not
>> as we expect. The problem occurs in roughly 1-2 files in a 100-file run.
>> Note that during our testing there are 2 or more tasks concurrently
>> running the following processing steps against different files.
>>
>> Here is the general sequence of events:
>>
>> 1) Open the metafile and read the data.
>> 2) Add some information and write the metafile.
>> 3) Open the metafile again and read the data.
>> 4) Write additional information out to the metafile.
>>
>> It is at step #3 where we sometimes see this issue. The data in the file
>> is partially incorrect, and the end of the file appears to be padded with
>> many 0s, which show up in vi as ???????? marks.
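One plausible mechanism behind the step #3 symptom (an assumption on this editor's part, not confirmed anywhere in the thread) is that without direct_io the kernel can satisfy the re-open from cached pages, so the filesystem's read() callback is never invoked and the reader sees stale contents. The toy model below illustrates only that cache-hit/cache-miss decision; it is a deliberate simplification, not the real kernel logic:

```c
/* Toy model of the decision Ryan asks about: the filesystem's read
 * callback runs only on a cache miss; while the cached copy is still
 * considered valid, re-reads never reach the filesystem at all. */
#include <string.h>

struct toy_cache {
    int  valid;
    char data[64];
};

/* Returns 1 if the filesystem callback was invoked, 0 if the read was
 * served entirely from the cache. */
int toy_read(struct toy_cache *pc, char *buf, size_t len,
             void (*fs_read)(char *dst, size_t len))
{
    if (len > sizeof pc->data)
        len = sizeof pc->data;
    if (pc->valid) {
        /* Cache hit: the filesystem never sees this read. If the file
         * changed underneath and the cache was not invalidated, the
         * caller gets stale data here. */
        memcpy(buf, pc->data, len);
        return 0;
    }
    fs_read(pc->data, sizeof pc->data); /* cache miss: ask the filesystem */
    pc->valid = 1;
    memcpy(buf, pc->data, len);
    return 1;
}
```

In this model, fixing the symptom means either never marking the cache valid (the direct_io approach) or reliably invalidating it whenever the file changes behind the kernel's back.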
>>
>> Some background information which may be helpful is that our FUSE
>> filesystem implementation treats certain lines written into the metafile
>> as commands which effect changes in the metadata or on the underlying
>> data file. So when we write a certain command into the file in step #2,
>> it should no longer be in the file when we open it again in step #3. For
>> all files in the test case we are writing our _EVENT_commit_ command in
>> step #2. When we re-open the file in step #3, the _EVENT_commit_ command
>> should no longer be there based on how our code works, but in the
>> erroneous cases it is, along with the ????????? data at the end.
>
> Are you setting the 'direct_io' flag?
>
> From the report it looks like you should but don't.
>
> Thanks,
> Miklos
>
> _______________________________________________
> fuse-devel mailing list
> fus...@li...
> https://lists.sourceforge.net/lists/listinfo/fuse-devel
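For readers finding this thread in the archive: in the libfuse high-level API, Miklos's suggestion corresponds to either mounting with `-o direct_io` (which applies to all files) or setting the per-open flag inside the open() handler. The sketch below uses a stand-in struct that mirrors only the two relevant bit-fields of libfuse's `struct fuse_file_info`; it is an illustration of where the flag is set, not a drop-in handler.

```c
/* Stand-in for the relevant bits of struct fuse_file_info from
 * <fuse.h>; only these two fields are discussed in the thread. */
struct file_info_sketch {
    unsigned int direct_io  : 1; /* bypass the kernel page cache */
    unsigned int keep_cache : 1; /* keep cached pages across opens */
};

/* Hypothetical open handler: with direct_io set, every read and write
 * on this open file is sent to the filesystem instead of being served
 * from (or staged in) the page cache, so the read() callback runs for
 * every read. keep_cache left at 0 means cached data from previous
 * opens is not trusted. */
static int sketch_open(const char *path, struct file_info_sketch *fi)
{
    (void)path;               /* a real handler would look the file up */
    fi->direct_io  = 1;       /* per-open equivalent of -o direct_io */
    fi->keep_cache = 0;
    return 0;                 /* 0 = success in FUSE convention */
}
```

Note that per this thread, direct_io interacts badly with re-exporting the FUSE filesystem over NFS (the status -1 in Ryan's trace), so whether this flag is usable depends on the deployment.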