From: Jim G. <ga...@ll...> - 2012-01-31 22:39:03
Hi,

I ran into some net/9p issues while working on the performance changes that were the original topic of this thread, and I wanted to check a couple of things with the experts. These seem like bugs to me (testing with trans=fd):

1) Signal handling in p9_client_rpc()

On receipt of a signal, first the transport cancel op is called; then, if that is unsuccessful, a Tflush is sent. The Tflush is sent by recursing into p9_client_rpc(). If another signal is received, the transport cancel op is called on the Tflush. If it is successful, we free the Tflush request (having never sent it to the server) and unwind to the original p9_client_rpc() call, where we eventually free the request, abandoning its tag. Meanwhile, if a response is received for that request, a fatal "Unexpected packet tag" error will result. It seems that signals should be ignored while handling the Tflush request.

2) Canceling Tclunk or Tremove in p9_fd_cancel()

When a Tclunk or Tremove is sent to the server, a side effect is to tell the server to forget about a fid. When one of these requests is sent and a signal is received, the transport cancel op is called. If successful, we free the Tclunk or Tremove request (having never sent it to the server). There is an understanding, observed in higher layers of the client code, that Tclunk or Tremove always frees its fid even if the request was unsuccessful. However, in this case the server never sees the request, so upon seeing the fid reused, in a "Twalk fid newfid" request for example, the server will fail the request. It seems that p9_fd_cancel() should never cancel a Tclunk or Tremove. They can be followed right up with a Tflush, though.

Any thoughts on these issues? The first one came up when I got many requests in flight with my performance changes and started hitting Ctrl-C a lot. I had been seeing evidence of the second one prior to the performance work. I'll get some patches submitted if there is agreement that these are bugs. Rough sketches of what I have in mind for both are appended at the end of this mail.

Thanks,

Jim

On Sun, Jan 08, 2012 at 09:14:10AM -0800, Jim Garlick wrote:
> On Sun, Jan 08, 2012 at 09:19:39AM -0600, Eric Van Hensbergen wrote:
>
> > The reason I said that is the most straightforward
> > approach to dealing with it is to enable readahead in the page routines so
> > that the code calls readpages instead of readpage. The problem is that
> > this then tells the read/write code to go through the pagecache code which
> > automatically enables data caches.
>
> I wasn't proposing to do that - just making readpage() asynchronous
> (send in readpage(), recv/mark page valid in callback) seems
> to have the desired effect on exec performance. I'll submit patches
> for comment early next week after more testing.
>
> Incidentally, launching the same executable on multiple cores in parallel
> triggers only one set of readpage calls even without my changes, so
> presumably those pages are locked in the page cache in some way for the
> duration of the exec. This is good news for me as parallel execs are
> common in our cluster environment.
>
> I did not parameterize the readpage changes but I can if you want.
> I didn't see the need as I don't believe there is any effect other
> than performance.
>
> > Now as far as implementing asynchrony within the 9p client (versus
> > exposing it to the page cache) -- that can probably work, however IIRC
> > the last time I looked at this, there are two issues:
> ...
> > Again - I have no problem with Linux optimized "short-cuts" for static
> > file systems (as cache=loose should suggest), but I'd like to keep them
> > parameterized so that they can be disabled when accessing more dynamic
> > name spaces.
>
> OK, I'll make sure to parameterize any changes I submit for read/write.
> Although I have some code working now for read, I'm waffling on whether
> it's a good idea or not. It massively boosts performance when using
> large I/O buffers on small msize, but long term I think I need a
> transport that supports a 1mb or larger msize to preserve atomicity of
> read/write syscalls as we've discussed. If that happens then this
> change might not be all that beneficial.
>
> Thanks,
>
> Jim
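
P.S. Here are the sketches I mentioned above.

For (1), the idea is to hide the pending signal while the Tflush round trip is in progress and re-assert it afterwards, so the recursive p9_client_rpc() call can't itself be interrupted and cancelled. This is only a rough sketch from memory, not a tested patch: the helper name p9_send_flush_uninterrupted() is made up, and I'm assuming the flush is still issued through the static p9_client_flush() in net/9p/client.c.

    #include <linux/sched.h>
    #include <linux/spinlock.h>
    #include <net/9p/client.h>

    /*
     * Sketch only: temporarily mask the pending signal so that sending
     * Tflush(oldtag) and waiting for Rflush cannot be interrupted, then
     * re-assert the signal once the flush has completed.
     */
    static int p9_send_flush_uninterrupted(struct p9_client *c,
                                           struct p9_req_t *oldreq)
    {
            unsigned long flags;
            int sigpending = 0;
            int err;

            if (signal_pending(current)) {
                    /* remember that a signal is pending and hide it */
                    sigpending = 1;
                    clear_thread_flag(TIF_SIGPENDING);
            }

            /* send Tflush for oldreq's tag and wait for the Rflush */
            err = p9_client_flush(c, oldreq);

            if (sigpending) {
                    /* make the hidden signal visible again */
                    spin_lock_irqsave(&current->sighand->siglock, flags);
                    recalc_sigpending();
                    spin_unlock_irqrestore(&current->sighand->siglock, flags);
            }

            return err;
    }

As far as I can tell, the clear_thread_flag(TIF_SIGPENDING) / recalc_sigpending() pairing is the usual way to defer a signal briefly in kernel code, so the signal is still delivered to the process once the flush completes.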
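
For (2), the change would be in p9_fd_cancel() in net/9p/trans_fd.c: refuse to pull a Tclunk or Tremove back off the unsent list, so p9_client_rpc() falls through to the Tflush path and the server always hears about the fid one way or another. Again a rough sketch, paraphrased from my reading of the current function rather than copied from it.

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <net/9p/9p.h>
    #include <net/9p/client.h>

    /*
     * Sketch only: never "un-send" a Tclunk or Tremove, even if it is
     * still sitting on the unsent list.  Returning nonzero tells the
     * caller the request could not be cancelled, so it sends a Tflush
     * instead and the fid bookkeeping stays consistent with the server.
     */
    static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req)
    {
            int ret = 1;    /* nonzero: not cancelled, caller sends Tflush */

            spin_lock(&client->lock);

            if (req->status == REQ_STATUS_UNSENT &&
                req->tc->id != P9_TCLUNK && req->tc->id != P9_TREMOVE) {
                    /* safe to drop: server never saw it and no fid is affected */
                    list_del(&req->req_list);
                    req->status = REQ_STATUS_FLSHD;
                    ret = 0;
            }

            spin_unlock(&client->lock);
            return ret;
    }

The cost is one extra Tflush round trip on an interrupted clunk or remove, which seems cheap compared to the client and server disagreeing about which fids are in use.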