Thread: Re: [Kosmosfs-users] [Kosmix Blog] New Comment Posted to 'Kosmos Filesystem Release'
From: Sreekant S. <sv...@ya...> - 2007-12-27 00:50:15
Sriram,

Following up on the mail you sent. I looked up the code in KfsClient.cc. It seems that each call to a public function locks the global mutex for the period of the call. I am guessing that this means that each call to a read or write is likely to lock the KfsClient instance for the period of the read operation. So if a read takes 20-25ms + the network transfer, we are talking about a 50ms wait on each request before another thread can execute an operation.

Isn't this a limitation? Why? The limitation seems to be the singleton instance of KfsClient. Is there a reason why you've designed it this way?

What if there are multiple KfsClients running in a process? Does that impact the servers in any fashion?

What are your thoughts?

Regards,
Sreekant.

Sriram Rao <sr...@ko...> wrote:

RE: [Kosmix Blog] New Comment Posted to 'Kosmos Filesystem Release'

Sreekant,

Please post this on the kosmosfs-users mailing list. Others will also benefit :-)

===========
Sriram,

I guess this post is misplaced, but your forum on sourceforge does not seem to be active. I am evaluating KFS for an internal project. I noticed that your client API uses a global mutex for access. Does that mean that only a single thread can access the KFS from a particular client? If so, do you plan to change this in the near future?
=============

There is a single mutex that protects each of the public methods of the client API. The intent was to serialize thread access thru the client object. Clearly, you can have multiple threads running thru the client code accessing KFS.

Sriram
From: Sriram R. <sri...@gm...> - 2007-12-27 17:02:20
Sreekant,

> It seems that each call to a public function locks the global mutex for the
> period of the call. I am guessing that this means that each call to a read
> or write is likely to lock the KfsClient instance for the period of the read
> operation.

Yes, that is correct.

> So if a read takes 20-25ms + the network transfer, we are talking
> about a 50ms wait on each request before another thread can execute an
> operation.

Not necessarily...

> Isn't this a limitation? Why?

While this seems like a limitation, I am not sure it is significant. Here is why:

There is caching/buffering at the client for read/write. The read/write buffer is 64MB in size (and this is on a per-file basis).

For reads, there is the overhead of pulling data from the chunkserver on a cache miss; after that, until the read buffer is exhausted, reads are handled at memory speeds: acquire the mutex; memcpy; release the mutex.

For writes, the client buffers the data (acquire the mutex; memcpy; release the mutex) and flushes to the server either when the buffer is full or when the application forces the data to be flushed by executing a Sync() call.

As long as your application reads/writes in amounts smaller than 64MB, you won't be hitting the network much. Most of the I/O will proceed at memory speeds. Consequently, the client buffering/caching will amortize the overhead of reading/writing data from/to the servers.

> Is there a reason why you've designed it this way?

We could've structured the system a bit differently: keep multiple requests outstanding to the server and then match requests to responses (and avoid the need for the mutex). The current design was chosen for simplicity at the client end. What is the use case you have in mind with your applications? Does this issue of a single mutex significantly impact performance?

> What if there are multiple KfsClients running in a process? Does that impact
> the servers in any fashion?

No: the server side can support requests from multiple clients concurrently. The use case here is that you spawn multiple processes (each with a single kfs-client instance) and they pull data concurrently.

Sriram
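
For readers following along, the read path described above (acquire the mutex; memcpy; release the mutex, with a trip to the server only on a cache miss) can be sketched roughly as follows. This is a minimal illustration, not the actual KfsClient code: the names `FakeServer`, `BufferedReader`, `Fetch`, and `Read` are all made up for this example, the "server" is just an in-memory byte array, and the buffer size is a constructor parameter rather than 64MB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunkserver: serves bytes from an in-memory "file"
// and counts how many times it was asked for data.
class FakeServer {
public:
    explicit FakeServer(std::vector<char> data) : data_(std::move(data)) {}

    // Copies up to len bytes starting at offset into dst; returns bytes copied.
    std::size_t Fetch(std::size_t offset, char* dst, std::size_t len) {
        ++fetches_;
        if (offset >= data_.size()) return 0;
        std::size_t n = std::min(len, data_.size() - offset);
        std::memcpy(dst, data_.data() + offset, n);
        return n;
    }

    int fetches() const { return fetches_; }

private:
    std::vector<char> data_;
    int fetches_ = 0;
};

// Hypothetical client-side reader: one mutex guards the public Read() call,
// and a read buffer absorbs most requests without touching the server.
class BufferedReader {
public:
    BufferedReader(FakeServer& server, std::size_t buffer_size)
        : server_(server), buf_(buffer_size) {
        assert(buffer_size > 0);
    }

    // Sequential read of up to len bytes; returns bytes read (0 at end of file).
    std::size_t Read(char* dst, std::size_t len) {
        std::lock_guard<std::mutex> lock(mutex_);  // held for the whole call
        std::size_t total = 0;
        while (total < len) {
            if (buf_pos_ == buf_len_) {
                // Cache miss: refill the buffer from the server (the slow path).
                buf_len_ = server_.Fetch(file_pos_, buf_.data(), buf_.size());
                buf_pos_ = 0;
                file_pos_ += buf_len_;
                if (buf_len_ == 0) break;  // end of file
            }
            // Cache hit: plain memcpy out of the buffer (the fast path).
            std::size_t n = std::min(len - total, buf_len_ - buf_pos_);
            std::memcpy(dst + total, buf_.data() + buf_pos_, n);
            buf_pos_ += n;
            total += n;
        }
        return total;
    }

private:
    FakeServer& server_;
    std::mutex mutex_;
    std::vector<char> buf_;
    std::size_t buf_pos_ = 0;   // next unread byte in buf_
    std::size_t buf_len_ = 0;   // valid bytes in buf_
    std::size_t file_pos_ = 0;  // file offset of the next refill
};
```

In this sketch the mutex is still held while the buffer is refilled, so a second thread does wait during a cache miss; the point made above is that such misses are rare when reads are small relative to the buffer, so most calls hold the lock only for the duration of a memcpy.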