From: Elliot F. <efi...@gm...> - 2011-08-08 19:33:43
On Mon, Aug 8, 2011 at 6:52 AM, Robert Sandilands <rsa...@ne...> wrote:
> Hi Michal,
>
> With a 2 GHz Xeon I am seeing scaling problems when you approach 94
> million files. I had another crash this weekend and had to increase
> timeouts yet again. At this stage the master is unresponsive for at
> least 5 minutes every hour. The graphs in the CGI look like a comb with
> 0 activity on the hour every hour for about 5 minutes. That is except
> for CPU usage on the master which spikes to 100% for the same period. We
> did see an increase in performance and stability when we moved some
> tasks from the master server to other machines but at this stage we
> can't move more tasks off the master without buying more hardware.
> During the time of 0 activity we see read and write timeouts and the
> filesystem is completely unresponsive to users.
>
> I am convinced that part of the scalability issue is related to the fact
> that everything is single threaded and that any single task that can
> take a long time has the potential to cause problems affecting
> scalability and stability.

Robert,

Metadata access is single threaded, but at the top of every hour when the metadata is stored, the mfsmaster process is essentially dual-threaded (or more accurately dual-processed). The process forks (or at least tries to) and the metadata is stored in a background process allowing the main process to continue to serve requests.

If you only have a single core on your master, then obviously both processes will have to use it and thus it will spike every hour when the metadata is stored, but it should still continue to serve requests.

If the 'fork' doesn't happen for any reason then the mfsmaster will stop serving requests and store the metadata, thus pausing all clients regardless of how many cores you have.

And finally, if you have multiple cores and the fork works, you *should* be able to store the metadata and continue to serve client requests without a noticeable delay.
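
As a rough illustration of the fork-or-fall-back pattern described above, here is a minimal sketch. It is not the actual mfsmaster code and it is not the attached patch: the function names write_metadata and store_metadata, the file contents, and the log messages are all made up for the example, and the real metadata dump is far more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real metadata dump: writes one line to path.
 * Returns 0 on success, -1 on failure. */
int write_metadata(const char *path) {
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    int ok = fprintf(f, "metadata snapshot\n") > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/* Try to store the metadata in a forked child so the caller can keep
 * serving requests. Logs whether the fork worked.
 *
 * Returns:
 *   > 0  fork succeeded; the value is the child's pid, which the caller
 *        must reap later with waitpid()
 *     0  fork failed; metadata was stored in the foreground (caller blocked)
 *    -1  fork failed and the foreground store failed as well
 */
pid_t store_metadata(const char *path) {
    assert(path != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: write the snapshot, then leave with _exit() so the
         * parent's buffered stdio output is not flushed a second time. */
        _exit(write_metadata(path) == 0 ? 0 : 1);
    }
    if (pid > 0) {
        /* Parent: the store runs in the background. */
        fprintf(stderr, "metadata store: fork ok, child pid %ld\n", (long)pid);
        return pid;
    }

    /* fork() returned -1: fall back to a blocking store in this process.
     * This is the case that would pause all clients. */
    perror("metadata store: fork failed, storing in foreground");
    return write_metadata(path) == 0 ? 0 : -1;
}
```

In this sketch a positive return value means the background path was taken and the caller still has to waitpid() on the child; a return of 0 means the fork failed and the caller was blocked for the whole store, which is the situation that would show up as an hourly stall.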

Attached is a patch for filesystem.c that will indicate in your log file whether or not the fork was successful. I'd be curious to see the results.

Elliot