From: Marcus B. <mbl...@gm...> - 2003-08-29 09:39:12
|
Hi all,

We have been using UML successfully for various purposes for some time now, but there is one issue that cannot be solved by either standalone or UM-Linux, and that pops up more or less regularly on linux-kernel: cache pollution when reading large files, even when we know the data will not be reused within a reasonable time. Application examples: streaming media servers, file servers for media content, etc. This bites all users, no matter how much physical RAM the machine has.

The solutions proposed on the kernel mailing list, such as O_DIRECT, fadvise(), and O_STREAMING, are IMHO less than half the way there if used as proclaimed: they leave it up to the system administrator to patch every single program involved (nfsd, smbd) to use the feature/flag correctly. I would like to solve the problem at the data source instead of in the data-serving application. Simply forcing the O_DIRECT flag for all open()s on a particular filesystem does not work, because of the special access patterns imposed by using O_DIRECT.

My idea is to not expose file content directly from the machine, but to use UML virtual hosts for file serving. This adds an additional layer of indirection between the data to serve and the data-serving program. One common way to import data into UML is through file- or partition-backed ubd block devices, and I'd like to solve the problem exactly there: using O_DIRECT for access to specific ubds should suffice. I'm thinking of mimicking the infrastructure of the 'sync' command-line flag.

Is this doable? Do the access patterns of the ubd driver in its current incarnation meet the O_DIRECT constraints?

Best regards,
Marcus |
From: Nuno S. <nun...@vg...> - 2003-08-29 16:52:38
|
[cross-posting to uml-user and uml-devel, sorry]

Hi!!

Marcus Blomenkamp wrote:
> Hi all,
>
> we are using UML successfully for different purposes for some time now,
> however there is one issue which cannot be solved either by Standalone- or
> UM-Linux. It's also popping up more or less regularly on linux-kernel. It is
> the problem of cache pollution on reading large files, even though we know
> that we will not reuse this data in reasonable time. Application examples:
> streaming media servers, fileservers for media content etc. This one bites
> all users no matter how much physical ram the machines have.
>
> The solutions proposed on kernel mailing list, such as O_DIRECT, fadvise(),
> O_STREAMING, are IMHO less than half the way if used as proclaimed. For them
> it is up to the system administrator to patch every single program involved
> (nfsd, smbd) to use the feature/flag correctly.

I have a partial solution for this: disable the host's swap. Simple, and very wise if:
- You have enough memory to hold all your programs and a little bit extra;
- You don't want disk cache beyond your free physical RAM.

I like UML very much, but for this particular problem I don't think UML will help, because:
- Adding an extra layer will make things slower and add complexity;
- UML itself will use memory;
- UML's overhead is almost nothing for non-system load, but for "system load" UML has lots of overhead (from 10% to 50% depending on the workload);
- If you want UML to do "disk" I/O, it'll be much slower than on the physical host, because that's "system load".

Anyway, try disabling the host's swap and try again! You'll be pleased :)

Regards,
Nuno Silva |