From: Gordan B. <go...@bo...> - 2009-02-13 16:12:00
|
I'm pondering what could be done to speed up mkinitrd, so I thought I'd share some thoughts. Apart from removing unnecessary files from it (omitting .pyo/.pyc files (is this filter included in the current preview?) and omitting unused kernel modules (the diet patch)), there are two things that I can see making a big difference to the initrd build speed.

1) Extracting file lists from the RPM DB

This involves invoking rpm -q for each package, which is slow (lots of process start-up latency and churn, and not much CPU used; most of the time is spent starting up and tearing down processes). If the queries could somehow be combined, it might just yield a significant speed-up. I'll test this theory over the weekend and report back. The one problem with this approach is that there would be no sensible way to apply per-package filtering.

2) Compression speed

2.1) Using a parallel gzip (http://www.zlib.net/pigz/) compressor, which should scale pretty much linearly with the number of CPU cores. A (source) RPM seems to be available, but only for SuSE (http://rpm.pbone.net/index.php3/stat/4/idpl/11044884/com/pigz-2.1.4-5.1.x86_64.rpm.html), so until it is more common, it may have to be made available via the comoonics yum repository.

2.2) Using a decent compiler (Intel's ICC) to squeeze more performance out of the compressor. Intel do provide an optimized gzip library sample (http://www.intel.com/cd/software/products/asmo-na/eng/219967.htm) which, according to the docs, seems to also be multi-threaded (I will have to double-check that). My previous tests on a Pentium III indicated that an ICC-built gzip is about 20% faster than the GCC-built one. Since IPP includes a highly optimized gzip module, it should do even better, and that should still stack with the multi-processor scaling.

The only problem I can see with 2.2) is that ICC is only free for non-commercial use (OSS is mentioned as an example, and IIRC MySQL used to distribute an ICC-built version of their community DB), so that part is something for you guys at Atix to figure out. :)

I'll look into this and post some performance results over the weekend, but in the meantime, has anyone got any thoughts on this? Any reason why this might be deemed a bad idea?

Gordan
|
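A rough sketch of the batched-query idea in 1); the package names below are placeholders, not the actual comoonics-bootimage package list, and the real helper in the boot image scripts may be structured differently:

    # Per-package approach (illustrative): one rpm process per package,
    # so every iteration pays the full process start-up cost.
    for pkg in bash coreutils e2fsprogs; do
        rpm -q --dump "$pkg"
    done

    # Combined approach: a single rpm process queries the whole list at
    # once and dumps every file (path, size, mtime, ...) in one pass.
    rpm -q --dump bash coreutils e2fsprogs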
From: Gordan B. <go...@bo...> - 2009-02-13 18:42:49
|
Gordan Bobic wrote:
> 2) Compression speed
>
> 2.1) Using a parallel gzip (http://www.zlib.net/pigz/) compressor, which
> should scale pretty much linearly with the number of CPU cores. A (source)
> RPM seems to be available, but only for SuSE
> (http://rpm.pbone.net/index.php3/stat/4/idpl/11044884/com/pigz-2.1.4-5.1.x86_64.rpm.html),
> so until it is more common, it may have to be made available via the
> comoonics yum repository.

OK, that was 100% pain-free. The SuSE src.rpm compiles cleanly on RHEL/CentOS 5.x. The performance scaling is completely linear: 32 seconds with gzip on a quad-core Core2, 8 seconds with pigz.

The resulting compressed files aren't identical (the pigz one is actually a tiny bit smaller), but they decompress with standard gunzip and the decompressed files are the same, so it seems safe enough. :)

Even without any further boosts in compression speed, this seems pretty worthwhile, and it has the advantage of being 100% clean, OSS, and extremely painless to implement (whereas ICC would involve opening a whole new can of worms for a much smaller benefit).

So, if I may be so bold as to ask - any chance of including pigz in the yum repository and adding a dependency on it for comoonics-bootimage?

I'd submit a create-gfs-initrd-lib.sh patch, but I can't help but feel that a patch as small as 2 lines would be a bit lame. :^)

Gordan
|
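The two-line change being alluded to would presumably be of this shape; this is a sketch assuming the usual find | cpio | gzip initrd pipeline, with illustrative variable names rather than the actual contents of create-gfs-initrd-lib.sh:

    # Sketch only: pigz is a drop-in replacement for gzip here, taking the
    # same flags and producing output that standard gunzip can decompress.
    COMPRESS=pigz                      # was: gzip
    find . | cpio -o -H newc | $COMPRESS -9 > "$INITRD_IMAGE"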
From: Marc G. <gr...@at...> - 2009-02-15 10:14:06
|
On Friday 13 February 2009 19:42:42 Gordan Bobic wrote:
> Gordan Bobic wrote:
> > 2) Compression speed
> >
> > 2.1) Using a parallel gzip (http://www.zlib.net/pigz/) compressor, which
> > should scale pretty much linearly with the number of CPU cores. A
> > (source) RPM seems to be available, but only for SuSE
> > (http://rpm.pbone.net/index.php3/stat/4/idpl/11044884/com/pigz-2.1.4-5.1.x86_64.rpm.html),
> > so until it is more common, it may have to be made available via the
> > comoonics yum repository.
>
> OK, that was 100% pain-free. The SuSE src.rpm compiles cleanly on
> RHEL/CentOS 5.x. The performance scaling is completely linear: 32
> seconds with gzip on a quad-core Core2, 8 seconds with pigz.

;-)

> The resulting compressed files aren't identical (the pigz one is actually
> a tiny bit smaller), but they decompress with standard gunzip and the
> decompressed files are the same, so it seems safe enough. :)
>
> Even without any further boosts in compression speed, this seems pretty
> worthwhile, and it has the advantage of being 100% clean, OSS, and
> extremely painless to implement (whereas ICC would involve opening a
> whole new can of worms for a much smaller benefit).
>
> So, if I may be so bold as to ask - any chance of including pigz in the
> yum repository and adding a dependency on it for comoonics-bootimage?
>
> I'd submit a create-gfs-initrd-lib.sh patch, but I can't help but feel
> that a patch as small as 2 lines would be a bit lame. :^)

That's no problem. I will make a variable for the compression program that can be overridden in /etc/comoonics/bootimage/bootimage.cfg. And yes, that's only two lines.

The more dramatic speed-up I'm thinking about is using only one rpm process and dieting wherever possible and wherever needed. This reduces the size of the initrd and consequently the time needed to compress it. But I need to check if the filters are compatible with that concept. If so, that should be pretty easy; if not, the filters will have to be rewritten, which should also be possible. I'll give feedback.

Marc.

--
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/
|
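The override Marc describes might end up looking roughly like this; the variable name and the exact bootimage.cfg syntax are assumptions for illustration, not the shipped implementation:

    # /etc/comoonics/bootimage/bootimage.cfg (hypothetical entry)
    COMPRESSOR="pigz"

    # create-gfs-initrd-lib.sh (sketch): source the config if present and
    # fall back to plain gzip when nothing is set.
    [ -f /etc/comoonics/bootimage/bootimage.cfg ] && \
        . /etc/comoonics/bootimage/bootimage.cfg
    COMPRESSOR=${COMPRESSOR:-gzip}
    find . | cpio -o -H newc | "$COMPRESSOR" -9 > "$INITRD_IMAGE"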
From: Gordan B. <go...@bo...> - 2009-02-15 14:36:25
|
Marc Grimme wrote:
>> So, if I may be so bold as to ask - any chance of including pigz in the
>> yum repository and adding a dependency on it for comoonics-bootimage?
>>
>> I'd submit a create-gfs-initrd-lib.sh patch, but I can't help but feel
>> that a patch as small as 2 lines would be a bit lame. :^)
>
> That's no problem. I will make a variable for the compression program that
> can be overridden in /etc/comoonics/bootimage/bootimage.cfg.
> And yes, that's only two lines.

Awesome, thanks. :)

> The more dramatic speed-up I'm thinking about is using only one rpm
> process and dieting wherever possible and wherever needed. This reduces
> the size of the initrd and consequently the time needed to compress it.

Indeed, that was the other thing I was thinking about. It is possible to list multiple packages in the same query, as per what is being done in get_filelist_from_installed_rpm():

rpm -q <package1> <package2> ... <packageN> --dump

> But I need to check if the filters are compatible with that concept. If
> so, that should be pretty easy; if not, the filters will have to be
> rewritten, which should also be possible.

I can't see how the current filtering could work with this approach. Only one big list is returned, so all filtering would have to operate on that unified list. This means that filtering would also have to be unified, rather than per-package, which is what I was referring to in the previous post. Filtering out everything under /usr/share/[doc | man], /usr/include, etc. is simple enough, but it isn't as flexible as per-package filtering. I can't see a way around that, though.

Gordan
|
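A sketch of what unified filtering over the single combined listing could look like; the package names are placeholders and the filter patterns are only the examples mentioned above, not the real per-package filter configuration:

    # One query for all packages, then one global filter over the merged
    # list. The first --dump field is the file path; the grep drops docs,
    # man pages, headers and compiled Python files for every package at once.
    rpm -q --dump package1 package2 packageN 2>/dev/null \
        | awk '{print $1}' \
        | grep -Ev '^/usr/share/(doc|man)/|^/usr/include/|\.py[co]$'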