Thanks to some insight from Terrel, I have been looking at
speeding up the RPM indexing. There were two things that had a
big effect. The first is telling the rpm command/library
not to try and verify the various signatures/digests on
the rpm. The second is using the rpm-python library.
I ran some tests on a 1.4GHz Athlon with 1GB RAM, indexing
all three CDs of Red Hat 8.0.
With the current UML Builder code, it takes 6 minutes.
[This code basically repeatedly calls the rpm command line
binary and parses its output]
Supplying options to skip the digest and signature checks cut
the time in half, to 3 minutes.
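
For anyone who wants to try the first speedup by hand, here is a rough
sketch of what "don't verify" looks like when driving the rpm binary
from a script. It is not the UML Builder code. The --nodigest and
--nosignature options are the ones I know from rpm 4.1 and later; the
function names and the query format are just made up for illustration.

```python
import subprocess

# Options that tell rpm to skip digest and signature verification
# (assumed available: rpm 4.1 and later).
NO_VERIFY_FLAGS = ["--nodigest", "--nosignature"]


def build_query_command(rpm_path, verify=False):
    """Build the argv for querying one package file (hypothetical helper).

    The query format prints the package name on the first line,
    followed by one file name per line.
    """
    cmd = ["rpm", "-qp"]
    if not verify:
        cmd += NO_VERIFY_FLAGS
    cmd += ["--queryformat", r"%{NAME}\n[%{FILENAMES}\n]", rpm_path]
    return cmd


def query_package(rpm_path):
    """Run rpm on one package file and return (name, list_of_files)."""
    result = subprocess.run(
        build_query_command(rpm_path),
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    return lines[0], lines[1:]
```

Note that this still starts one rpm process per package, which is the
other half of the cost.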
Using the rpm-python library instead cut the time down to
6 seconds! This version does however use considerably
more memory. For example, at one point it holds the
entire file listing for all RPMs in memory. This shouldn't
be that big a deal since the rpm command line binary
appears to do the same as far as I can tell, and your
machine by definition needs enough extra RAM to run
UML in the first place.
I also have an rpm-python based version that does not consume
that extra memory. It takes one minute.
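
To give a feel for the two rpm-python variants, here is a minimal
sketch. It is not the code in CVS, just an illustration under some
assumptions: that the rpm module provides TransactionSet, setVSFlags,
hdrFromFdno and the _RPMVSF_NOSIGNATURES / _RPMVSF_NODIGESTS masks, and
that headers can be indexed by the tag names "name" and "filenames".
The helper names (read_header, iter_packages, build_index) are made up.
Depending on the rpm-python version, the values may come back as bytes
rather than strings.

```python
import os


def make_transaction_set():
    """Create a transaction set that skips signature/digest checks."""
    import rpm  # imported here so the rest works without rpm-python installed

    ts = rpm.TransactionSet()
    ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES | rpm._RPMVSF_NODIGESTS)
    return ts


def read_header(ts, path):
    """Read the header of one package file through the transaction set."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return ts.hdrFromFdno(fd)
    finally:
        os.close(fd)


def iter_packages(paths, ts=None, reader=read_header):
    """Yield (name, files) one package at a time (the low-memory variant)."""
    if ts is None:
        ts = make_transaction_set()
    for path in paths:
        hdr = reader(ts, path)
        yield hdr["name"], list(hdr["filenames"])


def build_index(paths, ts=None, reader=read_header):
    """Hold every file listing in one dict (the memory-hungry variant)."""
    return dict(iter_packages(paths, ts=ts, reader=reader))
```

With build_index everything ends up in memory at once. Looping over
iter_packages and writing each entry out as you go keeps only one
package's file list around at a time.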
I'll be doing another 1.40 release of UML Builder soon
with this stuff in it. It is currently in CVS if anyone
wants to take a look.
Roger