
pkgbuild

Tracker: Bugs

5 [patch] Add requires: /path/to/file - ID: 1707525
Last Update: Comment added ( laca_ )

To support file-path requires, I've whipped up a quick module that parses
/var/sadm/install/contents a little. We only use it to map file ->
package, but it should be expandable to cover other things should the need
arise.

The two files attached are:

pkgdb.pm (the module that parses /var/sadm/install/contents)
a diff to the rest of the code to use it
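The thread doesn't include pkgdb.pm itself, so what follows is only a minimal sketch of the file -> package mapping. It assumes the usual SVR4 layout of a contents line: path, file type, class, then type-dependent fields (major/minor for devices; mode/owner/group; size/checksum/mtime for regular files), then the owning package names. Links are written as path=target. The sub name parse_contents is hypothetical, and the sketch ignores any status markers that may be attached to package names.

```perl
use strict;
use warnings;

# Index of the first package-name field on a contents line, keyed by file type.
# Assumed layout: path ftype class [major minor] [mode owner group]
#                 [size cksum modtime] pkg...
my %PKG_FIELD = (
    s => 3, l => 3,           # symbolic / hard link: path=target s|l class pkg...
    d => 6, x => 6, p => 6,   # directory, exclusive directory, named pipe
    b => 8, c => 8,           # block / character device (major, minor first)
    f => 9, e => 9, v => 9,   # regular, editable, volatile file
);

# Hypothetical helper: read a contents file handle, return { path => [pkgs] }.
sub parse_contents {
    my ($fh) = @_;
    my %file2pkgs;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                  # comment lines
        my @f = split ' ', $line;               # also drops the trailing newline
        next unless @f >= 3 && exists $PKG_FIELD{ $f[1] };
        my $path = $f[0];
        $path =~ s/=.*// if $f[1] eq 's' || $f[1] eq 'l';   # strip link target
        push @{ $file2pkgs{$path} }, @f[ $PKG_FIELD{ $f[1] } .. $#f ];
    }
    return \%file2pkgs;
}
```

Typical use would be `open(my $fh, '<', '/var/sadm/install/contents') or die $!;` followed by `my $db = parse_contents($fh);`, after which `@{ $db->{$path} || [] }` lists the owners of `$path`.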


Submitted by: Mike Bristow (mikebristow), 2007-04-25 15:59

Priority: 5
Status: Closed
Resolution: Fixed
Assigned to: Laszlo (Laca) Peter
Category: rpm compatibility
Group: None
Visibility: Public


Comments (12)




Date: 2008-02-16 01:56
Sender: laca_Project Admin


I know how you feel about backticking, but considering the memory
consumption and performance of implementing this in Perl,
I decided to do the hard work in C and use a backtick. To avoid the
problems with regexps, I'm not using grep; instead I wrote a simple
C program that finds exact file names in contents files, and it does so
very fast.
Apart from that, I'm using almost all of your code, nearly unchanged.
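The thread names neither the C helper nor its arguments. As a hedged sketch of the backtick step, assuming a hypothetical helper that takes its inputs as command-line arguments and prints matching lines, the list form of Perl's pipe open runs it without a shell. Each argument then reaches the program verbatim, whereas a single-string backtick command is handed to /bin/sh whenever it contains shell metacharacters and would need careful quoting. The sub name run_helper is also hypothetical.

```perl
use strict;
use warnings;

# Run an external helper and return its output lines, without a shell.
# With the list form of open, each argument reaches the program verbatim,
# so paths containing '+', '$', spaces or quotes need no escaping.
sub run_helper {
    my ($cmd, @args) = @_;
    open(my $pipe, '-|', $cmd, @args) or die "cannot run $cmd: $!";
    my @lines = <$pipe>;
    close $pipe;          # waits for the helper and sets $? to its exit status
    chomp @lines;
    return @lines;
}
```

A call would look like `run_helper($helper, '/var/sadm/install/contents', $path)`, where `$helper` stands for the C program's path and the argument order is an assumption.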

Thanks!



Date: 2007-05-09 12:55
Sender: mikebristow


I don't like backticking grep; it seems wrong. I would suggest the
newly-attached pkgdbgrepcore.pm.2 instead; it has two advantages over using grep:

it's now a shade faster
it copes better with filenames containing '+' (and other characters special
to grep).

For the latter point:

: michaelb@solaris-9-sparc ~/cvsed/thus.net/thus-pkgbuild; cat x.pl
use lib qw(pkgbuild);
use pkgdbgrepcore;
use mypkgdbgrep;
use Data::Dumper;

my $db1 = pkgdbgrepcore->new();
my $db2 = mypkgdbgrep->new();

my @a = $db1->file2pkgs('/var/nis/NIS+LDAPmapping.template');
my @b = $db2->file2pkgs('/var/nis/NIS+LDAPmapping.template');

print Dumper \@a;
print Dumper \@b;

: michaelb@solaris-9-sparc ~/cvsed/thus.net/thus-pkgbuild; perl x.pl
$VAR1 = [
'SUNWnisr'
];
$VAR1 = [];
: michaelb@solaris-9-sparc ~/cvsed/thus.net/thus-pkgbuild;
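The thread doesn't say exactly why mypkgdbgrep returns an empty list here. One common way such a miss arises in Perl is interpolating a path into a regex unescaped: '+' then acts as a quantifier, so "NIS+LDAP" no longer matches itself. A minimal sketch of an exact match follows, using a hypothetical helper line_is_path and a made-up contents line whose size, checksum and mtime fields are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a contents line is the entry for exactly $path.
# \Q...\E escapes regex metacharacters ('+', '.', '$', ...) in $path, and the
# path must be followed by a space (plain entry) or '=' (path=target link).
sub line_is_path {
    my ($line, $path) = @_;
    return $line =~ /^\Q$path\E[ =]/ ? 1 : 0;
}

# Made-up contents line; "1 2 3" stand in for size, checksum and mtime.
my $line = '/var/nis/NIS+LDAPmapping.template f none 0644 root sys 1 2 3 SUNWnisr';
my $path = '/var/nis/NIS+LDAPmapping.template';

print line_is_path($line, $path) ? "quoted: match\n" : "quoted: no match\n";
print $line =~ /^$path / ? "unquoted: match\n" : "unquoted: no match\n";
```

This prints "quoted: match" and "unquoted: no match". In the unquoted pattern, "S+" means "one or more S" and is followed by "LDAP", so the literal "+" in the line never matches.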


I tend to agree that using the original versions is silly unless the
average package build needs to run file2pkgs on ~50-100 filesystem objects
(and even then the memory cost looks like it'll be hard to reduce).

File Added: pkgdbgrepcore.pm.2


Date: 2007-05-04 01:40
Sender: laca_Project Admin


I've uploaded my version of dbgrep. Instead of reading contents line by
line in Perl, I'm actually running /usr/bin/grep and also caching the
result.
This is really cheating if I use your benchmark, because it looks up
the same file again and again, so mydbgrep beats all other solutions in
that particular test.
Instead, I measured how many different files it can look up in the time
dbnew takes to load the contents file. The result was about 60.
I think that's good enough; it's unlikely that a "normal" spec file uses
more than a dozen file dependencies.
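Caching is what lets mypkgdbgrep win a benchmark that repeats the same lookup. Here is a minimal sketch of that memoization, assuming a hypothetical CachedLookup class that wraps any scan function (for instance one that runs grep). Each distinct path is scanned at most once, and empty results are cached too.

```perl
package CachedLookup;   # hypothetical name, not the attached mypkgdbgrep.pm
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    # 'scan' is any code ref taking a path and returning its packages.
    return bless { scan => $args{scan}, cache => {} }, $class;
}

sub file2pkgs {
    my ($self, $path) = @_;
    # An array ref is always true, so '||=' also caches "no owner" results.
    $self->{cache}{$path} ||= [ $self->{scan}->($path) ];
    return @{ $self->{cache}{$path} };
}

package main;
```

Repeated lookups of one path cost a hash access, but every new path still pays for a full scan. That is why counting distinct lookups, as above, is the fairer measure.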

Let me know what you think.
File Added: mypkgdbgrep.pm


Date: 2007-05-03 09:35
Sender: mikebristow


Here's the benchmark program I used, and the results:


Initialization times:
  db took:         33 wallclock secs (29.66 usr + 1.52 sys = 31.18 CPU)
  dbnew took:      18 wallclock secs (16.88 usr + 1.29 sys = 18.17 CPU)
  dbgrep took:      0 wallclock secs ( 0.00 usr + 0.00 sys =  0.00 CPU)
  dbgrepcore took:  1 wallclock secs ( 0.94 usr + 0.24 sys =  1.18 CPU)
Data Sizes:
  db:         80937285
  dbnew:      61195293
  dbgrep:          447
  dbgrepcore: 13905246
Number of files found:
  db:    103329
  dbnew: 103329
Benchmark: running filespkgs, grepcorefile2pkgs, grepfile2pkgs, newfile2pkgs, each for at least 5 CPU seconds...
  pkgdb:            6 wallclock secs ( 5.18 usr + 0.00 sys = 5.18 CPU) @ 38102.12/s (n=197369)
  pkgdbnew:         5 wallclock secs ( 5.32 usr + 0.00 sys = 5.32 CPU) @     1.69/s (n=9)
  pkgdbnewgrep:     5 wallclock secs ( 5.30 usr + 0.38 sys = 5.68 CPU) @     0.88/s (n=5)
  pkgdbnewgrepcore: 5 wallclock secs ( 5.33 usr + 0.00 sys = 5.33 CPU) @ 31332.08/s (n=167000)
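The benchmark-pkgdb program itself isn't reproduced in the thread, but the "each for at least 5 CPU seconds" line looks like output from Perl's core Benchmark module, where timethese is given a negative count. Below is a small sketch of that style of comparison with made-up data and two hypothetical strategies: a hash lookup standing in for a fully loaded cache, and a linear regex scan standing in for a grep-style search. It uses -1 (one CPU second) to keep the run short, where the thread used 5.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Made-up data: 20,000 fake contents lines and the equivalent hash index.
my @lines = map { "/opt/demo/f$_ f none 0644 root bin 1 2 3 DEMOpkg$_" } 1 .. 20_000;
my %index = map { ("/opt/demo/f$_" => ["DEMOpkg$_"]) } 1 .. 20_000;
my $want  = '/opt/demo/f19999';

# A negative count means "run each sub for at least that many CPU seconds".
my $results = timethese(-1, {
    hash_lookup => sub { my $pkgs = $index{$want} },
    linear_scan => sub { my @hits = grep { /^\Q$want\E / } @lines },
});

# Print a table of relative rates between the subs.
cmpthese($results);
```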

File Added: benchmark-pkgdb


Date: 2007-05-03 09:30
Sender: mikebristow


File Added: pkgdbgrepcore.pm


Date: 2007-05-03 09:29
Sender: mikebristow


File Added: pkgdbgrep.pm


Date: 2007-05-03 09:29
Sender: mikebristow


File Added: pkgdb.pm


Date: 2007-05-03 09:29
Sender: mikebristow


Here are 4(!) variants of pkgdb.pm with benchmarking.

Note that in the following descriptions the times are for a slow SPARC
machine with a ~13M contents file; the memory usage is from Devel::Size
rather than top (and will therefore be off due to malloc overheads and
memory fragmentation), but should (hopefully) be OK for comparison.

I suggest you decide which trade-off you want, and pick your
implementation.

pkgdb.pm

On each file2pkgs call, this checks whether we already have the result (and
if so, returns it). Otherwise, it parses /var/sadm/install/contents from
where we left off last time, parsing each line completely.

Pros: if the result is cached (e.g. we've reached the end of contents),
it is blindingly fast (of the order of 38k calls/sec on my machine).
Cons: if the result is /not/ cached, it can spend an awfully long time
parsing the contents file into a giant in-memory structure (~33 sec on my
machine), and the memory usage is high (80M + malloc overheads on my
machine).
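The "parse from where we left off" strategy can be sketched as a lazy reader that keeps the contents filehandle open and, on a cache miss, reads only until the wanted path turns up or the file ends. The class name LazyContents is hypothetical, the field layout is the same SVR4 assumption as elsewhere, and lines_read exists only to make the laziness visible.

```perl
package LazyContents;   # hypothetical name, not the attached pkgdb.pm
use strict;
use warnings;

# Index of the first package field, by ftype (assumed SVR4 contents layout).
my %PKG_FIELD = (s => 3, l => 3, d => 6, x => 6, p => 6,
                 b => 8, c => 8, f => 9, e => 9, v => 9);

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, pkgs => {}, eof => 0, lines_read => 0 }, $class;
}

sub file2pkgs {
    my ($self, $path) = @_;
    # Resume parsing only until $path has been seen or the input is exhausted.
    while (!exists $self->{pkgs}{$path} && !$self->{eof}) {
        my $line = readline($self->{fh});
        if (!defined $line) { $self->{eof} = 1; last }
        $self->{lines_read}++;
        next if $line =~ /^#/;
        my @f = split ' ', $line;
        next unless @f >= 3 && exists $PKG_FIELD{ $f[1] };
        my $p = $f[0];
        $p =~ s/=.*// if $f[1] eq 's' || $f[1] eq 'l';   # path=target links
        $self->{pkgs}{$p} = [ @f[ $PKG_FIELD{ $f[1] } .. $#f ] ];
    }
    return @{ $self->{pkgs}{$path} || [] };
}

package main;
```

A path that is absent forces a read to the end of the file once. After that, every miss is answered from memory, which matches the "blindingly fast once cached" behaviour described above.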


pkgdbnew.pm

Like pkgdb, but while reading the contents file it only picks each line
apart enough to learn the file name. Other parsing is deferred until we
find the entry we're interested in (either by reading it from the file,
or from the in-memory semi-parsed cache).

Pro: If we've already read the relevant line into the in-memory cache,
this is very fast (of the order of 31k calls/sec on my machine). Uses less
memory than pkgdb.pm, and parsing into the memory cache is quicker.
Con: Still slow on a cache miss (18 sec to parse the whole file)
and fat (~60M).
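The deferred-parsing idea can be sketched as two passes. A cheap pass keeps each line's path plus the unparsed remainder as one string, and file2pkgs splits that remainder only when a path is actually requested. The class name SemiParsedContents is hypothetical, and this version loads the whole file up front rather than incrementally.

```perl
package SemiParsedContents;   # hypothetical name, not the attached pkgdbnew.pm
use strict;
use warnings;

# Index of the first package field, by ftype (assumed SVR4 contents layout).
my %PKG_FIELD = (s => 3, l => 3, d => 6, x => 6, p => 6,
                 b => 8, c => 8, f => 9, e => 9, v => 9);

sub new {
    my ($class, $fh) = @_;
    my %raw;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        # Cheap pass: split off only the path; keep the rest as one string.
        my ($path, $rest) = split ' ', $line, 2;
        next unless defined $rest;
        $path =~ s/=.*// if $rest =~ /^[sl]\s/;   # path=target links
        $raw{$path} = $rest;
    }
    return bless { raw => \%raw }, $class;
}

sub file2pkgs {
    my ($self, $path) = @_;
    my $rest = $self->{raw}{$path};
    return () unless defined $rest;
    # Deferred pass: split the remaining fields only when asked.
    my ($type, @f) = split ' ', $rest;     # @f starts at the class field
    return () unless exists $PKG_FIELD{$type};
    return @f[ $PKG_FIELD{$type} - 2 .. $#f ];
}

package main;
```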

pkgdbgrep.pm

Each call to file2pkgs results in the module re-reading the contents file
with a simple regex; hits are then pulled apart (and, if they really are
hits, returned).

Pro: Low memory usage. No cache-miss overhead.
Con: Slow (0.8 calls/sec to file2pkgs).
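The rescan-per-call approach can be sketched as follows. A plain substring test (index) stands in for the "simple regex" as the cheap filter, and only candidate lines are split and compared exactly. The sub name grep_file2pkgs is hypothetical. Its first argument is a file name such as /var/sadm/install/contents, or a scalar reference for in-memory data.

```perl
use strict;
use warnings;

# Index of the first package field, by ftype (assumed SVR4 contents layout).
my %PKG_FIELD = (s => 3, l => 3, d => 6, x => 6, p => 6,
                 b => 8, c => 8, f => 9, e => 9, v => 9);

# Hypothetical stand-in for pkgdbgrep.pm's lookup: rescan the source every call.
sub grep_file2pkgs {
    my ($source, $path) = @_;
    open(my $fh, '<', $source) or die "cannot open contents: $!";
    my @pkgs;
    while (my $line = <$fh>) {
        next if index($line, $path) != 0;       # cheap filter: line starts with path
        my @f = split ' ', $line;               # candidate: now parse it properly
        next unless @f >= 3 && exists $PKG_FIELD{ $f[1] };
        my $name = $f[0];
        $name =~ s/=.*// if $f[1] eq 's' || $f[1] eq 'l';
        next unless $name eq $path;             # reject longer names sharing the prefix
        push @pkgs, @f[ $PKG_FIELD{ $f[1] } .. $#f ];
    }
    close $fh;
    return @pkgs;
}
```

Memory stays flat because nothing is kept between calls. The price is reading the whole file on every lookup, which is where the 0.8 calls/sec figure comes from.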


pkgdbgrepcore.pm

Like pkgdbgrep, but on object creation it reads the contents file into
memory and searches that rather than re-reading from disk every time.

Pro: twice as fast as pkgdbgrep
Con: Slower than pkgdbgrep to initialize; uses 13M of memory
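The in-memory variant can be sketched by slurping the file once and running a multi-line regex over the string for each lookup. The /m flag makes ^ and $ match at every line, and \Q...\E keeps characters such as '+' literal. The class name SlurpedContents is hypothetical, and new accepts a file name or a scalar reference.

```perl
package SlurpedContents;   # hypothetical name, not the attached pkgdbgrepcore.pm
use strict;
use warnings;

# Index of the first package field, by ftype (assumed SVR4 contents layout).
my %PKG_FIELD = (s => 3, l => 3, d => 6, x => 6, p => 6,
                 b => 8, c => 8, f => 9, e => 9, v => 9);

sub new {
    my ($class, $source) = @_;            # file name or scalar ref
    open(my $fh, '<', $source) or die "cannot open contents: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file once
    close $fh;
    return bless { text => $text }, $class;
}

sub file2pkgs {
    my ($self, $path) = @_;
    # [ =] requires the full path: a plain entry or a path=target link.
    return () unless $self->{text} =~ /^(\Q$path\E[ =].*)$/m;
    my @f = split ' ', $1;
    return () unless @f >= 3 && exists $PKG_FIELD{ $f[1] };
    return @f[ $PKG_FIELD{ $f[1] } .. $#f ];
}

package main;
```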



File Added: pkgdbnew.pm


Date: 2007-04-30 12:33
Sender: mikebristow


I was optimizing for repeat calls: my benchmarking says that you can call
file2pkgs 41956 times a second once the db is fully loaded... this means
that my implementation 'wins' when you have about 70 files to check (on my
machine).

However, we'd currently need to 'win' at about the dozen mark for my
approach to make sense, which means a lot of cunningness.


I've just spotted this on CPAN:

http://search.cpan.org/~chrisj/sol-inst-0.90a/

which I'll take a look at.




Date: 2007-04-27 20:23
Sender: laca_Project Admin


Hmm...
pkgdb.pm needs a bit of optimization.
On my laptop, the heap grows to 270MB when the whole contents file is
loaded, and loading takes 28 sec.
grep foo /var/sadm/install/contents only takes 0.46 sec.
So it seems cheaper to grep for each file when needed and parse only the
output of grep.
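These laptop figures allow a back-of-the-envelope break-even estimate. Assume a full load costs T_load, each grep costs t_grep, and a lookup against the loaded structure is negligible (tens of thousands per second, per the benchmarks in this thread). Loading the whole file then only pays off once more than N* distinct files are looked up:

```latex
N^{*} = \frac{T_{\text{load}}}{t_{\text{grep}}} \approx \frac{28\ \text{s}}{0.46\ \text{s}} \approx 61
```

This is the same order as the break-even points of about 60 to 70 files quoted elsewhere in the thread. It is well above the dozen or so file dependencies a typical spec file is expected to have.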




Date: 2007-04-26 09:26
Sender: mikebristow


The attached adds support for 'filenames' in buildrequires, too.
File Added: pkgbuild.addbuildrequires.patch
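Below is a minimal sketch of how file names in (Build)Requires might be resolved. Entries starting with '/' are treated as paths and mapped through a file2pkgs-style lookup, and anything else is kept as a package name. The sub name resolve_requires and the choice to fail on an unowned path are assumptions, not necessarily what pkgbuild.addbuildrequires.patch does. Version constraints on package names are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: expand (Build)Requires entries into package names.
# $file2pkgs is any code ref mapping a path to the package(s) that own it.
sub resolve_requires {
    my ($file2pkgs, @entries) = @_;
    my (@pkgs, %seen);
    for my $entry (@entries) {
        my @names = $entry =~ m{^/} ? $file2pkgs->($entry) : ($entry);
        die "no installed package owns $entry\n" if !@names;
        push @pkgs, grep { !$seen{$_}++ } @names;   # keep first occurrence only
    }
    return @pkgs;
}
```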


Date: 2007-04-25 15:59
Sender: mikebristow


File Added: pkgdb.pm






Attached Files (9)

Filename                         Description
pkgbuild.addrequires.patch       patch to the rest of the code to use pkgdb.pm
pkgbuild.addbuildrequires.patch
pkgdbnew.pm
pkgdb.pm
pkgdbgrep.pm
pkgdbgrepcore.pm
benchmark-pkgdb
mypkgdbgrep.pm                   Laca's version of pkgdb that uses /usr/bin/grep
pkgdbgrepcore.pm.2

Changes (17)

Field          Old Value                                  Date              By
close_date     -                                          2008-02-16 01:56  laca_
resolution_id  Accepted                                   2008-02-16 01:56  laca_
status_id      Open                                       2008-02-16 01:56  laca_
File Added     228499: pkgdbgrepcore.pm.2                 2007-05-09 12:55  mikebristow
File Added     227772: mypkgdbgrep.pm                     2007-05-04 01:40  laca_
File Added     227647: benchmark-pkgdb                    2007-05-03 09:35  mikebristow
File Added     227646: pkgdbgrepcore.pm                   2007-05-03 09:30  mikebristow
File Added     227645: pkgdbgrep.pm                       2007-05-03 09:29  mikebristow
File Added     227644: pkgdb.pm                           2007-05-03 09:29  mikebristow
File Deleted   226596:                                    2007-05-03 09:29  mikebristow
File Added     227643: pkgdbnew.pm                        2007-05-03 09:29  mikebristow
assigned_to    nobody                                     2007-04-27 20:23  laca_
resolution_id  None                                       2007-04-27 20:23  laca_
File Added     226693: pkgbuild.addbuildrequires.patch    2007-04-26 09:26  mikebristow
File Added     226596: pkgdb.pm                           2007-04-25 15:59  mikebristow
category_id    None                                       2007-04-25 15:59  mikebristow
File Added     226594: pkgbuild.addrequires.patch         2007-04-25 15:59  mikebristow