At Wed, 2 Oct 2002 08:36:36 -0400 (EDT),
> I'm not a developer, but what an EXCELLENT idea.
> I have a feeling that performance visualization will become very
> important pretty soon. Having all of the data in hdf5 format will
> greatly help. (Many visualization tools can read hdf/hdf5)
I mulled this over a bit, and I think in fact that the *database*
format isn't really appropriate for hdf5 use, since it's supposed to
record information coming out of the kernel rather fast, and its form
is optimized to record lots of pc counts in a nice compact tree. hdf5
isn't really great at that sort of scenario.
however, once the database files have been written out by the daemon,
it's quite a simple matter to post-process them into hdf5. so I wrote a
tool that does this (attached). I cannot connect to oprofile cvs at
the moment so it's just a single file rather than a proper diff; but
if you put it in the pp directory as an extra target like op_merge,
and link with -lhdf5, then it works.
as an example of the "fun" analyses you can do once you have this in a
more portable format, here is a little R script to load in a session
and construct the ratio between coincident pc values recorded by 2
different counters, vaguely like a CPI value, and plot those values
worse than 20 cpi (forgive the style, I'm still learning R):
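library(hdf5)     # provides hdf5load()
library(lattice)  # provides xyplot()
# load the session as a list of datasets instead of into the workspace
x <- hdf5load("/tmp/test.h5", load=FALSE)
clocks <- x[['lib.libc-2.2.5.so.ctr0']]
insns <- x[['lib.libc-2.2.5.so.ctr7']]
# keep only the pc values sampled by both counters
isect <- intersect(clocks[,1],insns[,1])
clocks <- clocks[clocks[,1] %in% isect,]
insns <- insns[insns[,1] %in% isect,]
# the division pairs rows by position, so sort both tables by pc first
clocks <- clocks[order(clocks[,1]),]
insns <- insns[order(insns[,1]),]
cpi <- data.frame(clocks[,1], clocks[,2] / insns[,2])
baddies <- cpi[ cpi[,2] > 20, ]
colnames(baddies) <- c("addr", "cpi")
xyplot(cpi ~ addr, data=baddies, type="h")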
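if you want to try the counter-ratio step without an hdf5 file at hand, here is a minimal sketch on made-up data, using only base R. the addresses, counts and column names are invented for illustration and are not real oprofile output. it uses merge() to pair rows by address, which is a swapped-in alternative to the intersect-and-filter approach, not what the script above does:

```r
# Made-up (pc, count) tables standing in for two counters; not real oprofile output.
clocks <- data.frame(addr = c(0x1000, 0x1004, 0x1008, 0x1010),
                     clocks = c(500, 90, 4200, 30))
insns <- data.frame(addr = c(0x1004, 0x1008, 0x100c, 0x1000),
                    insns = c(45, 100, 7, 250))

# merge() does an inner join on addr, so rows are paired by address
# rather than by position, and unmatched pc values are dropped.
both <- merge(clocks, insns, by = "addr")
both$cpi <- both$clocks / both$insns

# keep only the pc values worse than 20 cycles per instruction
baddies <- both[both$cpi > 20, c("addr", "cpi")]
print(baddies)
```

the join keeps the two counters lined up even when their tables come back in different orders, at the cost of building one extra data frame.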
I wouldn't go replacing any of the existing pp tools with this
facility, but I think it makes a nice complement if you're trying
to do some peculiar (or bulk) analysis.
as an added benefit, the hdf5 chunked storage model (which I'm using
for efficiency's sake) supports transparently deflating each chunk via
zlib, so the post-processed hdf5 files are quite a bit smaller than
the corresponding database files.
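to get a rough feel for why deflate helps on this kind of data, here is a small base R sketch. it does not touch hdf5 at all: it just runs zlib deflate (via memCompress) over a made-up vector of small sample counts. the Poisson-distributed counts are an assumption for illustration, so the sizes it prints say nothing about real oprofile databases:

```r
# Made-up sample counts: mostly small, repetitive values (an assumption
# about what profile data tends to look like, not real oprofile output).
set.seed(1)
counts <- as.integer(rpois(10000, lambda = 2))

raw_bytes <- writeBin(counts, raw())             # 4 bytes per count
packed <- memCompress(raw_bytes, type = "gzip")  # zlib deflate

# round trip to confirm nothing is lost
restored <- readBin(memDecompress(packed, type = "gzip"),
                    what = "integer", n = length(counts))
stopifnot(identical(restored, counts))
cat(length(raw_bytes), "bytes raw,", length(packed), "bytes deflated\n")
```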