I've spent the last few days looking into data generation code. Justin
pointed me to a generic data generation tool -- that one was OK, but it
doesn't actually generate the specific distributions we need. I'm also
somewhat unclear about what algorithms to use. E.g., how does one
generate N uniformly distributed values of which 100 are distinct?
Does one have to force the other N-100 to be duplicates? Also, the only
algorithm I found for generating Zipf distributions takes forever to run.
Somewhere around there I got disgusted with writing C code. So in case we
ever end up writing our own tool, maybe we want to consider writing it in
a different language that's a little better about automatic memory
management. (Scheme and Python came to my mind, but opinions vary.)
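
For what it's worth, here is a rough sketch of how the two generation
questions above might be approached in Python. It is only one possible
reading of the requirements, not a settled design. The function names,
the integer value range, and the exponent parameter s are all made up
for illustration. The first function picks 100 (or k) distinct values
up front and then draws everything else from that pool, so the
remaining N-k values are duplicates by construction. The second builds
a cumulative weight table once and does a binary search per draw, which
may be cheaper than an algorithm that recomputes sums for every value;
whether it beats the slow algorithm mentioned above is untested.

```python
import bisect
import itertools
import random


def uniform_with_k_distinct(n, k, rng, lo=0, hi=10**9):
    """Return n integers with exactly k distinct values (hypothetical helper).

    Assumption: "uniform" means each of the k values is equally likely.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    # Choose the k distinct values without replacement.
    pool = rng.sample(range(lo, hi), k)
    # Use every pool value once so that all k really appear ...
    values = list(pool)
    # ... then fill the remaining n - k slots uniformly from the pool.
    values.extend(rng.choices(pool, k=n - k))
    rng.shuffle(values)
    return values


def zipf_sampler(num_ranks, s, rng):
    """Return a function drawing ranks 1..num_ranks with P(r) ~ 1 / r**s.

    Setup is O(num_ranks); each draw is a binary search, O(log num_ranks).
    """
    weights = [1.0 / rank ** s for rank in range(1, num_ranks + 1)]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    def draw():
        # Inverse-CDF lookup: first rank whose cumulative weight covers x.
        x = rng.random() * total
        return bisect.bisect_left(cumulative, x) + 1

    return draw


if __name__ == "__main__":
    rng = random.Random(1)
    data = uniform_with_k_distinct(1000, 100, rng)
    print(len(data), len(set(data)))
    draw = zipf_sampler(1000, 1.0, rng)
    print([draw() for _ in range(10)])
```

Seeding each pool value once skews the counts very slightly away from a
pure uniform draw, but it guarantees the distinct count is exactly k; if
that guarantee is not needed, drawing all N values from the pool would
be simpler.
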
Peter Eisentraut peter_e@...