From: Magnus L. H. <ma...@he...> - 2002-04-18 15:47:56
Perry Greenfield <pe...@st...>:
> [snip]
> > In relation to what? Using dictionaries etc? Using the array module?
>
> No, in relation to operations on a 10K array. Basically, if an operation
> on a 10K array spends half its time on set-up, operations on a
> 10 element array may only be twice as fast. I'm not making any claims
> about speed in relation to any other data structure (other than Numeric).

Aaah! Sorry to be so dense :) But the speedup in Numeric between
different sizes isn't as important to me as the speedup compared to
other solutions (such as a dict-based one), of course... If a 10 element
array is only twice as fast as a 10K array, that's no problem as long as
it's still faster than an alternative solution (though I realize it
might not be...) The same goes for 10K element graphs -- the interesting
point has to be whether it's faster than various alternatives (which I'm
sure it is).

> > [snip]
> > > Before I go further, I need to find out if the preceding has made
> > > you gasp in horror or if the timescale is too slow for you to
> > > accept.
> >
> > Hm. If you need 10000 elements before numarray pays off, I'm starting
> > to wonder if I can use it for anything at all. :I
>
> I didn't make clear that this threshold may improve in the future
> (decrease).

Right. Good. And -- on small graphs performance probably won't be much
of a problem anyway. :)

> The corresponding threshold for Numeric is probably
> around 1000 to 2000 elements. (Likewise, operations on 10 element
> Numeric arrays are only about twice as fast as for 1K arrays.)
> We may be able to eventually improve numarray performance to something
> in that neighborhood (if we are lucky), but I would be surprised to
> do much better (though if we use caching techniques, perhaps repeated
> cases of arrays of identical shape, strides, type, etc. may run
> much faster on subsequent operations). As usual, performance issues
> can be complicated. You have to keep in mind that Numeric and numarray
> provide much richer indexing and conversion handling features than
> something like the array module, and that comes at some price in
> performance for small arrays.

Of course. I guess an alternative (for the graph situation) could be to
wrap the graphs in a common interface with several implementations, so
that a solution more optimised for small graphs could be chosen (by a
factory function) if the graph is small... Something like the first
sketch below. (Not really an issue for me at the moment, but it should
be easy to do, I guess.)

[snip]

> > I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> > I'll get 10000 entries in the adjacency matrix, so perhaps it's
> > worthwhile anyway?
>
> That's right, 100 nodes is about where the performance becomes
> competitive, and if the cases you are worried about are larger than
> that, then it isn't a problem.

Seems probable. For smaller problems I wouldn't be thinking in terms of
numarray anyway, I think. (Just plain Python dicts or something
similar.)

[snip]

> > > On the other hand, since numarray has much better support for index
> > > arrays, i.e., an array of indices that may be used to index another
> > > array of values, an (index array(s), value array) pair may itself
> > > serve as a storage model for sparse arrays.
> >
> > That's an interesting idea, although I don't quite see how it would
> > help in the case of adjacency matrices. (You'd still need at least one
> > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> > Right?)
>
> Right.

I might as well use a full adjacency matrix, then... (The second sketch
below shows what I understand by the index-array storage model, for the
record.)
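To make the factory idea above a little more concrete, here is a
minimal sketch. (I'm writing it in modern NumPy syntax as a stand-in
for numarray -- the spellings differ a bit, but the idea carries over.
The class names and the size threshold are just made up for
illustration.)

    import numpy as np  # stand-in for numarray in this sketch

    class DictGraph:
        """Adjacency dicts: cheap per-operation for small, sparse graphs."""
        def __init__(self, n):
            self.adj = {}
        def add_edge(self, u, v, w=1.0):
            self.adj.setdefault(u, {})[v] = w
        def neighbors(self, u):
            return self.adj.get(u, {})

    class MatrixGraph:
        """Dense adjacency matrix: pays off once the graph is large enough."""
        def __init__(self, n):
            self.adj = np.zeros((n, n))
        def add_edge(self, u, v, w=1.0):
            self.adj[u, v] = w
        def neighbors(self, u):
            (idx,) = np.nonzero(self.adj[u])
            return {v: self.adj[u, v] for v in idx}

    def make_graph(n, threshold=100):
        """Factory: pick an implementation from the expected node count.

        The threshold of 100 nodes is just the rough break-even point
        from the discussion above, not a measured value."""
        return MatrixGraph(n) if n >= threshold else DictGraph(n)

Client code only ever calls make_graph(), so the implementation could
be swapped (or the threshold tuned) without touching the algorithms.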
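And here, for the record, is roughly what I understand by the (index
array, value array) storage model -- again in NumPy syntax, with made-up
example data. Note that the index arrays grow with the number of
*stored* edges, so for a dense graph they approach n**2 entries anyway,
which is why the full matrix wins in my case:

    import numpy as np  # stand-in for numarray in this sketch

    # One entry per stored edge: (row[i], col[i]) has weight wgt[i].
    row = np.array([0, 0, 2, 3])
    col = np.array([1, 3, 3, 0])
    wgt = np.array([1.0, 0.5, 2.0, 1.5])

    # Index arrays gather values in one vectorized step, e.g. the
    # weights of all edges touching node 3:
    mask = (row == 3) | (col == 3)
    print(wgt[mask])                # -> [0.5 2.  1.5]

    # ...and scatter back to a dense n x n matrix when needed:
    n = 4
    dense = np.zeros((n, n))
    dense[row, col] = wgt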
So, the conclusion for now is that numarray may well be suited for
working with relatively large (100+ nodes), relatively dense graphs.

Now, the next interesting question is how many of the standard graph
algorithms can be implemented with ufuncs and array operations (which I
guess is the key to performance) rather than straight for-loops...
After all, some of them are quite sequential.

--
Magnus Lie Hetland          The Anygui Project
http://hetland.org          http://anygui.org
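PS: As a first stab at the array-operations question, here is
breadth-first search sketched with whole-array operations (NumPy syntax
again, standing in for numarray; bfs_levels is just a name I made up).
Only the outer loop over levels stays sequential -- the per-level work
is all ufuncs and index arrays:

    import numpy as np  # stand-in for numarray in this sketch

    def bfs_levels(adj, start):
        """BFS distance of every node from `start`, given a boolean
        adjacency matrix. The only Python-level loop is over levels."""
        n = adj.shape[0]
        level = np.full(n, -1)          # -1 marks "not reached yet"
        frontier = np.zeros(n, bool)
        frontier[start] = True
        d = 0
        while frontier.any():
            level[frontier] = d
            reached = adj[frontier].any(axis=0)   # one step from the frontier
            frontier = reached & (level == -1)    # drop already-visited nodes
            d += 1
        return level

    # Example: a 4-node path 0 -> 1 -> 2 -> 3.
    adj = np.zeros((4, 4), bool)
    adj[0, 1] = adj[1, 2] = adj[2, 3] = True
    print(bfs_levels(adj, 0))       # -> [0 1 2 3]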