From: Joshua O'M. <jm...@ic...> - 2005-09-26 21:37:53
On 23 Sep 2005, at 1:16, Skye Bender-deMoll wrote:

> We are working with large graphs with lots of data (and
> potentially largish data) attached to them. In cases like this do
> you think it is more efficient to keep all the data in nodes/edges
> as UserData, or to store data outside the graph (in a hashtable
> indexed by graph element, for example) and keep only a minimal set
> in the elements' UserData?
>
> I'm curious how others have dealt with this, any thoughts, etc.

Skye:

The new (just-released) version of JUNG now allows you to supply your own implementation of UserData; take a look at the release notes for details. It also provides an alternate implementation, so that you can get an idea of different ways to provide that capability.

Each of these implementations has advantages and disadvantages. The user data repository is convenient and very flexible, but is not necessarily always the most efficient way to store information. It's particularly useful for cases in which the data that is attached to your elements is heterogeneous (e.g., if you want to "tag" certain vertices or edges, or more generally if each vertex carries with it a different collection of metadata) and you don't want to pass Maps around.

If you're writing code for your own use, and each vertex or edge has the same metadata, subclassing the existing vertex/edge classes can sometimes be the most convenient and efficient solution: if you know that all your vertices must store a name and an ID number, then you can create MyVertex extends SparseVertex (or whatever vertex class is appropriate) that has name and ID fields.

More thoughts on this point may be found in the manual and in the journal article preprint on the JUNG documentation website.

> Does anybody have a sense of which operations will tend to be more
> expensive with lots of UserData?

Depends on the UserData implementation. See the release notes for 1.7 for a brief discussion on this point.
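To make the three storage options discussed above concrete, here is a rough sketch in plain Java. It deliberately does not use JUNG itself: `BaseVertex` is a hypothetical stand-in for a library vertex class such as SparseVertex, and `setDatum`/`getDatum` are made-up names that only loosely mimic a per-element user data store, so treat this as an illustration of the trade-off rather than the library's API.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class VertexDataSketch {

    // Hypothetical stand-in for a library vertex class (e.g. SparseVertex).
    public static class BaseVertex {
        // Per-element key/value store: flexible, but every lookup is a map access
        // and every value comes back untyped.
        private final Map<Object, Object> userData = new HashMap<>();

        public void setDatum(Object key, Object value) {
            userData.put(key, value);
        }

        public Object getDatum(Object key) {
            return userData.get(key);
        }
    }

    // Subclass with typed fields: suits the case where every vertex
    // carries the same metadata (here, a name and an ID number).
    public static class NamedVertex extends BaseVertex {
        public final String name;
        public final int id;

        public NamedVertex(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    public static void main(String[] args) {
        // Option 1: heterogeneous "tags" stored on the element itself.
        BaseVertex v = new BaseVertex();
        v.setDatum("visited", Boolean.TRUE);

        // Option 2: data kept outside the graph, in a table indexed by element.
        // IdentityHashMap keys on object identity rather than equals().
        Map<BaseVertex, double[]> weights = new IdentityHashMap<>();
        weights.put(v, new double[] {0.5, 1.5});

        // Option 3: homogeneous metadata as ordinary fields of a subclass.
        NamedVertex n = new NamedVertex("alice", 42);

        System.out.println(v.getDatum("visited"));
        System.out.println(weights.get(v).length);
        System.out.println(n.name + " " + n.id);
    }
}
```

The subclass gives typed, direct field access; the per-element store and the external table both cost a map lookup, and the external table additionally has to be passed around alongside the graph.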
> Obviously graph copying, but is basic visualization likely to be
> slowed? Presumably copying speed is impacted more by the number of
> UserData items than their size? (if it is only copying object
> refs..)?

Basic visualization is unlikely to be affected by data storage in general, although it will be affected by the way that coordinates and other characteristics (colors, shapes, sizes, etc.) are stored/calculated. If the visualization is slow to update, it's much more likely to be a combination of the network's size and the inefficiency of our layout algorithms. (Although I don't expect our next release to be soon, I hope that it will provide some upgrades to the layout algorithms that we currently provide.)

Joshua

jm...@ic......Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
Joshua O'Madadhain: Information Scientist, Musician, and Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.