Re: [Jung-support] question about UserData performance design

On 23 Sep 2005, at 1:16, Skye Bender-deMoll wrote:

>   We are working with large graphs with lots of data (and  
> potentially largish data) attached to them.  In cases like this do  
> you think it is more efficient to keep all the data in nodes/edges  
> as UserData,  or to store data outside the graph (in a hashtable  
> indexed by graph element, for example) and keep only a minimal set  
> in the elements' UserData?
>
> I'm curious how others have dealt with this, any thoughts, etc.

Skye:

The new (just-released) version of JUNG now allows you to supply your  
own implementation of UserData; take a look at the release notes for  
details.  It also provides an alternate implementation, so that you  
can get an idea of different ways to provide that capability.  Each  
of these implementations has advantages and disadvantages.

The user data repository is convenient and very flexible, but is not  
necessarily always the most efficient way to store information.  It's  
particularly useful for cases in which the data that is attached to  
your elements is heterogeneous (e.g., if you want to "tag" certain  
vertices or edges, or more generally if each vertex carries with it a  
different collection of metadata) and you don't want to pass Maps  
around.
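
To make the trade-off in the original question a bit more concrete, here is a minimal sketch in plain Java of the two storage styles being compared: a key/value repository carried by each element, and an external table indexed by graph element. It deliberately does not use JUNG classes. `TaggedElement` and `ExternalStore` are hypothetical names invented for illustration, and JUNG's actual UserData API differs (see the release notes).

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a graph element that carries its own
// key/value repository. This is NOT JUNG's UserData API.
class TaggedElement {
    private final Map<Object, Object> data = new HashMap<>();

    // Attach an arbitrary datum under an arbitrary key.
    void setDatum(Object key, Object value) { data.put(key, value); }

    // Returns null if this element has nothing stored under the key.
    Object getDatum(Object key) { return data.get(key); }
}

// Hypothetical external table: one table per attribute, keyed by
// element identity, so the elements themselves stay small.
class ExternalStore<E, V> {
    private final Map<E, V> table = new IdentityHashMap<>();

    void put(E element, V value) { table.put(element, value); }

    // Returns null if the element has no entry in this table.
    V get(E element) { return table.get(element); }
}
```

The per-element repository suits the heterogeneous case, since each element can hold a different set of keys. The external table is the "pass Maps around" alternative: every piece of code that needs the attribute also needs a reference to the table.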

If you're writing code for your own use, and each vertex or edge has  
the same metadata, subclassing the existing vertex/edge classes can  
sometimes be the most convenient and efficient solution: if you know  
that all your vertices must store a name and an ID number, then you  
can create MyVertex extends SparseVertex (or whatever vertex class is  
appropriate) that has name and ID fields.
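
A rough sketch of what that subclass might look like follows. `BaseVertex` is a hypothetical placeholder for SparseVertex (or whichever library vertex class you extend), defined here only so that the example compiles without JUNG on the classpath. The accessor names are likewise just illustrative.

```java
// Hypothetical placeholder for a library vertex class such as
// SparseVertex; in real code you would extend the library class instead.
class BaseVertex {
}

// Every vertex stores the same metadata, so plain typed fields replace
// per-key lookups in a user data repository.
class MyVertex extends BaseVertex {
    private final String name;
    private final int id;

    MyVertex(String name, int id) {
        this.name = name;
        this.id = id;
    }

    String getName() { return name; }

    int getId() { return id; }
}
```

With this approach the metadata is read through ordinary typed field access, for example `new MyVertex("alice", 7).getName()`, with no key lookup or cast involved.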

More thoughts on this point may be found in the manual and in the  
journal article preprint on the JUNG documentation website.

> Does anybody have a sense of which operations will tend to be more  
> expensive with lots of UserData?

Depends on the UserData implementation.  See the release notes for  
1.7 for a brief discussion on this point.

> Obviously graph copying, but is basic visualization likely to be  
> slowed?  Presumably copying speed is impacted more by the number of  
> UserData items than their size?  (if it is only copying object  
> refs..)?

Basic visualization is unlikely to be affected by data storage in  
general, although it will be affected by the way that coordinates and  
other characteristics (colors, shapes, sizes, etc.) are stored or
calculated.  If the visualization is slow to update, it's much more
likely to be a combination of the network's size and the inefficiency  
of our layout algorithms.  (Although I don't expect our next release  
to be soon, I hope that it will provide some upgrades to the layout  
algorithms that we currently provide.)

Joshua

jm...@ic......Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
Joshua O'Madadhain: Information Scientist, Musician, and
Philosopher-At-Tall
   It's that moment of dawning comprehension that I live for--Bill  
Watterson
   My opinions are too rational and insightful to be those of any  
organization.
