Menu

Rendering only a sub-portion of a graph?

2002-10-10
2002-10-22
  • Ramon M. Felciano

    Hi --

    I'm looking for a graph toolkit that will let me browse and view a very large graph, probably 50,000 nodes and 500,000 edges. This is obviously too big to view, layout, etc at once, so I'm looking for toolkits that provide mechanisms for extracting a subsection of the graph that is used as the "active" part of the graph to be working with.

    I could do this myself by providing an intermediate layer between a graph store (say an edge-list database) and the data structure actually being viewed in the graph tool (say a GML file). However, I'd obviously prefer a tool that handles both for me and gives me a some simple ways to focus in on a particular region of the graph. The obvious one for my current use would be to maintain a "focus node" and scope the graph based on e.g. reachability within N edges from the focus node.

    Can someone comment on whether Royere supports this directly, and if not, whether it has any particular architectural features that might simplify maintaining a local subview of a remote graph store?

    Thanks!

     
    • yugen

      yugen - 2002-10-11

      > I'm looking for a graph toolkit that will let me
      > browse and view a very large graph, probably
      > 50,000 nodes and 500,000 edges. This is
      > obviously too big to view, layout, etc at once,
      > so I'm looking for toolkits that provide
      > mechanisms for extracting a subsection of the
      > graph that is used as the "active" part
      > of the graph to be working with.

      Royere has the ability to create an "overview" graph based on various clustering strategies.  By clicking on a node in the overview graph, the main panel focuses on one region of the original graph.  However, this approach requires the entire graph to be in memory.

      Another approach is to define subgraphs within the original graph.  This is done by declaring certain nodes to be "metanodes".  A metanode looks like an ordinary node, but when a metanode is "opened" in the UI, it reveals a subgraph, i.e., nodes contained within the metanode.  This is one way to hide nodes until you want to view them.

      FYI, Royere has some test graphs with several hundred nodes, but it has not been tested with 50,000 nodes and 500,000 edges yet.

      > I could do this myself by providing an
      > intermediate layer between a graph
      > store (say an edge-list database) and the
      > data structure actually being viewed in
      > the graph tool (say a GML file).

      Royere is based on a "pipeline" model.  Specifically, Royere uses an InputManager to fault a graph from a graph data store, then it uses a FilterManager to restrict which part of the graph gets laid out, then it uses a LayoutManager to perform the layout (by assigning coordinates to every node), and finally it uses a ViewManager to handle the actual rendering.  One current limitation of this approach is that you can't impose a filter on a graph until after it has been entirely faulted into memory.  Depending on the graph sizes you are dealing with and the hardware you're running on, this may not be practical.

      So, it would be really good to add a feature to filter which nodes and edges get loaded from the graph store.  This specific ability is not in Royere yet, but would be generally useful. 

      > However, I'd obviously prefer a tool that
      > handles both for me and gives me some
      > simple ways to focus in on a particular
      > region of the graph. The obvious one for my
      > current use would be to maintain a
      > "focus node" and scope the graph based
      > on e.g. reachability within N edges from the
      > focus node.

      You can jump to any node by saying "center <node name>" in the Royere command line.  Then you can zoom in or out to see nodes in the immediate vicinity.  However, as I mentioned above, Royere assumes that the entire graph has been loaded into memory.

       
      • Ramon M. Felciano

        Thanks for the helpful writeup. There are two obvious followup questions:

        1. Do you have any numbers memory overhead for Royere graph structures so that I might estimate how reasonable it is to load our 50/500 graph into memory?

        2. Assume the above is not feasible, it sounds like the next approach is to write a custom graph loader that only faults a sub-graph from the backing store. One way to do this would be to wipe the graph from memory and reload the new subgraph every time the user centers on a new node. However, this would also wipe out any graph state information (e.g. node geometries and coordinates). Given that the subgraphs will likely have common sub-regions (assuming this re-centering technique is essentially a way of scrolling a fixed edge-distance window around the whole graph) we will want to re-associate that state with the newly-loaded nodes, so that e.g. node X will appear in red if it appeared in red on the previous subgraph. Does Royere provide any state maintenance mechanism like this that is sufficiently de-couplable that one might re-couple it with a newly loaded graph based on e.g. shared node IDs?

        Thanks again for the quick response!

        Ramon

         
        • yugen

          yugen - 2002-10-13

          > 1. Do you have any numbers memory
          > overhead for Royere graph structures
          > so that I might estimate how reasonable
          > it is to load our 50/500 graph into memory?

          I picked three test graphs from the Royere distribution and did three test runs each.  The results were very consistent:

          Graph        Size                                  Memory Footprint (M)
          Tree.xml     31 nodes, 30 edges          2.61
          IEEE.xml     599 nodes, 881 edges      8.02
          tree3k.xml  3255 nodes, 3254 edges   33.3

          This is almost perfectly linear with respect to the number of nodes (memory size = 2.32 + 0.0095 |V|).  Obviously, the edges must account for some of the memory usage, but Edge objects are much smaller than Node objects.  Extrapolating to 50,000 nodes yields an estimated memory footprint of about 477M.  Since you have 500,000 edges, you'd have to revise that figure upward somewhat, but that gives you a general idea of how much memory you would need.

           
        • M. Scott Marshall

          > 1. Do you have any numbers memory overhead for > Royere graph structures so that I might estimate
          > how reasonable it is to load our 50/500 graph into
          > memory?

          Note: Even if you managed to load the entire graph and Royere drew it, _interaction_ would slow to a standstill. I experienced a dramatic slowdown in interaction when I loaded 6K nodes and 12k edges.

           
    • yugen

      yugen - 2002-10-13

      > 2. Assume the above is not feasible, it
      > sounds like the next approach is to write a
      > custom graph loader that only faults a
      > sub-graph from the backing store.  One
      > way to do this would be to wipe the graph
      > from memory and reload the new subgraph
      > every time the user centers on a new node.
      > However, this would also wipe out any
      > graph state information (e.g. node
      > geometries and coordinates).

      That's true -- and I think it's unavoidable.  If you load a particular subgraph and lay it out somehow, and then try to load a second subgraph which happens to contain some of the nodes from the first subgraph, it would be hard to honor the node coordinates from the first layout, because layout algorithms generally assume that they get to determine the coordinates of every node. 

      Of course, nothing stops you from modifying an off-the-shelf layout algorithm to accommodate some predefined coordinates.

      As an aside, Royere does support the notion of "dependent layouts" that honor node coordinates assigned by other layouts, although they wouldn't really apply in this case.  See http://sourceforge.net/forum/forum.php?thread_id=627454&forum_id=126557 if you want to read more about dependent layouts.

      > Given that the subgraphs will likely have
      > common sub-regions (assuming this
      > re-centering technique is essentially a
      > way of scrolling a fixed edge-distance
      > window around the whole graph) we will
      > want to re-associate that state with
      > the newly-loaded nodes, so that e.g.
      > node X will appear in red if it appeared
      > in red on the previous subgraph.

      Just to clarify, visual information (like node color) is fundamentally different from layout information (node coordinates) and can be preserved more easily across overlapping subgraphs.

      > Does Royere provide any state maintenance
      > mechanism like this that is sufficiently
      > de-couplable that one might re-couple it
      > with a newly loaded graph based on e.g.
      > shared node IDs?

      When Royere was originally written, it assumed that all graphs would be stored in file format (e.g., GraphXML or GML).  The GraphXML DTD does support node attributes like color.

      Recently we have added the ability to persist graphs to a database.  Right now, we only store layout information to the database, not visual information, although that feature will be added fairly soon.

       

Log in to post a comment.