Menu

#192 vertices/edges implementation not optimal for large graps

Likely Bug
open
nobody
9
2013-06-24
2013-02-08
No

Hi,
If you are working with large graphs you will probably be surprised by slow performances of Jung library typically when calling graph.getVertex(v)
I think this is due to the not optimized implementation of vertices and edges fields in basic graph classes such as DirectedSparseMultigraph<V,E>.

In fact vertices and edges are implemented as HashMaps:
vertices = new HashMap<V, Pair<Set<E>>>();
edges = new HashMap<E, Pair<V>>();

However, HashMap provides constant-time performance for the basic operations (get and put) but not optimal access time. (see http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html\)

In contrast of HashMap, the TreeMap implementation provides guaranteed log(n) time cost for containsKey, get, put and remove operations (see http://docs.oracle.com/javase/1.5.0/docs/api/java/util/TreeMap.html\).
Thank you to take into consideration this issue in next Jung release.

Best regards.

Discussion

  • Lamjed Ben Jabeur

    • priority: 5 --> 9
     
  • Pradeep Gollakota

    I'm not entirely sure what you mean by "... but not optimal access time."

    HashMap's provide O(1) random access. Iteration over HashMaps is less performant because it's complexity is proportional to the capacity instead of it's size.

    In your example, you mentioned that the common access pattern is graph.getVertex(v) which would be considered a random access. The random access guarantee (O(log n)) for TreeMap is too slow for large graphs. A TreeMap would also introduce the requirement for V to be Comparable<V> which is probably inadvisable.

    If a more performant iteration over the vertex set is necessary, it may be worthwhile to change the implementation from a HashMap to a LinkedHashMap which provides O(1) random access and has faster iteration over entries in the Map.

    http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html

     

Log in to post a comment.