Re: [Tcllib-devel] GSoC Graph. Review revision 25

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Andreas Kupries wrote:
>  >  812:	proc ::struct::graph::op::createTwoSquaredGraph {G} {
>  >  813:		
>  >  814:		set H [struct::graph]
>  >  815:		set copyG [struct::graph]
>  >  816:		
>  >  817:		foreach v [$G nodes] {
>  >  818:			$H node insert $v
>  >  819:			$copyG node insert $v
>  >  820:		}
>  >  821:		
>  >  822:		foreach e [$G arcs] {
>  >  823:			set v [$G arc source $e]
>  >  824:			set u [$G arc target $e]
>  >  825:			
>  >  826:			$copyG arc insert $v $u [list $v $u]
>  >  827:			$copyG arc setweight [list $v $u] 1
>  >  828:		}
>  >  829:		
>  >  830:		foreach v [$copyG nodes] {
>  >  831:			foreach u [$copyG nodes] {
>  >  832:				if { [distance $copyG $v $u] <= 2 && ![$H arc exists [list $u $v]] 
> && ($u ne $v)} {

I thought about this a bit more, on the way home yesterday, and sleeping over 
it. Not to be meant as criticism, the code here feels like a direct translation 
of the mathematical definition of G**2 graphs. It basically works by generating 
all pairs of nodes and then testing the distance between them, where the 
distance code is particular inefficient by using a dijkstra which computes the 
data for all nodes, and then throws 99% of it away.

We can do it better ... In essence, generate the node pairs in such a way that 
a distance check is not required, because the way the generation works ensures 
that the distance <= 2 condition holds true.

For this we have to unravel the definition of distance a bit, i.e. what does 
distance <= 2 mean ? We can analyze this easier by splitting it into 2 cases, i.e.
	distance == 1,
and	distance == 2.

So, distance (u,v)== 1 means ?
	That node u can be reached from v in one hop
<=>	u and v are connected by an edge in G	(a)
<=>	u is a "neighbour of v in G".		(b)	(X)

In other words, the G**@ graph has all edges of the original graph, and we can 
generate them either through translating (a) or (b) into code, i.e.

(Ad a)	for all e edge in G, with end nodes u, v
		add edge (u,v) to G**2

(Ad b)	for all nodes u in G
		for all nodes v neighbour of u in G
			add edge (u,v) to G**2, if u != v

Of these two fragments I prefer (b), for reasons explained when I am dealing 
with distance 2.

(Ad X)	As the input graph is considered unidirectional its neighbourhood is
	all adjacent nodes, whether conected by incoming or outgoing arcs.

Now, distance(u,v) == 2 means ?

	That node u can be reached from v in two hops
<=>	It exists a node z in G connected to both u and v
	with edges in G.			(c)

<=>	u is a neighbour of a neighbour of v in G,
	but not a neighbour of v, nor v.	(d)

Definition (c) doesn't translate that well to code, but (d) does.

(Ad d)	for all nodes u in G
		for all nodes z neighbour of u in G
			for all nodes v neighbour of z in G
				add edge (u,v) to G**2
					if	u != v and
						no edge (u,v) existing already.

The if condition at the end takes care of the "but not ..." terms in (d).

This also looks very similar to the code for (b), indeed the two fragements can 
be merged into one fragment which takes of both at the same time. Which is why 
I prefered fragment (b) here.

what we are in essence doing is to execute the two steps of dijkstra needed to 
find the nodes at distances 1 and 2 for all nodes, but no more. As this 
directly generates the nodes at the distance we are interested in the distance 
test becomes irrelevant. It is always true, by construction.

While it doesn't look like the mathematical definition any longer, and as such 
needs more comments to make the relation to it clear, it should also be more 
efficient than the naive code.

Now, looking at the overall algorithm,, as described in the other mail, i.e one 
level higher, about the series of G(i) of G(i)**2 we are apparently searching 
from G(1), G(2), ... on up one important question to ask is

	Do we have to compute all the G(i), G(i)**2 from the ground up ?

Complementary phrasing:
	Can we generate the G(i), G(i)**2 incrementally from the previous
	graph G(i-1), G(i-1)**2 ?	

In our case the answer is yes, we can do this incrementally.

	Assuming G(i-1), G(i-1)**2 exist, then

		G(i) = G(i-1) + edge Ei.

		G(i)**2 = (G(i-1)**2 + Ei + additional edges.
		These additional edges can be computed from Ei!
		Let Ei = (u,v). Then the nodes
			z in neighbours(u,G)-v are distance 2 to v
		and	z in neighbours(v,G)-u are distance 2 to u
		Easy to ocmpute and add.

	And the induction is started with G(0) == G(0)**2 == The graph
	containing just the nodes of the input graph, and no edges.

	The upshot of this is, we need only create one graph X in the code,
	which we initialize to contain G(0)**2, and during the loop iterating
	over our sorted edges we then extend this graph in each round using the
	above rules with Ei and distance-2 edges to contain G(i)**2.

Instead of createTwoSquaredGraph a better helper procedure will be 
extendTwoSquaredGraph

Andreas