Re: [Tcllib-devel] Revision 37

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Andreas Kupries wrote:
> Michał Antoniewski wrote:
>> Hello.

And the promised review.

 > 1362:			
 > 1363:			set paths [ShortestsPathsByBFS $residualG $s paths]
 > 1364:			
 > 1365:			if { ($paths == 0) || (![dict exists $paths $t]) } break

I infer that ShortestsPathsByBFS can return either a number or a
dictionary. Not a good thing. The result type should be stable, only
one thing.

 > 1646:			
 > 1647:			#setting the path 'p' from 's' to 't'
 > 1648:			set paths [ShortestsPathsByBFS $Gf $s paths]
 > 1649:			
 > 1650:			#if there are no more paths, the search has ended
 > 1651:			if { ($paths == 0) || (![dict exists $paths $t]) } break

Ditto.

Now lets see why this happens.

 > 1707:	proc ::struct::graph::op::ShortestsPathsByBFS {G s outputFormat} {
 > 1708:		
 > 1709:		switch -exact -- $outputFormat {
 > 1710:			distances {
 > 1711:				set outputMode distances
 > 1712:			}
 > 1713:			paths {
 > 1714:				set outputMode paths
 > 1715:			}
 > 1716:			default {
 > 1717:				return -code error "Unknown output format \"$outputFormat\", 
expected graph, or tree"

graph, or tree ? Not distances, or paths ?

 > 1718:			}
 > 1719:		}
 > 1720:		
 > 1721:		set queue [list $s]
 > 1722:		set result {}
 > 1723:		
 > 1724:		#initialization of marked nodes, distances and predecessors
 > 1725:		foreach v [$G nodes] {
 > 1726:			$G node set $v marked 0

The marking can also be stored in a dict or array. As temporary data
it maybe should. Right now the 'marked' attribute is apparently
persistent, i.e. added to G by the operation, but not removed before
returning.

 > 1727:			dict set distances $v Inf
 > 1728:			dict set pred $v -1
 > 1729:		}
 > 1730:		
 > 1731:		#the s node is initially marked and has 0 distance to itself
 > 1732:		$G node set $s marked 1
 > 1733:		dict set distances $s 0
 > 1734:			
 > 1735:		#the main loop
 > 1736:		while { [llength $queue] != 0 } {
 > 1737:		
 > 1738:			#removing top element from the queue
 > 1739:			set v [lindex $queue 0]
 > 1740:			lremove queue $v

lremove is based on value, yet we know the element to remove is at
index 0. lrange and lreplace are applicable, no search required (as
done by lremove).

Note: The way lists are implemented in Tcl this type of operation
(removing an element at the beginning of the list) involves copying
the whole remainder of the list. This easily makes an algorithm O(n^2)
in the length of the list (queue).

A way to avoid that is to never remove elements, but use an index to
keep track how far we have processed the elements.

(1)	set queue {}
	set at 0 ;# next element to process
	while {$at < [llength $queue]} {
		set element [lindex $queue $at]
		incr at

		... process element, possibly extend the queue ...
	}

The memory used fby the 'queue' variable is larger, as nothing is
removed until the loops ends. It is finite however. Here it would be
bound by the number of nodes in the graph.

Another possibility is to use struct::queue (already required by some
other operations in the package).

	set q [struct::queue]
	while {[$q size]} {
		set element [$q get]
		... process element, possibly extend queue
	}
	$q delete

The implementation of struct::queue is similar to (1), with a bit more
code complexity to keep memory usage down without giving up the
performance (lappend is done into a second buffer, and buffers are
switched when the read buffer is exhausted per the read-index vs it
length).

 > 1741:			
 > 1742:			#for each arc that begins in v
 > 1743:			foreach arc [$G arcs -out $v] {
 > 1744:				
 > 1745:				set u [$G arc target $arc]
 > 1746:				set newlabel [ expr { [dict get $distances $v] + [$G arc getweight 
$arc] } ]
 > 1747:				
 > 1748:				if { $newlabel < [dict get $distances $u] } {
 > 1749:					
 > 1750:					dict set distances $u $newlabel
 > 1751:					dict set pred $u $v
 > 1752:					
 > 1753:					#case when current node wasn't placed in a queue yet -
 > 1754:					#we set u at the end of the queue
 > 1755:					if { [$G node set $u marked] == 0 } {
 > 1756:						lappend queue $u
 > 1757:						$G node set $u marked 1
 > 1758:					} else {
 > 1759:					
 > 1760:					#case when current node u was in queue before but it is not in it 
now -
 > 1761:					#we set u at the beginning of the queue
 > 1762:						if { [lsearch $queue $u] < 0 } {
 > 1763:							set queue [linsert $queue 0 $u]
 > 1764:						}

Ok, this makes the index method (1) more complex, because now the
'linsert' happens at $at, not 0.

For 'struct::queue' the method to insert at the front is
'unget'. Unfortunately there is no 'seach' method however, so using
struct::queue is not possible, contrary to what I thought before/above.

I have the feeling I should step the algorithm with an example to see
how this optimization works.

(The other branch is clear: Node not visited before, mark as visited
and put on queue for processing).

Your 'lsearch' <=>  '$u ni $queue'

 > 1765:					}
 > 1766:				}
 > 1767:			}
 > 1768:		}
 > 1769:		
 > 1770:		#if the outputformat is paths, we travel back to find shorests paths
 > 1771:		#to return sets of nodes for each node, which are their paths between
 > 1772:		#s and particular node
 > 1773:		dict set paths nopaths 1
 > 1774:		if { $outputMode eq "paths" } {
 > 1775:			foreach node [$G nodes] {	
 > 1776:				
 > 1777:				set path {}
 > 1778:				set lastNode $node
 > 1779:			
 > 1780:				while { $lastNode != -1 } {
 > 1781:					set currentNode [dict get $pred $lastNode]
 > 1782:					if { $currentNode != -1 } {
 > 1783:						lappend path $currentNode
 > 1784:					}
 > 1785:					set lastNode $currentNode
 > 1786:				}
 > 1787:				
 > 1788:				set path [lreverse $path]

Ah, this is different from dijkstra, this is why the caller did not
have to lreverse on their own.

	Any specific reason why this is better ?
	Or is dijkstra's form for the result better ?

	Or should we modify 'disjktra' to return the same as this
	procedure ?

 > 1789:				
 > 1790:				if { [llength $path] != 0 } {
 > 1791:					dict set paths $node $path
 > 1792:					dict unset paths nopaths
 > 1793:				}
 > 1794:			}
 > 1795:			
 > 1796:			if { ![dict exists $paths nopaths] } {
 > 1797:				return $paths
 > 1798:			} else {
 > 1799:				return 0

Better to return '{}' <=> an empty dictionary

That makes the return type always a dictionary, and the checks in
lines 1365 and 1651 become simpler (The ![dict exists] becomes
sufficient).

 > 1810:	proc ::struct::graph::op::BFS {G s outputFormat} {
 > 1811:		
 > 1812:		set queue [list $s]
 > 1813:			
 > 1826:		if { $outputMode eq "graph" } {
 > 1827:		#graph initializing
 > 1828:			set BFSGraph [struct::graph]
 > 1829:			foreach v [$G nodes] {
 > 1830:				$BFSGraph node insert $v
 > 1831:			}
 > 1832:		} else {
 > 1833:		#tree initializing
 > 1834:			set BFSTree [struct::tree]
 > 1835:			$BFSTree set root name $s
 > 1836:			$BFSTree rename root $s
 > 1837:		}
 > 1838:		
 > 1839:		#initilization of marked nodes
 > 1840:		foreach v [$G nodes] {
 > 1841:			$G node set $v marked 0

Note discussion of marking for 'ShortestPathByBFS'.

 > 1842:		}
 > 1843:		
 > 1844:		#start node is marked from the beginning
 > 1845:		$G node set $s marked 1
 > 1846:		
 > 1847:		#the main loop
 > 1848:		while { [llength $queue] != 0 } {
 > 1849:			#removing top element from the queue
 > 1850:			
 > 1851:			set v [lindex $queue 0]
 > 1852:			lremove queue $v

Note discussion of the queue handling above for 'ShortestPathByBFS'.

 > 1853:			
 > 1854:			foreach x [$G nodes -adj $v] {
 > 1855:				if { ![$G node set $x marked] } {
 > 1856:					$G node set $x marked 1
 > 1857:					lappend queue $x
 > 1858:					
 > 1859:					if { $outputMode eq "graph" } {				
 > 1860:						$BFSGraph arc insert $v $x [list $v $x]
 > 1861:					} else {
 > 1862:						$BFSTree insert $v end $x
 > 1863:						lappend result [list $v $x]

The variable 'result' seems to be superfluous. Not used by the loop,
not returned either. Leftover bits ?

 > 1864:					}				
 > 1865:				}
 > 1866:			}
 > 1867:		}
 > 1868:		
 > 1869:		
 > 1870:		if { $outputMode eq "graph" } {
 > 1871:			return $BFSGraph
 > 1872:		} else {
 > 1873:			return $BFSTree
 > 1874:		}
 > 1875:	}

 > 1877:	#Minimum Diameter Spanning Tree - MDST
 > 1878:	#
 > 1879:	#
 > 1880:	#The goal is to find for input graph G, the spanning tree that
 > 1881:	#has the minimum diameter worth.
 > 1882:	#
 > 1883:	#General idea of algorithm is to run BFS over all vertices in graph
 > 1884:	#G. If the diameter "d" of the tree is odd, then we are sure that tree
 > 1885:	#given by BFS is minimum (considering diameter value). When, diameter "d"
 > 1886:	#is even, then optimal tree can have minimum diameter equal to "d" or
 > 1887:	#"d-1".
 > 1888:	#
 > 1889:	#In that case, what algorithm does is rebuilding the tree given by BFS, by
 > 1890:	#adding a vertice between root node and root's child node (nodes), 
such that
 > 1891:	#subtree created with child node as root node is the greatest one (has the
 > 1892:	#greatests height). In the next step for such rebuilded tree, we run 
again BFS
 > 1893:	#with new node as root node. If the height of the tree didn't changed, 
we have found
 > 1894:	#a better solution.
 > 1895:	
 > 1896:	proc ::struct::graph::op::MinimumDiameterSpanningTree {G} {
 > 1897:	
 > 1898:		set min_diameter Inf
 > 1899:		set best_Tree [struct::graph]
 > 1900:		
 > 1901:		foreach v [$G nodes] {
 > 1902:		
 > 1903:			#BFS Tree
 > 1904:			set T [BFS $G $v tree]		
 > 1905:			#BFS Graph
 > 1906:			set TGraph [BFS $G $v graph]
 > 1907:				
 > 1908:			#Setting all arcs to 1 for diameter procedure
 > 1909:			$TGraph arc setunweighted 1
 > 1910:			
 > 1911:			#setting values for current Tree
 > 1912:			set diam [diameter $TGraph]
 > 1913:			set subtreeHeight [ expr { $diam / 2 - 1} ]
 > 1914:				
 > 1915:			##############################################
 > 1916:			#case when diameter found for tree found by BFS is even:
 > 1917:			#it's possible to decrease the diameter by one.
 > 1918:			if { [expr { $diam % 2 } ] == 0 } {

Nested expression superfluous

		if { ($diam % 2) == 0 } {

 > 1919:				
 > 1920:				#for each child u that current root node v has, we search

	Each child u of v ... Are you talking only about the direct children ?
	Or also about the child of children of v, etc. ?

 > 1921:				#for the greatest subtree(subtrees) with the root in child u.
 > 1922:				#
 > 1923:				foreach u [$TGraph nodes -adj $v] {
 > 1924:					set u_depth [$T depth $u]

	As u is a child of v the depth should be '1', always.

 > 1925:					set d_depth 0
 > 1926:					

 > 1927:					set descendants [$T descendants $u]
 > 1928:					
 > 1929:					foreach d $descendants {
 > 1930:						if { $d_depth < [$T depth $d] } {
 > 1931:							set d_depth [$T depth $d]
 > 1932:						}
 > 1933:					}

If I read this correctly, you are computing for each node d under u
the depth, i.e. distance to the root, and take the maximum. IIRC this
is the height of the tree under u, relative to root, is that correct ?

If yes, then this should be

		1 + [$T height $u]

I believe (struct::tree has a 'height' method).

 > 1934:					
 > 1935:					#depth of the current subtree
 > 1936:					set depth [ expr { $d_depth - $u_depth } ]

Now, with u_depth == 1, always, and d_depth = 1 + [$T height $u] we get

	depth == 1 + [$T height $u] - 1 == [$T height $u]

If my assumptions are correct then lines 1924 - 1936 can be collapsed into one 
command.

	set depth [$T height $u]

 > 1937:								
 > 1938:					#proceed if found subtree is the greatest one
 > 1939:					if { $depth >= $subtreeHeight } {
 > 1940:						
 > 1941:						#temporary Graph for holding potential better values
 > 1942:						set tempGraph [struct::graph]
 > 1943:						
 > 1944:						foreach node [$TGraph nodes] {
 > 1945:							$tempGraph node insert $node
 > 1946:						}
 > 1947:						
 > 1948:						#zmienic nazwy zmiennych zeby sie nie mylily
 > 1949:						foreach arc [$TGraph arcs] {
 > 1950:							set _u [$TGraph arc source $arc]
 > 1951:							set _v [$TGraph arc target $arc]
 > 1952:							$tempGraph arc insert $_u $_v [list $_u $_v]
 > 1953:						}

I believe I have seen this graph initialization (1941-1953, copy of
nodes, copy of structure with u/v arc naming convention) often enough
that we can puts this into a separate procedure we can call at all the
relevant places. It seems hat we need it for basically every operation

 > 1954:			
 > 1955:						if { [$tempGraph arc exists [list $u $v]] } {
 > 1956:							$tempGraph arc delete [list $u $v]
 > 1957:						} else {
 > 1958:							$tempGraph arc delete [list $v $u]
 > 1959:						}

 > 1960:						
 > 1961:						#for nodes u and v, we add a node between them
 > 1962:						#to again start BFS with root in new node to check
 > 1963:						#if it's possible to decrease the diameter in solution
 > 1964:						set node [$tempGraph node insert]
 > 1965:						$tempGraph arc insert $node $v [list $node $v]
 > 1966:						$tempGraph arc insert $node $u [list $node $u]
 > 1967:					
 > 1968:						set tempGraph [BFS $tempGraph $node graph]
 > 1969:								
 > 1970:						$tempGraph node delete $node
 > 1971:						$tempGraph arc insert $u $v [list $u $v]
 > 1972:						$tempGraph arc setunweighted 1
 > 1973:						
 > 1974:						set tempDiam [diameter $tempGraph ]
 > 1975:						
 > 1976:						#if better tree is found (that any that were already found)
 > 1977:						#replace it
 > 1978:						if { $min_diameter > $tempDiam } {
 > 1979:							set $min_diameter [diameter $tempGraph ]
 > 1980:							set best_Tree $tempGraph
 > 1981:						}

Hm. Here are some leaks ... If tempGraph does not become best_Tree it
should be released, correct ?  Further, if tempGraph does become
best_Tree then the old best_Tree (if defined) should be released too.

 > 1982:					}
 > 1983:					
 > 1984:				}
 > 1985:			}
 > 1986:			################################################################
 > 1987:			
 > 1988:			set currentTreeDiameter [ diameter $TGraph ]

set currentTreeDiameter $diam

See line 1912, before the humunguous if-block handling the even diameter.,
and the fact that this big if-block is not changing TGraph at all.

 > 1989:			
 > 1990:			if { $min_diameter > $currentTreeDiameter } {
 > 1991:				set min_diameter $currentTreeDiameter
 > 1992:				set best_Tree $TGraph

See leak discussion above (1981).

 > 1993:			}
 > 1994:		

The tree $T (line 1904) seems to leak as well.

 > 1995:		}
 > 1996:		
 > 1997:		return $best_Tree
 > 1998:	}
 > 1999:	
 > 2000:	#
 > 2001:	proc ::struct::graph::op::MinimumDegreeSpanningTree {G} {

 > 2022:		#main loop	
 > 2023:		foreach e [$G arcs] {
 > 2024:			
 > 2025:			set u [$G arc source $e]
 > 2026:			set v [$G arc target $e]
 > 2027:			

 > 2028:			$MST arc setunweighted 1

This command is indendent of 'e', should be run before the
loop. Around line 2021, after MST has gotten its arcs.

 > 2029:			
 > 2030:			#setting the path between nodes u and v in Spanning Tree MST
 > 2031:			set path [dict get [dijkstra $MST $u] $v]
 > 2032:			lappend path $v
 > 2033:			
 > 2034:			#if nodes u and v are neighbours, proceed to next iteration

This condition is quicker checked with 'arc exists' for the two
possible directions, see below. Doing so means that we can move the
expensive dijkstra into the if-block, hopefully using it less.

 > 2035:			if { [llength $path] > 2 } {

	if { !arc exists (u -> v) && !(arc exists v ->) }

 > 2036:				
 > 2037:				#search for the node in the path, such that its degree is greater 
than degree of any of nodes
 > 2038:				#u or v increased by one
 > 2039:				foreach node $path {
 > 2040:					if { [$MST node degree $node] > [ expr { [Max [$MST node degree 
$u] [$MST node degree $v]] + 1 } ] } {

Nested 'expr' superfluous, see also line 1918.

Andreas.