From: Michał A. <ant...@gm...> - 2009-07-18 22:32:48
Hello.

Regarding the previous ping mail, here is the promised description.

1. BFS.

After some attempts and different organisations of the code I decided to split the BFS procedure into two procedures - one for shortest path finding (ShortestPathsByBFS) and one for standard BFS (BFS).

Why, and some notes about them:
- creating one procedure for both makes the code quite complex
- as both procedures can return dictionaries, trees and graphs, there would be a lot of repeated code or a lot of if statements, which doesn't make the code look good (I just wasn't satisfied when I looked at it)
- splitting into two procedures lets us easily add an improvement to path finding: e.g. if a node wasn't in the queue yet, it's added at the end of the queue, but when it was added before and has already been removed from the queue, we add it at the beginning of the queue - this helps a lot because much unneeded computation is avoided (see the sketch after this mail)
- I just find it far clearer with such a division - both more intuitive and better from an implementation point of view.

Shortest path finding by BFS vs Dijkstra:
- BFS can work with negative weights (but negative cycles are of course unwelcome)
- it's quicker for dense graphs with a large number of nodes
- like Dijkstra, it finds paths from one node to all others, so we don't have to use e.g. Bellman-Ford when negative weights are present.

Path finding by BFS returns:
- a dictionary containing the distances between the start node s and every other vertex
- or a dictionary containing the paths between the start node and every other vertex.

Notes on BFS path finding:
- it's weighted, so to get the unweighted version we just pass in a graph with all weights set to 1... is that all right, or should an unweighted option be considered?
- it works on graphs using edge weights - should I change it to attributes and assume for further work that all code should use only attributes?

Standard BFS returns:
- the BFS tree found by the algorithm (struct::tree)
- the BFS graph found by the algorithm.

2. MDST - Minimum Diameter Spanning Tree

Firstly, how the algorithm works:
- for each node it computes that node's BFS tree, so for each node a tree rooted at that node is found
- if that tree has an odd diameter, it is an optimal solution
- if not... there can be a better solution. For diameter d (the diameter of the tree found by BFS), OPT can be d or d-1
- now, to find out whether there is a better solution, we do a little trick:
  a) we search for the deepest subtree of the current BFS tree, considering only children of the root node as roots of that subtree [*]
  b) for each subtree found, we do the following: we add a node between the root node of the subtree and the root node of the BFS tree and run BFS again with the start node at the node we have just added. If the depth of the tree doesn't change, a better solution has been found.
- from all the trees that BFS found, we choose the one with the smallest diameter.

[*] I've got one concern about that step. My resources say two things: one, that we search as I wrote above - for the deepest subtree; but I also found that we search for a subtree whose height (the greatest level in the tree -> depth + 1) is equal to diameter(BFS tree)/2 - 1 (let's call it X for further investigation). So now I have an issue whether I can simply use that equality with X or not, because I found an example of a tree that has an even diameter and a height greater than X. On the other hand, that example didn't force an improvement of the solution... so it's possible that this X value bounds the set of proper trees well... but as I'm not completely sure, I've put "height >= X" in the code and will investigate it further.

Implementation:
- it uses both the graph found by BFS and the tree found by BFS
- the graph is used for diameter computation and other issues; the tree is better for handling descendants and finding the greatest subtree
- it forces adding a struct::tree package require to graph ops... is that ok?
- when the diameter is odd, it just checks whether the found solution is better than the previous one
- when the diameter is even, it does the operations I described above. It creates an additional graph for this - tempGraph. If tempGraph has a better diameter after applying those changes, the better solution is set.

3. Flow problems

Both flow problems now use the BFS path-finding procedure.

I wrote a little about this in previous mails, but as a reminder:
- negative arcs - BFS can handle them
- for dense networks BFS is better than Dijkstra, as I noted above
- Dijkstra, after adding those "must-have" improvements, does a lot more computation. So in general, for typical flow problems BFS gives shorter computation times - even for complete networks (it's proven)
- BFS has some pathological cases where the number of operations grows exponentially, but such networks won't occur in practice.

I hope I remembered to write about everything. Just tell me what you think about such an organisation.

Tests are not complete yet, so I will supplement the missing ones.

Best Regards,
Michał Antoniewski
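A minimal Tcl sketch of the queue discipline described above (a Bellman-Ford style relaxation where a node that was already processed is re-queued at the front). The procedure name, the dict-based bookkeeping and the distances-only return value are illustrative choices for this sketch, not the actual struct::graph::op code; the graph accessors ($G nodes, $G arcs -out, $G arc target, $G arc getweight) are the struct::graph calls used in the reviewed code.

	package require struct::graph

	proc ShortestDistancesByBFS {G s} {
	    # seen    : node was queued at least once
	    # inqueue : node is currently in the queue
	    set distances [dict create]
	    set seen      [dict create]
	    set inqueue   [dict create]
	    foreach v [$G nodes] {
	        dict set distances $v Inf
	        dict set seen      $v 0
	        dict set inqueue   $v 0
	    }
	    dict set distances $s 0
	    dict set seen      $s 1
	    dict set inqueue   $s 1
	    set queue [list $s]

	    while {[llength $queue]} {
	        # pop the first element
	        set v     [lindex $queue 0]
	        set queue [lrange $queue 1 end]
	        dict set inqueue $v 0

	        foreach arc [$G arcs -out $v] {
	            set u        [$G arc target $arc]
	            set newlabel [expr {[dict get $distances $v] + [$G arc getweight $arc]}]
	            if {$newlabel < [dict get $distances $u]} {
	                dict set distances $u $newlabel
	                if {![dict get $seen $u]} {
	                    # never queued before: append at the end
	                    lappend queue $u
	                    dict set seen    $u 1
	                    dict set inqueue $u 1
	                } elseif {![dict get $inqueue $u]} {
	                    # queued and processed earlier: re-queue at the front
	                    set queue [linsert $queue 0 $u]
	                    dict set inqueue $u 1
	                }
	            }
	        }
	    }
	    return $distances
	}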
From: Andreas K. <and...@ac...> - 2009-07-20 17:00:18
Michał Antoniewski wrote:
> Hello.
>
> Regarding the previous ping mail, here is the promised description.
>
> 1. BFS.
>
> After some attempts and different organisations of the code I decided to
> split the BFS procedure into two procedures - one for shortest path
> finding (ShortestPathsByBFS) and one for standard BFS (BFS).
>
> Why, and some notes about them:
> - creating one procedure for both makes the code quite complex
> - as both procedures can return dictionaries, trees and graphs, there
>   would be a lot of repeated code or a lot of if statements, which
>   doesn't make the code look good (I just wasn't satisfied when I looked
>   at it)
> - splitting into two procedures lets us easily add an improvement to
>   path finding: e.g. if a node wasn't in the queue yet, it's added at
>   the end of the queue, but when it was added before and has already
>   been removed from the queue, we add it at the beginning of the queue -
>   this helps a lot because much unneeded computation is avoided
> - I just find it far clearer with such a division - both more intuitive
>   and better from an implementation point of view.

I'll have a closer look at this then.

> Shortest path finding by BFS vs Dijkstra:
> - BFS can work with negative weights (but negative cycles are of course
>   unwelcome)
> - it's quicker for dense graphs with a large number of nodes
> - like Dijkstra, it finds paths from one node to all others, so we don't
>   have to use e.g. Bellman-Ford when negative weights are present.
>
> Path finding by BFS returns:
> - a dictionary containing the distances between the start node s and
>   every other vertex
> - or a dictionary containing the paths between the start node and every
>   other vertex.

Like dijkstra then.

> Notes on BFS path finding:
> - it's weighted, so to get the unweighted version we just pass in a
>   graph with all weights set to 1... is that all right, or should an
>   unweighted option be considered?

No need for the unweighted option, IMHO.

> - it works on graphs using edge weights - should I change it to
>   attributes

Does it make sense for upcoming algorithms ?

> and assume for further work that all code should use only attributes?

Yes, IMHO.

> Standard BFS returns:
> - the BFS tree found by the algorithm (struct::tree)
> - the BFS graph found by the algorithm.
>
> 2. MDST - Minimum Diameter Spanning Tree
>
> Firstly, how the algorithm works:
> - for each node it computes that node's BFS tree, so for each node a
>   tree rooted at that node is found
> - if that tree has an odd diameter, it is an optimal solution (*)
> - if not... there can be a better solution. For diameter d (the diameter
>   of the tree found by BFS), OPT can be d or d-1
> - now, to find out whether there is a better solution, we do a little
>   trick:
>   a) we search for the deepest subtree of the current BFS tree,
>      considering only children of the root node as roots of that
>      subtree [*]
>   b) for each subtree found, we do the following: we add a node between
>      the root node of the subtree and the root node of the BFS tree and
>      run BFS again with the start node at the node we have just added.
>      If the depth of the tree doesn't change, a better solution has been
>      found.
> - from all the trees that BFS found, we choose the one with the smallest
>   diameter.
>
> [*] I've got one concern about that step. My resources say two things:
> one, that we search as I wrote above - for the deepest subtree; but I
> also found that we search for a subtree whose height (the greatest level
> in the tree -> depth + 1) is equal to diameter(BFS tree)/2 - 1 (let's
> call it X for further investigation). So now I have an issue whether I
> can simply use that equality with X or not, because I found an example
> of a tree that has an even diameter and a height greater than X. On the
> other hand, that example didn't force an improvement of the solution...
> so it's possible that this X value bounds the set of proper trees
> well... but as I'm not completely sure, I've put "height >= X" in the
> code and will investigate it further.
>
> Implementation:
> - it uses both the graph found by BFS and the tree found by BFS
> - the graph is used for diameter computation and other issues; the tree
>   is better for handling descendants and finding the greatest subtree
> - it forces adding a struct::tree package require to graph ops... is
>   that ok?

Yes.

> - when the diameter is odd, it just checks whether the found solution is
>   better than the previous one

Ah, so what you say at (*) above means 'optimal solution for that specific node', not 'optimal solution for the entire graph'.

> - when the diameter is even, it does the operations I described above.
>   It creates an additional graph for this - tempGraph. If tempGraph has
>   a better diameter after applying those changes, the better solution is
>   set.
>
> 3. Flow problems
>
> Both flow problems now use the BFS path-finding procedure.
>
> I wrote a little about this in previous mails, but as a reminder:
> - negative arcs - BFS can handle them
> - for dense networks BFS is better than Dijkstra, as I noted above
> - Dijkstra, after adding those "must-have" improvements, does a lot more
>   computation. So in general, for typical flow problems BFS gives
>   shorter computation times - even for complete networks (it's proven)
> - BFS has some pathological cases where the number of operations grows
>   exponentially, but such networks won't occur in practice.

Can you describe the pathological cases ? This would make a good thing to note in the docs.

> I hope I remembered to write about everything. Just tell me what you
> think about such an organisation.

Review of rev 37+38 will be done sometime today.

> Tests are not complete yet, so I will supplement the missing ones.

Andreas.
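Since the thread settles on attributes for further work, here is a small sketch contrasting the two conventions on a struct::graph. The graph, node and arc names and the attribute key "weight" are made up for illustration; 'arc setunweighted' and 'arc getweight' are the weight calls already used in the reviewed code, and 'arc set'/'arc get' is assumed to be the generic per-arc attribute accessor pair.

	package require struct::graph

	set G [struct::graph]
	$G node insert a
	$G node insert b
	$G arc insert a b e1            ;# arc e1 runs from a to b

	# Weight slot, as read by ShortestsPathsByBFS via [$G arc getweight $arc]:
	$G arc setunweighted 1          ;# give every still-unweighted arc weight 1
	puts [$G arc getweight e1]      ;# -> 1

	# Generic attribute, the alternative discussed above:
	$G arc set e1 weight 5
	puts [$G arc get e1 weight]     ;# -> 5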
From: Andreas K. <and...@ac...> - 2009-07-20 18:24:01
Andreas Kupries wrote:
> Michał Antoniewski wrote:
>> Hello.

And the promised review.

> 1362:
> 1363:     set paths [ShortestsPathsByBFS $residualG $s paths]
> 1364:
> 1365:     if { ($paths == 0) || (![dict exists $paths $t]) } break

I infer that ShortestsPathsByBFS can return either a number or a dictionary. Not a good thing. The result type should be stable, only one thing.

> 1646:
> 1647:     #setting the path 'p' from 's' to 't'
> 1648:     set paths [ShortestsPathsByBFS $Gf $s paths]
> 1649:
> 1650:     #if there are no more paths, the search has ended
> 1651:     if { ($paths == 0) || (![dict exists $paths $t]) } break

Ditto. Now let's see why this happens.

> 1707: proc ::struct::graph::op::ShortestsPathsByBFS {G s outputFormat} {
> 1708:
> 1709:     switch -exact -- $outputFormat {
> 1710:         distances {
> 1711:             set outputMode distances
> 1712:         }
> 1713:         paths {
> 1714:             set outputMode paths
> 1715:         }
> 1716:         default {
> 1717:             return -code error "Unknown output format \"$outputFormat\", expected graph, or tree"

graph, or tree ? Not distances, or paths ?

> 1718:         }
> 1719:     }
> 1720:
> 1721:     set queue [list $s]
> 1722:     set result {}
> 1723:
> 1724:     #initialization of marked nodes, distances and predecessors
> 1725:     foreach v [$G nodes] {
> 1726:         $G node set $v marked 0

The marking can also be stored in a dict or array. As temporary data it maybe should. Right now the 'marked' attribute is apparently persistent, i.e. added to G by the operation, but not removed before returning.

> 1727:         dict set distances $v Inf
> 1728:         dict set pred $v -1
> 1729:     }
> 1730:
> 1731:     #the s node is initially marked and has 0 distance to itself
> 1732:     $G node set $s marked 1
> 1733:     dict set distances $s 0
> 1734:
> 1735:     #the main loop
> 1736:     while { [llength $queue] != 0 } {
> 1737:
> 1738:         #removing top element from the queue
> 1739:         set v [lindex $queue 0]
> 1740:         lremove queue $v

lremove is based on value, yet we know the element to remove is at index 0. lrange and lreplace are applicable, no search required (as done by lremove).

Note: The way lists are implemented in Tcl, this type of operation (removing an element at the beginning of the list) involves copying the whole remainder of the list. This easily makes an algorithm O(n^2) in the length of the list (queue). A way to avoid that is to never remove elements, but use an index to keep track of how far we have processed the elements.

(1)
	set queue {}
	set at 0 ;# next element to process

	while {$at < [llength $queue]} {
	    set element [lindex $queue $at]
	    incr at
	    ... process element, possibly extend the queue ...
	}

The memory used by the 'queue' variable is larger, as nothing is removed until the loop ends. It is finite however. Here it would be bound by the number of nodes in the graph.

Another possibility is to use struct::queue (already required by some other operations in the package).

	set q [struct::queue]
	while {[$q size]} {
	    set element [$q get]
	    ... process element, possibly extend queue
	}
	$q delete

The implementation of struct::queue is similar to (1), with a bit more code complexity to keep memory usage down without giving up the performance (lappend is done into a second buffer, and buffers are switched when the read buffer is exhausted per the read-index vs its length).
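To make the index-cursor variant (1) concrete, here is a small self-contained sketch applying it to a plain BFS traversal over a struct::graph. The procedure name and the local 'visited' dict (instead of a persistent 'marked' node attribute) are illustrative choices for this sketch, not the reviewed code.

	package require struct::graph

	proc BFSOrder {G s} {
	    set visited [dict create $s 1]
	    set queue   [list $s]
	    set at      0                  ;# index of the next element to process
	    while {$at < [llength $queue]} {
	        set v [lindex $queue $at]
	        incr at                    ;# nothing is ever removed: O(1) "pop"
	        foreach u [$G nodes -adj $v] {
	            if {![dict exists $visited $u]} {
	                dict set visited $u 1
	                lappend queue $u
	            }
	        }
	    }
	    return $queue                  ;# the nodes in BFS order
	}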
> 1741:
> 1742:         #for each arc that begins in v
> 1743:         foreach arc [$G arcs -out $v] {
> 1744:
> 1745:             set u [$G arc target $arc]
> 1746:             set newlabel [ expr { [dict get $distances $v] + [$G arc getweight $arc] } ]
> 1747:
> 1748:             if { $newlabel < [dict get $distances $u] } {
> 1749:
> 1750:                 dict set distances $u $newlabel
> 1751:                 dict set pred $u $v
> 1752:
> 1753:                 #case when current node wasn't placed in a queue yet -
> 1754:                 #we set u at the end of the queue
> 1755:                 if { [$G node set $u marked] == 0 } {
> 1756:                     lappend queue $u
> 1757:                     $G node set $u marked 1
> 1758:                 } else {
> 1759:
> 1760:                     #case when current node u was in queue before but it is not in it now -
> 1761:                     #we set u at the beginning of the queue
> 1762:                     if { [lsearch $queue $u] < 0 } {
> 1763:                         set queue [linsert $queue 0 $u]
> 1764:                     }

Ok, this makes the index method (1) more complex, because now the 'linsert' happens at $at, not 0. For 'struct::queue' the method to insert at the front is 'unget'. Unfortunately there is no 'search' method however, so using struct::queue is not possible, contrary to what I thought before/above.

I have the feeling I should step the algorithm with an example to see how this optimization works. (The other branch is clear: Node not visited before, mark as visited and put on queue for processing.)

Your 'lsearch' <=> '$u ni $queue'

> 1765:                 }
> 1766:             }
> 1767:         }
> 1768:     }
> 1769:
> 1770:     #if the outputformat is paths, we travel back to find shorests paths
> 1771:     #to return sets of nodes for each node, which are their paths between
> 1772:     #s and particular node
> 1773:     dict set paths nopaths 1
> 1774:     if { $outputMode eq "paths" } {
> 1775:         foreach node [$G nodes] {
> 1776:
> 1777:             set path {}
> 1778:             set lastNode $node
> 1779:
> 1780:             while { $lastNode != -1 } {
> 1781:                 set currentNode [dict get $pred $lastNode]
> 1782:                 if { $currentNode != -1 } {
> 1783:                     lappend path $currentNode
> 1784:                 }
> 1785:                 set lastNode $currentNode
> 1786:             }
> 1787:
> 1788:             set path [lreverse $path]

Ah, this is different from dijkstra, this is why the caller did not have to lreverse on their own. Any specific reason why this is better ? Or is dijkstra's form for the result better ? Or should we modify 'dijkstra' to return the same as this procedure ?

> 1789:
> 1790:             if { [llength $path] != 0 } {
> 1791:                 dict set paths $node $path
> 1792:                 dict unset paths nopaths
> 1793:             }
> 1794:         }
> 1795:
> 1796:         if { ![dict exists $paths nopaths] } {
> 1797:             return $paths
> 1798:         } else {
> 1799:             return 0

Better to return '{}' <=> an empty dictionary. That makes the return type always a dictionary, and the checks in lines 1365 and 1651 become simpler (the ![dict exists] becomes sufficient).

> 1810: proc ::struct::graph::op::BFS {G s outputFormat} {
> 1811:
> 1812:     set queue [list $s]
> 1813:
> 1826:     if { $outputMode eq "graph" } {
> 1827:         #graph initializing
> 1828:         set BFSGraph [struct::graph]
> 1829:         foreach v [$G nodes] {
> 1830:             $BFSGraph node insert $v
> 1831:         }
> 1832:     } else {
> 1833:         #tree initializing
> 1834:         set BFSTree [struct::tree]
> 1835:         $BFSTree set root name $s
> 1836:         $BFSTree rename root $s
> 1837:     }
> 1838:
> 1839:     #initilization of marked nodes
> 1840:     foreach v [$G nodes] {
> 1841:         $G node set $v marked 0

Note discussion of marking for 'ShortestsPathsByBFS'.
> 1842:     }
> 1843:
> 1844:     #start node is marked from the beginning
> 1845:     $G node set $s marked 1
> 1846:
> 1847:     #the main loop
> 1848:     while { [llength $queue] != 0 } {
> 1849:         #removing top element from the queue
> 1850:
> 1851:         set v [lindex $queue 0]
> 1852:         lremove queue $v

Note discussion of the queue handling above for 'ShortestsPathsByBFS'.

> 1853:
> 1854:         foreach x [$G nodes -adj $v] {
> 1855:             if { ![$G node set $x marked] } {
> 1856:                 $G node set $x marked 1
> 1857:                 lappend queue $x
> 1858:
> 1859:                 if { $outputMode eq "graph" } {
> 1860:                     $BFSGraph arc insert $v $x [list $v $x]
> 1861:                 } else {
> 1862:                     $BFSTree insert $v end $x
> 1863:                     lappend result [list $v $x]

The variable 'result' seems to be superfluous. Not used by the loop, not returned either. Leftover bits ?

> 1864:                 }
> 1865:             }
> 1866:         }
> 1867:     }
> 1868:
> 1869:
> 1870:     if { $outputMode eq "graph" } {
> 1871:         return $BFSGraph
> 1872:     } else {
> 1873:         return $BFSTree
> 1874:     }
> 1875: }

> 1877: #Minimum Diameter Spanning Tree - MDST
> 1878: #
> 1879: #
> 1880: #The goal is to find for input graph G, the spanning tree that
> 1881: #has the minimum diameter worth.
> 1882: #
> 1883: #General idea of algorithm is to run BFS over all vertices in graph
> 1884: #G. If the diameter "d" of the tree is odd, then we are sure that tree
> 1885: #given by BFS is minimum (considering diameter value). When, diameter "d"
> 1886: #is even, then optimal tree can have minimum diameter equal to "d" or
> 1887: #"d-1".
> 1888: #
> 1889: #In that case, what algorithm does is rebuilding the tree given by BFS, by
> 1890: #adding a vertice between root node and root's child node (nodes), such that
> 1891: #subtree created with child node as root node is the greatest one (has the
> 1892: #greatests height). In the next step for such rebuilded tree, we run again BFS
> 1893: #with new node as root node. If the height of the tree didn't changed, we have found
> 1894: #a better solution.
> 1895:
> 1896: proc ::struct::graph::op::MinimumDiameterSpanningTree {G} {
> 1897:
> 1898:     set min_diameter Inf
> 1899:     set best_Tree [struct::graph]
> 1900:
> 1901:     foreach v [$G nodes] {
> 1902:
> 1903:         #BFS Tree
> 1904:         set T [BFS $G $v tree]
> 1905:         #BFS Graph
> 1906:         set TGraph [BFS $G $v graph]
> 1907:
> 1908:         #Setting all arcs to 1 for diameter procedure
> 1909:         $TGraph arc setunweighted 1
> 1910:
> 1911:         #setting values for current Tree
> 1912:         set diam [diameter $TGraph]
> 1913:         set subtreeHeight [ expr { $diam / 2 - 1} ]
> 1914:
> 1915:         ##############################################
> 1916:         #case when diameter found for tree found by BFS is even:
> 1917:         #it's possible to decrease the diameter by one.
> 1918:         if { [expr { $diam % 2 } ] == 0 } {

Nested expression superfluous:

	if { ($diam % 2) == 0 } {

> 1919:
> 1920:             #for each child u that current root node v has, we search

Each child u of v ... Are you talking only about the direct children ? Or also about the children of the children of v, etc. ?

> 1921:             #for the greatest subtree(subtrees) with the root in child u.
> 1922:             #
> 1923:             foreach u [$TGraph nodes -adj $v] {
> 1924:                 set u_depth [$T depth $u]

As u is a child of v the depth should be '1', always.

> 1925:                 set d_depth 0
> 1926:
> 1927:                 set descendants [$T descendants $u]
> 1928:
> 1929:                 foreach d $descendants {
> 1930:                     if { $d_depth < [$T depth $d] } {
> 1931:                         set d_depth [$T depth $d]
> 1932:                     }
> 1933:                 }

If I read this correctly, you are computing for each node d under u the depth, i.e. distance to the root, and take the maximum.
IIRC this is the height of the tree under u, relative to the root, is that correct ? If yes, then this should be

	1 + [$T height $u]

I believe (struct::tree has a 'height' method).

> 1934:
> 1935:                 #depth of the current subtree
> 1936:                 set depth [ expr { $d_depth - $u_depth } ]

Now, with u_depth == 1, always, and d_depth == 1 + [$T height $u], we get

	depth == 1 + [$T height $u] - 1 == [$T height $u]

If my assumptions are correct then lines 1924 - 1936 can be collapsed into one command:

	set depth [$T height $u]

> 1937:
> 1938:                 #proceed if found subtree is the greatest one
> 1939:                 if { $depth >= $subtreeHeight } {
> 1940:
> 1941:                     #temporary Graph for holding potential better values
> 1942:                     set tempGraph [struct::graph]
> 1943:
> 1944:                     foreach node [$TGraph nodes] {
> 1945:                         $tempGraph node insert $node
> 1946:                     }
> 1947:
> 1948:                     #change the variable names so they don't get mixed up
> 1949:                     foreach arc [$TGraph arcs] {
> 1950:                         set _u [$TGraph arc source $arc]
> 1951:                         set _v [$TGraph arc target $arc]
> 1952:                         $tempGraph arc insert $_u $_v [list $_u $_v]
> 1953:                     }

I believe I have seen this graph initialization (1941-1953, copy of nodes, copy of structure with the u/v arc naming convention) often enough that we can put this into a separate procedure we can call at all the relevant places. It seems that we need it for basically every operation.

> 1954:
> 1955:                     if { [$tempGraph arc exists [list $u $v]] } {
> 1956:                         $tempGraph arc delete [list $u $v]
> 1957:                     } else {
> 1958:                         $tempGraph arc delete [list $v $u]
> 1959:                     }
> 1960:
> 1961:                     #for nodes u and v, we add a node between them
> 1962:                     #to again start BFS with root in new node to check
> 1963:                     #if it's possible to decrease the diameter in solution
> 1964:                     set node [$tempGraph node insert]
> 1965:                     $tempGraph arc insert $node $v [list $node $v]
> 1966:                     $tempGraph arc insert $node $u [list $node $u]
> 1967:
> 1968:                     set tempGraph [BFS $tempGraph $node graph]
> 1969:
> 1970:                     $tempGraph node delete $node
> 1971:                     $tempGraph arc insert $u $v [list $u $v]
> 1972:                     $tempGraph arc setunweighted 1
> 1973:
> 1974:                     set tempDiam [diameter $tempGraph ]
> 1975:
> 1976:                     #if better tree is found (that any that were already found)
> 1977:                     #replace it
> 1978:                     if { $min_diameter > $tempDiam } {
> 1979:                         set $min_diameter [diameter $tempGraph ]
> 1980:                         set best_Tree $tempGraph
> 1981:                     }

Hm. Here are some leaks ... If tempGraph does not become best_Tree it should be released, correct ? Further, if tempGraph does become best_Tree then the old best_Tree (if defined) should be released too.

> 1982:                 }
> 1983:
> 1984:             }
> 1985:         }
> 1986:         ################################################################
> 1987:
> 1988:         set currentTreeDiameter [ diameter $TGraph ]

Better:

	set currentTreeDiameter $diam

See line 1912, before the humongous if-block handling the even diameter, and the fact that this big if-block is not changing TGraph at all.

> 1989:
> 1990:         if { $min_diameter > $currentTreeDiameter } {
> 1991:             set min_diameter $currentTreeDiameter
> 1992:             set best_Tree $TGraph

See leak discussion above (1981).

> 1993:         }
> 1994:

The tree $T (line 1904) seems to leak as well.

> 1995:     }
> 1996:
> 1997:     return $best_Tree
> 1998: }
> 1999:
> 2000: #
> 2001: proc ::struct::graph::op::MinimumDegreeSpanningTree {G} {
> 2022:     #main loop
> 2023:     foreach e [$G arcs] {
> 2024:
> 2025:         set u [$G arc source $e]
> 2026:         set v [$G arc target $e]
> 2027:
> 2028:         $MST arc setunweighted 1

This command is independent of 'e', should be run before the loop. Around line 2021, after MST has gotten its arcs.
> 2029:
> 2030:         #setting the path between nodes u and v in Spanning Tree MST
> 2031:         set path [dict get [dijkstra $MST $u] $v]
> 2032:         lappend path $v
> 2033:
> 2034:         #if nodes u and v are neighbours, proceed to next iteration

This condition is quicker checked with 'arc exists' for the two possible directions, see below. Doing so means that we can move the expensive dijkstra into the if-block, hopefully using it less.

> 2035:         if { [llength $path] > 2 } {

	if { !(arc exists u -> v) && !(arc exists v -> u) }

> 2036:
> 2037:             #search for the node in the path, such that its degree is greater than degree of any of nodes
> 2038:             #u or v increased by one
> 2039:             foreach node $path {
> 2040:                 if { [$MST node degree $node] > [ expr { [Max [$MST node degree $u] [$MST node degree $v]] + 1 } ] } {

Nested 'expr' superfluous, see also line 1918.

Andreas.
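A hedged sketch of the restructuring suggested above for the MinimumDegreeSpanningTree inner loop: test adjacency first, and only then pay for the dijkstra call, with no nested expr. This is a fragment meant to stand in for the quoted loop body, not the reviewed code itself; it assumes the arcs of $MST are named [list source target], following the naming convention used elsewhere in the file, and reuses the file's Max helper and the quoted variables ($MST, $u, $v).

	# Sketch only: cheap adjacency test via 'arc exists' before the
	# expensive dijkstra call.
	if {![$MST arc exists [list $u $v]] && ![$MST arc exists [list $v $u]]} {
	    # u and v are not neighbours in MST: compute the tree path between them
	    set path [dict get [dijkstra $MST $u] $v]
	    lappend path $v

	    # look for a node on the path whose degree exceeds
	    # max(degree(u), degree(v)) + 1 -- no nested [expr] needed
	    foreach node $path {
	        if {[$MST node degree $node] > [Max [$MST node degree $u] [$MST node degree $v]] + 1} {
	            # ... perform the edge swap as in the original procedure ...
	        }
	    }
	}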