Re: [Tcllib-devel] FordFulkerson

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Micha? Antoniewski wrote:
> Hello.
>  
> After some fights with it, Ford-Fulkerson is finally working.... I think 
> the rest of flow problems will now go quicker.
>  
> Code is at SVN with new revision. It surely still needs some 
> optimizations and changes, but I'm going to add now more test cases, so 
> those optimizations should go smoothly now. And also I think it will be 
> easier to get familiar with whole problem, when code is available.

Updating now to rev 33 ...

> A little description:
>  
> We are given graph (weighted, directed) G, and nodes s and t, which are 
> source and sink for that graph.

Give the suggestive naming of these two nodes, do they have special 
requirements / constraints ?

Like, for example:
	"source must not have incoming arcs".
	"sink   must not have outgoing arcs".
?

> The goal is to find the maximum flow for a flow network -> the 
> max possible sum of weights for edges coming out from the source node s.
>  
> I arranged it like this:
> - input graph G : weights at edges are max possible throughputs for each 
> single edge
> - of course nodes s and t at input

> - there is dictionary f, which holds all currently used throughputs for 
> each edge ( e.g, for edge 'uv' which has maximum throughput 20, f will
> hold information, that 12 of this 20 is already used ).

Ok ... I see from the code that this is internal to the algorithm. From the 
description here I thought that it may have been an input.

I note that for each edge u->v the dictionary holds data for uv and vu both ?
Does this mean that the algorithm uses undirected edges, not directed arcs ?

Oh, the wikipedia entry tells me that the residual graph may allow a backward 
flow even if not allowed in the input graph. See also (**).

> - subprocedure createResidualGraph: it takes at input graph G ( we need 
> maximum throughputs for all particular edges) and current state of
> used throughputs f ( dictionary I meant line above)
> This subprocedure creates a residual graph used by FordFulkerson.
>  
> In each iteration of main loop, FordFulkerson checks, if there is any 
> augmenting path ( just any path from source to sink ).
> If there is one, it selects the maximum throughput for that path - the 
> minimum value of throughput ( weight of an edge ) considering
> all edges that are contained in the path.
> When maximum throughput for a path is found, we can update the 
> dictionary f ( we increase the value of throughput for each edge in the 
> path, by the value we just found).
>  
> When f is found, we can create new residual graph (createResidualGraph 
> proc) and proceed to next iteration -> check if for that graph, any 
> augmenting path does exists.
>  
>  
> Generally, algorithm looks easy, but there are some cases, I run into 
> during implementation, that surely must be included in test cases....
>  
> Here is the link that can be useful....there is a applet where it's 
> possible to run algorithm step by step for any graph:
> http://links.math.rpi.edu/applets/appindex/graphtheory.html

That is an interesting applet. May I propose that we do something like this 
after the project, as an example ?

... There seems to be a bug in there ... I used the pre-made graph 
"Ford-Fulkerson or Dijsktra" and used 'Ford-Fulkerson-Algorithm'. When I 
stepped through it, be it manually, or in the 'auto' mode the _last_ augmenting 
path they show contains the "weight 4 arc", but against its direction! That 
left me wondering.

(**) Ok, seems that this is possible, due to backward flow in the residual 
graph, per wikipedia on Ford-Fulkerson.

Time to look at the rev 33 code ...

	You are right, the set of tests is a bit sparse.

 > 1300:	#Ford's Fulkerson algorithm - computing maximum flow in a flow network
 > 1301: 
#-------------------------------------------------------------------------------------
 > 1302:	#
 > 1303:	#The general idea of algorithm is finding augumenting paths in graph 
G, as long
 > 1304:	#as they exist, and for each path updating the edge's weights along 
that path,
 > 1305:	#with maximum possible throughput. The final (maximum) flow is found
 > 1306:	#when there is no other augumenting path from source to sink.
 > 1307:	#
 > 1308:	#Input:
 > 1309:	#graph G - weighted and directed graph. Weights at edges are considered as
 > 1310:	#maximum throughputs that can be carried by that link (edge).

Regarding the weights the wikipedia reference tells us that the
algorithm may not converge for irrational weights ... Oh, ok, is not a
problem for an implementation on a computer. Our 'float/double' values
are always rational, due to finite precision.

 > 1311:	#s - the node that is a source for graph G
 > 1312:	#t - the node that is a sink for graph G
 > 1313:	#
 > 1314:	#Output:
 > 1315:	#Procedure returns the dictionary contaning throughputs for all edges. For
 > 1316:	#each key ( the edge between nodes u and v in the for of list u v ) 
there is
 > 1317:	#a value that is a throughput for that key. Edges where throughput values
 > 1318:	#are equal to 0 are not returned ( it is like there was no link in the 
flow network
 > 1319:	#between nodes connected by such edge).
 > 1320:	#
 > 1321:	#Reference: http://en.wikipedia.org/wiki/Ford-Fulkerson_algorithm
 > 1322:	
 > 1323:	proc ::struct::graph::op::FordFulkerson {G s t} {

Missing checks: Are s and t nodes of G ?
Also, should the source s have incoming arcs ?
Further, should sink t have outgoing arcs ?

 > 1324:	
 > 1325:		#initilization
 > 1326:		foreach e [$G arcs] {
 > 1327:			set u [$G arc source $e]
 > 1328:			set v [$G arc target $e]
 > 1329:			dict set f [list $u $v] 0
 > 1330:			dict set f [list $v $u] 0

Was first confused by this, looking as if the edges are undirected,
flow-wise, but wikipedia put me straight i.e. that the residual graph
may allow backward flows.

Should maybe add a comment here about this to clarify.

 > 1331:		}
 > 1332:		
 > 1333:		#setting the residual graph for the first iteration
 > 1334:		set residualG [createResidualGraph $G $f]
 > 1335:		
 > 1336:		#deleting the arcs that are 0-weighted
 > 1337:		foreach e [$residualG arcs] {
 > 1338:			if { [$residualG arc getweight $e] == 0 } {
 > 1339:				$residualG arc delete $e
 > 1340:			}
 > 1341:		}
 > 1342:		
 > 1343:		#the main loop - works till the path between source and the sink can 
be found
 > 1344:		while {[dict exists [struct::graph::op::dijkstra $residualG $s 
-arcmode directed] $t]} {

'struct::graph::op::dijkstra' can be shortened to 'dijkstra', we are in the 
right namespace.

 > 1345:		
 > 1346:			#setting the path from source to sink
 > 1347:			set path [lreverse [dict get [struct::graph::op::dijkstra $residualG 
$s -arcmode directed] $t]]

We are doing the dijsktra in the loop condition a second time ? Yes
... Quite a performance penalty. A C compiler could optimize this
automatically I guess, Tcl doesn't.

Here is an equivalent loop head which does dijkstra only once.

	      while {1} {
		    set paths [struct::graph::op::dijkstra $residualG $s -arcmode directed]

		    if {![dict exists $paths $t]} break

		    set path [lreverse [dict get $paths $t]]

 > 1348:		
 > 1349:			#adding sink to path
 > 1350:			lappend path $t
 > 1351:			
 > 1352:			#finding the throughput of path p - the smallest value of c(f) among
 > 1353:			#edges that are contained in the path
 > 1354:			set maxThroughput Inf
 > 1355:		
 > 1356:			for {set i 0} {$i < [llength $path]-1 } { incr i } {
 > 1357:				set u [lindex $path $i]
 > 1358:				set v [lindex $path [expr { $i+1 }]]

Here we can use foreach, in one of the advanced forms. The second advanced form 
actually ...

General syntax:
      foreach var1 list1 var2 list2 { ... }

Here specifically:
	     foreach \
		     u [lrange $path 0 end-1]
		     v [lrange $path 1 end] \
		     {

I extract shifted node lists which pairs up the adjacent nodes, like the 
lsiding window you programmed explicitly.

 > 1359:				if { $maxThroughput > [$residualG arc getweight [list $u $v]] } {
 > 1360:					set maxThroughput [$residualG arc getweight [list $u $v]]
 > 1361:				}
 > 1362:			}
 > 1363:		
 > 1364:			#increase of throughput using the path p, with value equal to 
maxThroughput	
 > 1365:			for {set i 0} {$i < [llength $path]-1 } { incr i } {
 > 1366:				set u [lindex $path $i]
 > 1367:				set v [lindex $path [expr { $i+1 }]]

Same way of iterating over the nodes as above.

 > 1368:				
 > 1369:				#if maximum throughput that was found for the path p 
(maxThroughput) is bigger than current throughput
 > 1370:				#at the edge not contained in the path p (for current pair of nodes 
u and v), then we add to the edge
 > 1371:				#which is contained into path p the maxThroughput value decreased 
by the value of throughput at
 > 1372:				#the second edge (not contained in path). That second edge's 
throughtput value is set to 0.
 > 1373:				
 > 1374:				if { $maxThroughput >= [dict get $f [list $v $u]] } {
 > 1375:					dict set f [list $u $v] [ expr { [dict get $f [list $u $v]] + 
$maxThroughput - [dict get $f [list $v $u]] } ]
 > 1376:					dict set f [list $v $u] 0
 > 1377:				} else {
 > 1378:				
 > 1379:				#if maxThroughput is not greater than current throughput at the 
edge not contained in path p (here - v->u),
 > 1380:				#we add a difference between those values to edge contained in the 
path p (here u->v) and substract that
 > 1381:				#difference from edge not contained in the path p.
 > 1382:					set difference [ expr { [dict get $f [list $v $u]] - 
$maxThroughput } ]
 > 1383:					dict set f [list $u $v] [ expr { [dict get $f [list $u $v]] + 
$difference } ]
 > 1384:					dict set f [list $v $u] [ expr { [dict get $f [list $v $u]] - 
$difference } ]
 > 1385:				}
 > 1386:			}
 > 1387:		
 > 1388:			#when the current throughput for the graph is updated, we generate 
new residual graph
 > 1389:			#for new values of throughput
 > 1390:			set residualG [createResidualGraph $G $f]
 > 1391:		
 > 1392:			foreach e [$residualG arcs] {
 > 1393:				if { [$residualG arc getweight $e] == 0 } {
 > 1394:					$residualG arc delete $e
 > 1395:				}
 > 1396:			}

I see that both times a residual graph is created we immediately drop
the zero-weight arcs.  This could be moved into the createResidual
procedure. On the other hand, the createResidual likely has a specific
meaning. So maybe not.

 > 1397:				
 > 1398:		}
 > 1399:		
 > 1400:		#removing 0-weighted edges from solution
 > 1401:		foreach e [dict keys $f] {
 > 1402:			if { [dict get $f $e] == 0 } {
 > 1403:				set f [dict remove $f $e]
 > 1404:			}
 > 1405:		}
 > 1406:		
 > 1407:		return $f
 > 1408:	}
 > 1409:	
 > 1410:	#subprocedure for FordFulkerson's algorithm, which creates
 > 1411:	#for input graph G and given throughput f residual graph
 > 1412:	#for further operations to find maximum flow in flow network
 > 1413:	proc ::struct::graph::op::createResidualGraph {G f} {
 > 1414:		
 > 1415:		#initialization
 > 1416:		set residualG [struct::graph]
 > 1417:		set Gf [struct::graph]

Reading the code below Gf seems to be an exact copy of G, just with
the arcs renamed to match the expectations for residualG, i.e. quick
adressing of the weights by node-pairs (u,v), is that correct ?

A less weighty structure would be an array or dictionary which holds
just this information.

   array set w {}
   set w([list $u $v]) [$G arc getweight $e]

Later 'info exists w([list $u $v])' for 'Gf arc exists'
and $w([list $u $v]) for 'Gf arc getweight'

Or equivalent dictionary.

Furthermore, this data can be computed once, in the caller
(i.e. FordFulkerson) as it doesn't seem to change.

 > 1418:		
 > 1419:		foreach v [$G nodes] {
 > 1420:			$residualG node insert $v
 > 1421:			$Gf node insert $v
 > 1422:		}
 > 1423:		
 > 1424:		foreach e [$G arcs] {
 > 1425:			set u [$G arc source $e]
 > 1426:			set v [$G arc target $e]
 > 1427:			$Gf arc insert $u $v [list $u $v]
 > 1428:			$Gf arc setweight [list $u $v] [$G arc getweight $e]
 > 1429:		}
 > 1430:		
 > 1431:		
 > 1432:		foreach e [$Gf arcs] {
 > 1433:			
 > 1434:			set u [$Gf arc source $e]
 > 1435:			set v [$Gf arc target $e]
 > 1436:			

		set uv [list $u $v]
		set vu [list $v $u]
		set we [$Gf arc getweight $e]

Now the myriad of [list $u $v] / [list $v $u] and other repetitive
commands can be replaced with nice short variable names. This is an
optimization we have to do on our own. The bytecode ocmpiler of Tcl is
not intelligent enough to pull out and coalesce constant expressions.

 > 1437:			if { ![$residualG arc exists [list $u $v]] } {
 > 1438:				$residualG arc insert $u $v [list $u $v]
 > 1439:			}
 > 1440:			
 > 1441:			if { ![$residualG arc exists [list $v $u]] } {
 > 1442:				$residualG arc insert $v $u [list $v $u]
 > 1443:			}
 > 1444:					
 > 1445:			#new value of c_f(u,v) for residual Graph is a max flow value for 
this edge
 > 1446:			#minus current flow on that edge
 > 1447:			if { ![$residualG arc hasweight [list $u $v]] } {
 > 1448:				if { [$Gf arc exists [list $v $u]] } {
 > 1449:					$residualG arc setweight [list $u $v] [ expr { [$Gf arc getweight 
$e] - [dict get $f [list $u $v]] + [dict get $f [list $v $u]] } ]
 > 1450:				} else {
 > 1451:					$residualG arc setweight [list $u $v] [ expr { [$Gf arc getweight 
$e] - [dict get $f [list $u $v]] } ]
 > 1452:				}
 > 1453:			}
 > 1454:			
 > 1455:			
 > 1456:			if { [$Gf arc exists [list $v $u]] } {
 > 1457:				#when double arcs in graph G (u->v , v->u)
 > 1458:				#so, x/y i w/z    y-x+w
 > 1459:				if { ![$residualG arc hasweight [list $v $u]] } {
 > 1460:					$residualG arc setweight [list $v $u] [ expr { [$Gf arc getweight 
[list $v $u]] - [dict get $f [list $v $u]] + [dict get $f [list $u $v]]} ]
 > 1461:				}
 > 1462:			} else {
 > 1463:					$residualG arc setweight [list $v $u] [ expr { [dict get $f [list 
$u $v]]} ]
 > 1464:			}
 > 1465:			
 > 1466:		}	
 > 1467:		
 > 1468:		$Gf destroy
 > 1469:		return $residualG
 > 1470:	}

Andreas.