Re: [Tcllib-devel] ping gsoc graph work ...

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Andreas Kupries wrote:
> And for me it is now time to look at the max-cut code you committed yesterday.
> After a bit of regular work.

 > 572:	#Maximum Cut - 2 approximation algorithm
 > 573:	#----------------------------------------------------------------------
 > 574:	#
 > 575:	#
 > 576:	#
 > 577:	#
 > 578:	#
 > 579:	#Reference:
 > 580:	#
 > 581:	
 > 582:	proc ::struct::graph::op::MaxCut {G U V} {
 > 583:	
 > 584:		upvar $U _U
 > 585:		upvar $V _V
 > 586:		set nodeNumber [llength [$G nodes]]
 > 587:		set _U {}
 > 588:		set _V {}
 > 589:		set counter 0
 > 590:		
 > 591:		foreach v [lsort -dict [$G nodes]] {
 > 592:			if { [expr {$counter % 2}] eq 0 } {

	if {($counter % 2) == 0} {

 > 593:				lappend _U $v
 > 594:			} else {
 > 595:				lappend _V $v
 > 596:			}
 > 597:			incr counter
 > 598:		}

Actually, the whole block (lines 591-598) can be made simpler, by using an 
advanced form of the foreach command. We can iterate using multiple variables. 
Of the two possible forms we will want

	foreach {u v} [lsort -dict [$G nodes]] {
		lappend _U $u
		if {$v eq ""} continue
		lappend _V $v
	}

For the list {a b c d e} the variables u and v are assigned

	u=a, v=b
	u=c, v=d
	u=e, v=""

This should also explain the check done before extending the list _V. For a 
list of odd length the last round through the loop will leave the variable v empty.

 > 599:		
 > 600:		set val 1
 > 601:		set ALG 0
 > 602:		while {$val>0} {
 > 603:			set val [cut $G _U _V [countEdges $G $_U $_V]]
 > 604:			if { $val > $ALG } {
 > 605:				set ALG $val
 > 606:			}
 > 607:		}
 > 608:		return $ALG
 > 609:	}
 > 610:	
 > 611:	#procedure replaces nodes between sets and checks if that change is 
profitable
 > 612:	proc ::struct::graph::op::cut {G U V param} {
 > 613:		
 > 614:		upvar $U var1
 > 615:		upvar $V var2

I believe that it is more idiomatic to write

	proc ::struct::graph::op::cut {G Uvar Vvar param} {
		upvar $Uvar U
		upvar $Vvar V

I.e. to make explicit that the arguments are variable names, in their names, 
and to name the variables they are linked to in the scope after them. 'var1' 
and 'var2' are very non-explanatory names.

 > 616:		set _V {}
 > 617:		set _U {}
 > 618:		set value 0
 > 619:		
 > 620:		set maxValue $param

 > 621:		set _U [reset $var1]
 > 622:		set _V [reset $var2]

See my notes blow on 'reset', leaving mean to wonder why not just

		set _U $var1
		set _V $var2

because that is what is currently happening with the current definition of 'reset'.

 > 623:			
 > 624:		foreach v [$G nodes] {
 > 625:			
 > 626:			if { [lsearch $_U $v] <0 } {

As we are using Tcl 8.5 we can use the 'in' and 'ni' operators. The latter 
means 'not in'. They are applicable because we do not care about the exact 
position, only about the contained/!contained information.

	if { $v ni $_U } {	; # <=> v not in U ?

 > 627:				lappend _U $v
 > 628:				lremove _V $v
 > 629:				set value [countEdges $G $_U $_V]
 > 630:			} else {
 > 631:				lappend _V $v
 > 632:				lremove _U $v
 > 633:				set value [countEdges $G $_U $_V]
 > 634:			}
 > 635:			
 > 636:			if { $value > $maxValue } {
 > 637:				set var1 [reset $_U]
 > 638:				set var2 [reset $_V]
 > 639:				set maxValue $value
 > 640:			} else {
 > 641:				set _V [reset $var2]
 > 642:				set _U [reset $var1]
 > 643:			}
 > 644:		}

In the above, is it necessary to have a single loop ?

If not, we could maybe two loops, one over the nodes in V, the other over the 
nodes in U, obviating the need to perform the 'lsearch check'.

The one trouble I see with my idea is that the changes (moves of nodes to the 
other set) made by the first loop would affect the second one. Or, to avoid 
this, more book-keeping is needed to know which nodes moved and can be ignored 
by the second loop. So at second glance maybe better not, it doesn't feel 
anymore as if the code would be simpler. Same complexity I guess. Ah well.

Even so, some other thing to consider in the longer-term future would be an 
investigation into optimization of the countEdges calls in the loop. Because, 
as only one node N has moved, only its edges can affect the count (all outgoing 
edges of N which were to the same set are now to the other set (+), and all 
edges to the other set are now to the same set (-)). This would avoid a full 
recount. And also, actually allow us to compute the new value before moving the 
node at all.

 > 645:	
 > 646:		set value $maxValue
 > 647:		
 > 648:		if { $value > $param } {
 > 649:			return $value
 > 650:		} else {
 > 651:			return 0
 > 652:		}
 > 653:	}
 > 654:	#Removing element from the list - auxiliary procedure
 > 655:	proc ::struct::graph::op::lremove {listVariable value} {
 > 656:	    upvar 1 $listVariable var
 > 657:	    set idx [lsearch -exact $var $value]
 > 658:	    set var [lreplace $var $idx $idx]
 > 659:	}

Alternative to lremove is to add

	package require struct::list

and then use
	struct::list delete

Doesn't need to be done yet. If we have only one place where we need this it is 
more effective to stay with the local 'lremove' instead of pulling in another 
big package we use only a very small part of.

 > 660:	#Auxiliary procedure to set again proper values of two cut sets
 > 661:	proc ::struct::graph::op::reset {U} {
 > 662:		set value {}
 > 663:		foreach u $U {
 > 664:			lappend value $u
 > 665:		}
 > 666:		return $value
 > 667:	}

This whole procedure seems to be equivalent to

	proc ::struct::graph::op::reset {U} {
		return $U
	}

I see an iteration over the list U which is reconstructed in the variable 
'value' and then returned. So value == U, thus we can return it the argument 
directly. And at this point I have to ask if the procedure is needed at all.

 > 668:	#procedure counts edges that link two sets of nodes
 > 669:	proc ::struct::graph::op::countEdges {G U V} {
 > 670:		
 > 671:		set value 0
 > 672:		
 > 673:		foreach e [$G arcs] {
 > 674:			set v [$G arc source $e]
 > 675:			set u [$G arc target $e]
 > 676:			
 > 677:			if { [lsearch $U $v] >=0 && [lsearch $V $u] >=0 } {

See comments regarding line 626. Operations 'in' and ni'.

	if {($v in $U) && ($u in $V)} { ; # e is arc U -> V

 > 678:				incr value
 > 679:			}
 > 680:			if { [lsearch $U $u] >=0 && [lsearch $V $v] >=0 } {

	if {($u in $U) && ($v in $V)} { ; # e is arc V -> U

 > 681:				incr value
 > 682:			}
 > 683:		}
 > 684:		
 > 685:		return $value
 > 686:	}

Hm. Lets try an alternate ...

	foreach u $U {
		foreach e [$G arcs -out $u] {
			set v [$G arc target $e]
			if {$v ni $V} continue
			incr value
		}
	}
	foreach v $V {
		foreach e [$G arcs -out $v] {
			set u [$G arc target $e]
			if {$u ni $U} continue
			incr value
		}
	}

The main point to the alternate is that the iteration over U and V eliminates 
one lsearch/in/ni query, because we know that the node is in the set, and which ...

Time complexity of the original: O(#arcs * #nodes). The #nodes because of the 
lsearch, which is linear.
Time complexity of the alternate: O(#nodes * ?). The ? could be #arcs, as we go 
over all arcs, just scattered across the nodes. The time constants might be 
lower (because of the reduced number of lsearch/in ops).

Andreas