From: Andreas K. <and...@ac...> - 2009-06-18 18:46:10
|
Michał Antoniewski wrote: Administrivia: I reworked the timeline at the bottom of http://wiki.tcl.tk/23203 into a table, and added a status column where we can track what is done, what is being worked on, etc. I have filled in the data for the previous weeks, as far as I knew it. Please check my edits for inaccuracies. We also have to put the graph coloring algorithms we wish to implement instead of the matrix multiplication method into the timeline. > Update 1 (Revision 12): New version of Adjacency List. > > What's new: > - changed the implementation considering your regards - now undirected > case is simpler > - lsearch isn't used now --> lsort > - and other lesser changes. > > You were of course right about implementation, now it looks much > simpler. I would not say 'of course'. I can be wrong as well. I will have a look either this afternoon, or tomorrow. > However, now I've got one concern about weighted version. > Suppose we've got a graph G with edges from node A to B and B to A, but > with different wages. > For that case, there is a question: how the AdjacencyList should look > when using -weights option: > 1) should it place for each node two occurances of an edge with > different weight > 2) trying to pick one edge looks like bad solution.... > 3) graphs such as G are considered as wrong. To me (1) looks like the right solution. What does adjacency list look look like if the we have such edges, with identical weight ? Similarly, what does the result look like for parallel edges A -> B of some and/or different weight ? > My call would be first solution and current version works like this: > - if we got edges edgeAB and edgeBA between nodes A and B, with weights > successively X and Y, for node A there will exist {{B X} {B Y}} and for > node B {{A X} {A Y}} in their solutions. When X = Y, it's considered as > a single edge. Ok. > Pros: it prevents from cutting the set of possible graphs for procedure. > Cons: - a bit unusual look of AdjacencyList (due to but option -weights > is supplementary, so maybe it's not bad. > - inaccuracy: if for different weights we've got both arcs, shouldn't > there be also both arcs in solution for cases with same weights? It might be more consistent, yes. I.e. when we have -weights, represent all edges without special cases. > So the main question is, do we make a restriction for procedure > AdjacencyList to work only with simple graphs ( graphs without loops and > doubled edges) - which apparently in my opinion isn't so bad, as most > graphs considered are simple graphs. > Or we consider those double edges, which isn't a problem as well. I would prefer to handle loops and (anti-)parallel edges. I am erring on the side of having more information in the result rather than less. An algorithm using adjacency information and needing/wanting less can then reduce as it sees fit. Whereas if it needs information and doesn't have it it is stuck. Andreas. |
From: Andreas K. <and...@ac...> - 2009-06-18 18:46:15
|
Michał Antoniewski wrote: > Update 2: Floyd-Warshall Algorithm. > > Implementation + tests. > > Possible extension can be enabling option to use Floyd-Warshall's as > negative-cycle finder. Now it finds those cycles occurances and throws > an error - the same as previous algorithms. Will look into this either in the afternoon, or tomorrow. Updated the wiki timeline info. > Next updates are TSP algorithms. > > > Best Regards, > Michał Antoniewski |
From: Andreas K. <and...@ac...> - 2009-06-19 18:34:34
|
Andreas Kupries wrote: > Michał Antoniewski wrote: >> Update 2: Floyd-Warshall Algorithm. >> >> Implementation + tests. >> >> Possible extension can be enabling option to use Floyd-Warshall's as >> negative-cycle finder. Now it finds those cycles occurances and throws >> an error - the same as previous algorithms. > > Will look into this either in the afternoon, or tomorrow. > Updated the wiki timeline info. The code and tests look good. Andreas. |
From: Andreas K. <and...@ac...> - 2009-06-19 20:55:02
|
Andreas Kupries wrote: > Now, I noticed that you changed the tests for adjlist a bit. It seems that the > results did not change in their essence, but only in the order nodes were > sorted in the various (sub)lists, right ? > > In other words the changed results are in the same equivalence class as the > previous ones. There is no significant difference between > > {n1 {n2 n3} n2 n1 n3 n1} > and {n2 n1 n1 {n3 n2} n3 n1}, > > just order, which is irrelevant for our dictionaries. > > This is another issue often seen with test suites. That we check results for > exact identity, causing the rejection of perfectly correct results because they > are not exactly identical to whatever the previous algorithm created as its result. > > > This was particular seen in test suites, here in Tcllib, when the Tcl core went > from 8.4 to 8.5. The hash table implementation of the core changed in that > transition, causing the results of 'array get' and 'array names' to change, > i.e. elements were returned in a different order. This tripped all the > testsuites which were not prepared to handle the equivalence class of correct > results and tested exactly instead. > > The solution we implemented was to 'lsort' and 'dictsort' (*) the results of > the tested commands, projecting all results in an equivalence class to a > canonical form for that class. This canonical form can then be tested for exactly. > > For the two adjlist algorithms you implemented it was simply details in the > implementation which changed the order. > > I do not believe that we have to update the testsuite right now to take this > possibility into account, however please be aware this in the future. And as a very concrete example I ran the testsuite of modules/struct, with the critcl implementation of struct::graph active (*), here is an excerpt: ==== graphop-gcritcl-scritcl-stcritcl-qcritcl-BellmanFord-1.9 BellmanFord, some weights set to 0 FAILED ==== Contents of test case: SETUP_PARTIALLYZEROWEIGHTED set result [struct::graph::op::BellmanFord mygraph node1] mygraph destroy set result ---- Result was: node4 1 node3 0 node2 0 node1 0 ---- Result should have been (exact matching): node2 0 node3 0 node4 1 node1 0 ==== graphop-gcritcl-scritcl-stcritcl-qcritcl-BellmanFord-1.9 FAILED The result is correct, it simply presents the nodes in a different order. For this simple data structure (dictionary) our projection to canonical can be done by the helper 'dictsort', i.e. by changing set result [struct::graph::op::BellmanFord mygraph node1] to set result [dictsort [struct::graph::op::BellmanFord mygraph node1]] and changing the expected result to {node1 0 node2 0 node3 0 node4 1} i.e. sorting it. ========================================================== NOTE: This is _not_ a request to fix the testsuite now. I rather prefer to catch up on our timeline a bit This is polish, and can be done at/near the end. ========================================================== ~~~ (*) It is an implementation of struct::graph in C, and part of the differences to the Tcl implementation is a the changed order of nodes/arcs returned by [G nodes], [G arcs], etc. Andreas |
From: KATO K. <k.k...@gm...> - 2009-06-20 02:20:54
|
If you have a hard time to describe/manipulate graph data, do you use the huddle library? I use this for the yaml library :) Ciao! |
From: Andreas K. <and...@ac...> - 2009-06-22 16:45:34
|
KATO Kanryu wrote: > If you have a hard time to describe/manipulate graph data, > do you use the huddle library? > > I use this for the yaml library :) I am not sure what you are thinking of here. If this is about the serialization of a graph structure for storeage and/or transfer, then huddle certainly a possibility, among others, like various XML notations. However, we are here talking about the manipulation of graphs in memory, to compute things like distances between nodes, and "standard" representations, like Adjacency matrices and lists. I do not see how huddle could of use for that. It is, as far as I understand it, a helper for serialization. Andreas |
From: KATO K. <k.k...@gm...> - 2009-06-22 20:36:56
|
Hi, > However, we are here talking about the manipulation of graphs in memory, > to compute things like distances between nodes, > and "standard" representations, like Adjacency matrices and lists. huddle can contain the data has plural different structures, and can support for additional data structures e.g. graphs and tree. (However there is needed to improbe for self-recursive data structure) If we need to talk about this more deeply, I have to read codes for the graph libarary... |
From: Andreas K. <and...@ac...> - 2009-06-22 18:21:40
|
> Ok. > > Next update will be on Sunday - both TSP algorithms, implementation + > tests. Ok. Although I am a bit behind on my mail here. Looking at the time this mail missed me by a few minutes on Friday. And I am not reading office mail from the home system (*). Can I entice you to send your mails not only to me, but Tcllib Developers <tcl...@li...> as well ? That will reach me at home too, as I subscribed both office and personal email adresses to the list. > Considering approximation algorithms there are some inconvenient cases > about testing. For example, when I was trying to perform tests showing > case with Max Approximation factor, it strongly depends of what > algorithms ( which approximation algorithms use in their implementation > ), are going to return, e.g. in 2-approximation algorithm for TSP - when > we create input graph that is tight example, it will give the worst > solution, if the spanning tree found by for example Prim Algorithm will > have a proper look. Hence, for graphs I tested it found sligthly better > solutions. Can you explain this in shorter sentences ? I tried to read and understand it several times and keep getting confused. Likely the problem is on my side, it is Monday morning, I did not sleep that well, my brain is buggered. > Of course those Tight Example graphs, should and will be taken into > account in documentation. > > About finding spanning tree - should there be option where user can > specify, if he wants use kruskal or prim to find minimum spanning tree? So the approximation algorithm is finding a spanning tree as part of its work ? > It would make tests more resistant to changes, for example changing > spanning tree algorithm in implementation won't make all tests crush. My understanding here is that by making the spanning tree algorithm STA used by the approx algorithm AA a configurable option of AA you can use both STAs we have in the tests of AA and thus make sure that the results are not influenced by the choice of STA ? > 2-approximation algorithm is done. Christofides algorithm uses in it's > implementation Max Matching algorithm so I'm trying to simulate it for > now, as Max Matching implementation is placed in last week of the > timeline. A bit of chiding here. A reason that the mentors ask the students for the timeline of their project is to have them think about and become aware of such dependencies. > I was thinking about implementing, only for now, greedy Max > Matching but it doesn't return optimal solution, so it will decrease > good approximation of Christofides ( 3/2 ). Another option would be to move Christofides back in the timeline after the Max-Matching and then move some other algorithm to the free spot. I see from the most recent mail (I got today) that you have chosen to do Christofides, and faking the results you need from max-matching. Certainly also a way forward, with the understanding that Christofides needs re-checking when we have the actual max-matching implementation in Week 9+ Greedy max-matching ... Is my understanding correct that this would find a matching, but not necessarily the maximal one ? Or not the best maximal per the weights ? ~~~ (Ad *) I could now with our IMAP system, if I were to install Thunderbird at home. Except that at home I do wish to be away from work. Note here that the GSoC mentoring does not count as work, that is something I can do at home as well, and during the weekend. Andreas. |
From: Andreas K. <and...@ac...> - 2009-06-22 18:22:03
|
Michał Antoniewski wrote: > Hello. > > I'm writing to clear some matters about Christofides Algorithm: Ok. Note, please update the algorithm status info in the timeline at http://wiki.tcl.tk/23203 > I implemented it with a empty space for MaxMatching use. It still needs > some refactoring but I want to clear some things first. Ok, I will have a look. > Generally, TSP is problem concerning complete graphs, but there is a way > of handling other graphs by adding edges with very big weights Can these big weights be 'Inf'inity ? Or do they have to be tailired to the existing weights in some way ? > and then > working on complete graph - those high costing added edges won't be > taken under consideration in finding optimal tour. Would you like to > handle those graphs at the beginning of procedure or just assume we get > complete graph at input? > Now it is working, assuming that graph is complete but I can add this > feature - I think it is good idea. Having the code handling non-complete graphs automatically is likely a major convenience for users of the command. > Christofides implementation: > It uses some graphs: firstly it creates minimum spanning tree graph > (TGraph), next is graph for MAx MAtching created by nodes with odd > degree from TGraph (oddTGraph). Then oddTGraph is changed to have only > edges we need to add to TGraph, and in next step we do M + T, so TGraph > gets new edges created by MaxMatching to assure its eulerian. The rest > goes like before. > > As max matching is not availible yet I'm testing it by placing correct > values returned by max-matching, and checking if all code paths work > good. I think for now good idea would be to implement some greedy > max-matching, which can be helpful to place some test code (Graphs for > tests are ready). How complicated is the max-matching ? Which is an indirect way of asking, how much effort would have to be put into implementing it ? > > Both 2-approximation algorithm and Christofides uses some similar steps > but I thought it's not worth to divide them into procedures - this way > it looks more clear and intuitive IMHO. I see. Can't comment on that yet, will do after I had my look through the new code. > > Next update - MAX-CUT. I want to catch up with timeline, but I'm still > working on those TSP too. Andreas. |
From: Andreas K. <and...@ac...> - 2009-06-22 19:55:08
Attachments:
graphops.tcl.numbered
|
Andreas Kupries wrote: > Michał Antoniewski wrote: >> Hello. Comments between the lines of code > 433: #Metric Travelling Salesman Problem (TSP) - 2 approximation algorithm > 434: #------------------------------------------------------------------------------------- > 435: #Travelling salesman problem is a very popular problem in graph theory, where > 436: #we are trying to find minimal Hamilton cycle in weighted complete graph. In other words: > 437: #given a list of cities (nodes) and their pairwise distances (edges), the task is to find > 438: #a shortest possible tour that visits each city exactly once. > 439: #TSP problem is NP-Complete, so there is no efficient algorithm to solve it. Greedy methods > 440: #are getting extremely slow, with the increase in the set of nodes. > 441: # > 442: #For this algorithm we consider a case when for given graph G, the triangle inequality is > 443: #satisfied. So for example, for any three nodes A, B and C the distance between A and C must > 444: #be at most the distance from A to B plus the distance from B to C. What's important > 445: #most of the considered cases in TSP problem will satisfy this condition. > 446: # > 447: #Input: weighted and complete graph G > 448: #Output: approximated solution of minimum Hamilton Cycle - closed path visiting all nodes, > 449: #each exactly one time. > 450: # > 451: #Reference: http://en.wikipedia.org/wiki/Travelling_salesman_problem > 452: # > 453: > 454: proc ::struct::graph::op::MetricTravellingSalesman { G } { > 455: > 456: #checking if all weights are set > 457: VerifyWeightsAreOk $G > 458: > 459: #create minimum spanning tree for graph G > 460: set T [struct::graph::op::prim $G] > 461: #graph algorithm is working on - spanning tree of graph G > 462: set TGraph [struct::graph] > 463: > 464: #fill TGgraph with nodes > 465: foreach n [$G nodes] { > 466: $TGraph node insert > 467: } > 468: > 469: #fill TGraph with arcs > 470: foreach e $T { > 471: set v [$G arc source $e] > 472: set u [$G arc target $e] > 473: $TGraph arc insert $u $v $u$v > 474: $TGraph arc insert $v $u $v$u > 475: $TGraph arc setweight $u$v [$G arc getweight $e] > 476: $TGraph arc setweight $v$u [$G arc getweight $e] > 477: } This block of lines, 461-477, looks to me as something which should be put into its own command. I see that Christofides uses a variant of this (lines 503-517, without 505), just putting in the arcs instead of duplicating them. That can be folded into a single helper, and the duplication of arcs controlled by a single boolean argument. > 478: > 479: set tourvar 0 Why is the variable initialized to a number, when it becomes a list after the call ? > 480: isEulerian? $TGraph tourvar > 481: lappend result [$TGraph arc source [lindex $tourvar 0]] > 482: > 483: foreach i $tourvar { > 484: set u [$TGraph arc target $i] > 485: > 486: if { [lsearch $result $u] < 0 } { > 487: lappend result $u > 488: set hh [$TGraph arc getweight $i] Where is the value 'hh' used ? I do not see anything. If it is not used we can eliminate this I would say. On the other hand, if it should have been used, then something else is missing. > 489: } > 490: } > 491: lappend result [$TGraph arc source [lindex $tourvar 0]] > 492: return $result > 493: > 494: } My understanding of this algorithm is that a min-spanning tree is made, its arcs are then duplicated, and the euler tour through that graph is then reduced into a hamilton cycle by ignoring every node already put into the result. And the approximation comes from the use of the minimal spanning tree. Missing: Deletion of the temporary graph TGraph. This is a memory leak. > 496: proc ::struct::graph::op::Christofides { G } { > 497: #checking if all weights are set > 498: VerifyWeightsAreOk $G > 499: > 500: #create minimum spanning tree for graph G > 501: set T [prim $G] > 502: > 503: #graph algorithm is working on - spanning tree of graph G > 504: set TGraph [struct::graph] > 506: #fill TGgraph with nodes > 507: foreach n [$G nodes] { > 508: $TGraph node insert > 509: } > 510: #fill TGraph with arcs > 511: foreach e $T { > 512: set v [$G arc source $e] > 513: set u [$G arc target $e] > 514: $TGraph arc insert $u $v $u$v > 515: $TGraph arc setweight $u$v [$G arc getweight $e] > 516: } > 517: See my comments above regarding lines 461-477. > 505: set oddTGraph [struct::graph] I moved this out of the block setting up TGraph to make the separation of the two graphs clearer and also that the block can be factored into a helper for clarity. > 518: ################# > 519: > 520: set oddNodes {} > 521: foreach v [$TGraph nodes] { > 522: if { [ expr {[$TGraph node degree $v] % 2} ] eq 1 } { Superfluous nested use of expr, and bogus use of string equality operator 'eq' over regular (numeric) equality operator '=='. Strings => 'eq' Numbers => '==' > 523: $oddTGraph node insert $v > 524: } > 525: } > 526: > 527: #create complete graph > 528: foreach v [$oddTGraph nodes] { > 529: foreach u [$oddTGraph nodes] { > 530: if { ![$oddTGraph arc exists $u$v] & ![expr {$u eq $v}] } { Superfluous nested 'expr' command. Just use (...). Another problem, maybe: This queries oddTGraph for the existence of arcs, but at this point oddTGraph contains only nodes (added by lines 521-524). This means that the first part of the condition is always true and superfluous, right ? Or should this condition have queried the TGraph ? > 531: $oddTGraph arc insert $v $u $v$u > 532: } > 533: } > 534: } > 535: #### > 536: # MAX MATCHING HERE!!! > 537: #### > 538: set M {} > 539: > 540: foreach e [$oddTGraph arcs] { > 541: if { [lsearch $M $e] < 0 } { Possible alternative: ![struct::set contains $M $e] (The struct::set commands may have hashtable underneath, making the check faster). > 542: $oddTGraph arc delete $e > 543: } > 544: } > 545: > 546: #operation: M + T > 547: foreach e [$oddTGraph arcs] { > 548: set v [$oddTGraph arc source $e] > 549: set u [$oddTGraph arc target $e] > 550: $TGraph arc insert $u $v $u$v > 551: } > 552: > 553: ################# > 554: > 555: set tourvar 0 > 556: isEulerian? $TGraph tourvar > 557: > 558: lappend result [$TGraph arc source [lindex $tourvar 0]] > 559: > 560: foreach i $tourvar { > 561: set u [$TGraph arc target $i] > 562: > 563: if { [lsearch $result $u] < 0 } { > 564: lappend result $u > 565: } > 566: } > 567: > 568: return $result > 569: > 570: } Missing: Deletion of the temporary graph TGraph. This is a memory leak. Ditto missing deletion of oddTGraph. Also, identical to the end of MetricTravelingSalesMan (MTS), worthy of putting into a common helper command for both. And Christofides becomes a bit clearer to me. Start with minimal spanning tree, then create a max matching over the nodes of odd degree for side-ways connections, instead of duplicating each arc in the tree. From this we then construct euler tour and hamilton cycle like in the 'MetricTravelingSalesMan'. (Yes, you mentioned this in your mail, however that was a bit low-level a description and I had not seen the code yet). >> I'm writing to clear some matters about Christofides Algorithm: > > Ok. Note, please update the algorithm status info in the timeline at > http://wiki.tcl.tk/23203 > >> I implemented it with a empty space for MaxMatching use. It still needs >> some refactoring but I want to clear some things first. I agree that the last part of MTS and Christofides, i.e. construction of euler tour and hamilton cycle can be put into a helper command common to both. >> >> Both 2-approximation algorithm and Christofides uses some similar steps >> but I thought it's not worth to divide them into procedures - this way >> it looks more clear and intuitive IMHO. > > I see. Can't comment on that yet, will do after I had my look through the new code. After seeing the code I believe the opposite to be true. And some of the operations, like M+T, look like they could be of very general use, and maybe something fro th graph structure itself. We can postpone that however. This also ties a bit in Lars's message about possible functional groups, which I still haven't answered. >> >> Next update - MAX-CUT. I want to catch up with timeline, but I'm still >> working on those TSP too. Andreas. |
From: Michał A. <ant...@gm...> - 2009-06-23 16:26:36
|
This block of lines, 461-477, looks to me as something which should be put > into its own command. I see that Christofides uses a variant of this (lines > 503-517, without 505), just putting in the arcs instead of duplicating them. > That can be folded into a single helper, and the duplication of arcs > controlled by a single boolean argument. Done. > 478: > 479: set tourvar 0 >Why is the variable initialized to a number, when it becomes a list after the call ? When I tested this implementation, sometimes occured that graph wasn't Eulerian, so this was my way to prevent it from crushing when variable tourvar is not assigned. As I mentioned before this code needed still some refactoring, now line 479 isn't needed, forgot to delete it. > 480: isEulerian? $TGraph tourvar > 481: lappend result [$TGraph arc source [lindex $tourvar 0]] > 482: > 483: foreach i $tourvar { > 484: set u [$TGraph arc target $i] > 485: > 486: if { [lsearch $result $u] < 0 } { > 487: lappend result $u > 488: set hh [$TGraph arc getweight $i] >> Where is the value 'hh' used ? I do not see anything. If it is not used >> we can eliminate this I would say. On the other hand, if it should have >> been used, then something else is missing. The same as previous one :) Corrected. > 489: } > 490: } > 491: lappend result [$TGraph arc source [lindex $tourvar 0]] > 492: return $result > 493: > 494: } >>My understanding of this algorithm is that a min-spanning tree is made, its arcs are then duplicated, and the euler tour >>through that graph is then reduced into a hamilton cycle by ignoring every node already put into the result. >>And the approximation comes from the use of the minimal spanning tree. Correct. Duplicating arcs gives us confidence that graph is Eulerian. Finding Eulerian tour gives us sequence of visiting nodes in Hamilton cycle. >>Missing: Deletion of the temporary graph TGraph. This is a memory leak. Corrected. > 496: proc ::struct::graph::op::Christofides { G } { >>See my comments above regarding lines 461-477. Corrected. Other code problems that were noted, corrected as well. > 523: $oddTGraph node insert $v > 524: } > 525: } > 526: > 527: #create complete graph > 528: foreach v [$oddTGraph nodes] { > 529: foreach u [$oddTGraph nodes] { > 530: if { ![$oddTGraph arc exists $u$v] & ![expr {$u eq $v}] } { >>nother problem, maybe: This queries oddTGraph for the existence of arcs, but at this point oddTGraph contains only nodes >>(added by lines 521-524). This means that the first part of the condition is always true and superfluous, right ? Or should this >>condition have queried the TGraph ? This procedure adds only edges, and when there is an edge between pair of nodes we don't want another - that's why line 530 looks like this. So, before adding (e.g. for node X and Y) we are checking if this edge were added before ( for Y and X ). Loops are also unwelcome. >>And Christofides becomes a bit clearer to me. Start with minimal spanning tree, then create a max matching over the nodes >>of odd degree for side-ways connections, instead of duplicating each arc in the tree. From this we then construct euler tour >>and hamilton cycle like in the 'MetricTravelingSalesMan'. Right. So summarizing: - added new commands for both TSP algorithms: findHamiltonCycle and createTGraph. - correcting code regardning remarks that were mentioned Now I'm going to add adding edges to handle uncomplete graphs. As a consequence those graphs can have repeated nodes in the return value ( the same nodes can occur more than once ). Best Regards, Michał Antoniewski |
From: Lars H. <Lar...@re...> - 2009-06-27 13:08:01
|
Michał Antoniewski skrev: > Hello. > > I updated some tests for Max-Cut ( testing subprocedures seperately ). > > And now TSP. > > After some rethinking I came up with such solution to handle those > uncomplete graphs: > - first step filling graph with edges to have a complete graph. At the same > time saving set of original Edges. Weights on new edges are set to Inf. [ In > practise algorithm is avoiding them anyway, so maybe it's even not needed to > take that step ]. As Andreas has already remarked, this will typically violate the triangle inequality, thus rendering the TSP instance non-metric. But the proof that converting a tour to a cycle by taking shortcuts doesn't increase the combined weight requires the problem to be metric, so you'd be in trouble here. [snip] > - what I described assumes that we accept doubled occurances of nodes in > solution. Going back to definition of the problem - it says that there > shouldn't be such situations ( TSP problem is finding shortest Hamilton > Cycle, with only one occurance of each vertex ). So, now it's time to > decide, what to do: give that solution I describe above, or in such case > return error that it's not Hamilton's graph ( don't know if I wrote it well, > but I mean graph without proper Hamilton cycle ). A graph with a Hamilton cycle is said to be /Hamiltonian/, so I suppose the term you were looking for here is "non-Hamiltonian". "Hamilton's graph" _might_ be taken as a reference to the dodekahedron, because of the icosian game: http://en.wikipedia.org/wiki/Icosian_game Lars Hellström |
From: Lars H. <Lar...@re...> - 2009-06-23 10:10:09
|
Andreas Kupries skrev: > Michał Antoniewski wrote: >> Generally, TSP is problem concerning complete graphs, but there is a way >> of handling other graphs by adding edges with very big weights > > Can these big weights be 'Inf'inity ? Or do they have to be tailired to the > existing weights in some way ? One thing this would depend on is whether the route found is required to be a cycle, or can visit vertices more than once. Finding a Hamiltonian cycle in the complete graph is easy, finding one in a sparser graph is not so easy. For metric TSP (i.e., the triangle inequality holds), you're not allowed to exceed the weight of the shortest path between the endpoints anyway. There's also the issue that the size of (number of bits needed to encode) the weights sometimes becomes the deciding factor for the complexity of graph algorithms, but that mostly figures in arguments that goes: Well you've constructed an example where this algorithm appears to have a higher complexity than we claimed it did, but that's just because you've used gigantic weights, and if you do your math carefully you'll see that the basic bitsize of your example grows much faster than the number of edges in it. Probably not so relevant for TSP, but (if I recall correctly) important for some classical max-flow algorithms. 8-)} > Greedy max-matching ... Is my understanding correct that this would find a > matching, but not necessarily the maximal one ? Or not the best maximal per the > weights ? Note the following fine distinction: MaximUM matching: One that is at least as great (in e.g. size or total weight) as any other matching. MaximAL matching: One that is not a proper subset of any other matching, and thus maximal w.r.t. set inclusion in the set of all matchings. A greedy algorithm would probably return a maximal matching, with no guarantee of it being maximum. (The "no guarantee" part is not so surprising; checking whether a given matching is maximum is easily seen to be a problem in co-NP.) Lars Hellström |
From: Andreas K. <and...@ac...> - 2009-06-23 16:52:30
|
Lars Hellström wrote: > Andreas Kupries skrev: >> Michał Antoniewski wrote: >>> Generally, TSP is problem concerning complete graphs, but there is a >>> way of handling other graphs by adding edges with very big weights >> >> Can these big weights be 'Inf'inity ? Or do they have to be tailired >> to the existing weights in some way ? > > One thing this would depend on is whether the route found is required to > be a cycle, or can visit vertices more than once. Finding a Hamiltonian > cycle in the complete graph is easy, finding one in a sparser graph is > not so easy. From my look at the two algorithms it seems to create a cycle where vertices are visited once. >> Greedy max-matching ... Is my understanding correct that this would >> find a matching, but not necessarily the maximal one ? Or not the best >> maximal per the weights ? > > Note the following fine distinction: > > MaximUM matching: One that is at least as great (in e.g. size or total > weight) as any other matching. > > MaximAL matching: One that is not a proper subset of any other matching, > and thus maximal w.r.t. set inclusion in the set of all matchings. > > A greedy algorithm would probably return a maximal matching, with no > guarantee of it being maximum. (The "no guarantee" part is not so > surprising; checking whether a given matching is maximum is easily seen > to be a problem in co-NP.) Thanks for the clarification. Andreas. |
From: Michał A. <ant...@gm...> - 2009-06-23 14:18:32
|
> Ok. Note, please update the algorithm status info in the timeline at >> http://wiki.tcl.tk/23203 > > Updated. >>Ok. Although I am a bit behind on my mail here. Looking at the time this mail missed me by a few minutes on Friday. And I >>am not reading office mail from the home system (*). Can I entice you to send your mails not only to me, but >> Tcllib Developers <tcl...@li...> >>as well ? That will reach me at home too, as I subscribed both office and personal email adresses to the list. No problem:) I didn't write at your home address, cause I didn't want to bother you during weekend too...:) >>A bit of chiding here. A reason that the mentors ask the students for the timeline of their project is to have them think about >>and become aware of such dependencies. I know it's not a good sequence that TSP using MaxMatching is before MaxMatching itself, but I strongly wanted to have MaxMatching at the end, cause as far as I know it's implementation can be troublesome, so when I have everything other done and there is only Max Matching it would easier to focus and it won't cause any delays in timeline, as I estimated it could when done earlier. Maybe, it will turn out that implementation will go smoothly, without any problems and those "safety measures" were not required ( I hope:) ). About TSP: >>One thing this would depend on is whether the route found is required to be a cycle, or can visit vertices more than once. I would stick to definition of the problem, which says that we are looking for path (with minimal cost) which visits each node exactly once, and ends in the starting node. One more way to handle MaxMatching case can be implementing greedy MaxMatching, for graphs with even number of nodes. Then, it's possible to write test cases, forcing test graphs to have even number of nodes in their minimal spanning tree found in the first step of the algorithm. That way, we can test the TSP procedure and only add some tests when proper MaxMatching is done ( tight examples for Christofides also have even number of nodes in min spanning tree). Regarding your notes about both TSP algorithms, I'm going to divide them into some auxiliary procedures. Best Regards, Michał Antoniewski |
From: Andreas K. <and...@ac...> - 2009-06-23 16:52:32
|
Michał Antoniewski wrote: > > Ok. Note, please update the algorithm status info in the > timeline at http://wiki.tcl.tk/23203 > Updated. Thank you. > >>Ok. Although I am a bit behind on my mail here. Looking at the time > this mail missed me by a few minutes on Friday. And I >>am not reading > office mail from the home system (*). Can I entice you to send your > mails not only to me, but > >> Tcllib Developers <tcl...@li... > <mailto:tcl...@li...>> > >>as well ? That will reach me at home too, as I subscribed both office > and personal email adresses to the list. > > No problem:) I didn't write at your home address, cause I didn't want to > bother you during weekend too...:) That is no trouble. Either I am outside walking some park, without net connection (= not bothered at all), or I am inside reading, and then I can reading about graphs as well. > >>A bit of chiding here. A reason that the mentors ask the students for > the timeline of their project is to have them think about >>and become > aware of such dependencies. > > I know it's not a good sequence that TSP using MaxMatching is before > MaxMatching itself, but I strongly wanted to have MaxMatching at the > end, cause as far as I know it's implementation can be troublesome, so > when I have everything other done and there is only Max Matching it > would easier to focus and it won't cause any delays in timeline, as I > estimated it could when done earlier. Maybe, it will turn out that > implementation will go smoothly, without any problems and those "safety > measures" were not required ( I hope:) ). I see. Sort of what I often do ... Get the small stuff/fry out of the way first, working up to the more complex pieces. So it was intentional, not an oversight. For future projects to note, I believe that it is a good idea to explicitly document such decisions to avoid questions later. > About TSP: > > >>One thing this would depend on is whether the route found is required > to be a cycle, or can visit vertices more than once. > > I would stick to definition of the problem, which says that we are > looking for path (with minimal cost) which visits each node exactly > once, and ends in the starting node. > > > One more way to handle MaxMatching case can be implementing greedy > MaxMatching, for graphs with even number of nodes. > Then, it's possible to write test cases, forcing test graphs to have > even number of nodes in their minimal spanning tree found in the first > step of the algorithm. That way, we can test the TSP procedure and only > add some tests when proper MaxMatching is done ( tight examples for > Christofides also have even number of nodes in min spanning tree). > > Regarding your notes about both TSP algorithms, I'm going to divide them > into some auxiliary procedures. Looking forward to it. Andreas. |
From: Andreas K. <and...@ac...> - 2009-06-23 16:52:38
|
Michał Antoniewski wrote: > > > > > 478: > > 479: set tourvar 0 > > >Why is the variable initialized to a number, when it becomes a list > after the call ? > > When I tested this implementation, sometimes occured that graph wasn't > Eulerian, so this was my way to prevent it from crushing when variable to crush to make something very compact by putting pressure on it from all sides. like - to crush a car into a block of metal. to crash to abort the execution of the application in non-standard manner. in the context of cars, having a accident. like to crash the car into a railing, wall, or other obstacle. > tourvar is not assigned. As I mentioned before this code needed still > some refactoring, now line 479 isn't needed, forgot to delete it. Ah, I see. > > 523: $oddTGraph node insert $v > > 524: } > > 525: } > > 526: > > 527: #create complete graph > > 528: foreach v [$oddTGraph nodes] { > > 529: foreach u [$oddTGraph nodes] { > > 530: if { ![$oddTGraph arc exists $u$v] & > ![expr {$u eq $v}] } { > > >>nother problem, maybe: This queries oddTGraph for the existence of > arcs, but at this point oddTGraph contains only nodes >>(added by lines > 521-524). This means that the first part of the condition is always true > and superfluous, right ? Or should this >>condition have queried the > TGraph ? > > This procedure adds only edges, and when there is an edge between pair > of nodes we don't want another - that's why line 530 looks like this. > So, before adding (e.g. for node X and Y) we are checking if this edge > were added before ( for Y and X ). Loops are also unwelcome. Ah. Right, of course. We start with nodes and no arcs, and any arc added is an arc the condition can trigger later on. Not much thinking on my side when looking at this, apparently. This also explains why you are naming the arcs, i.e. the $u$v, this allows you to quickly check without additional data structures, right ? I was wondering, but decided it wasn't important enough to ask. Now that I am focused on this I do however see one possible problem with the naming. Consider two arcs X and Y, the first between the nodes 'a' and 'bc' and the second between 'ab' and 'c. X => $u$v = 'a' + 'bc' = 'abc' Y => $u$v = 'ab' + 'c' = 'abc' The condition will wrongly prevent the addition of Y. It believes that ythe arcs are parallel while they are not. This is like the index into an array or dictionary, where you had to define the key as set key [list $u $v] to have enough structure to prevent such accidental identities. It was for the Bellman-Ford/Johnson algorithm's where we talked about that before, if I remember correctly. > Now I'm going to add adding edges to handle uncomplete graphs. As a > consequence those graphs can have repeated nodes in the return value ( > the same nodes can occur more than once ). Ok. Good luck. And for me it is now time to look at the max-cut code you committed yesterday. After a bit of regular work. Andreas. |
From: Andreas K. <and...@ac...> - 2009-06-23 18:00:48
|
Andreas Kupries wrote: > And for me it is now time to look at the max-cut code you committed yesterday. > After a bit of regular work. > 572: #Maximum Cut - 2 approximation algorithm > 573: #---------------------------------------------------------------------- > 574: # > 575: # > 576: # > 577: # > 578: # > 579: #Reference: > 580: # > 581: > 582: proc ::struct::graph::op::MaxCut {G U V} { > 583: > 584: upvar $U _U > 585: upvar $V _V > 586: set nodeNumber [llength [$G nodes]] > 587: set _U {} > 588: set _V {} > 589: set counter 0 > 590: > 591: foreach v [lsort -dict [$G nodes]] { > 592: if { [expr {$counter % 2}] eq 0 } { if {($counter % 2) == 0} { > 593: lappend _U $v > 594: } else { > 595: lappend _V $v > 596: } > 597: incr counter > 598: } Actually, the whole block (lines 591-598) can be made simpler, by using an advanced form of the foreach command. We can iterate using multiple variables. Of the two possible forms we will want foreach {u v} [lsort -dict [$G nodes]] { lappend _U $u if {$v eq ""} continue lappend _V $v } For the list {a b c d e} the variables u and v are assigned u=a, v=b u=c, v=d u=e, v="" This should also explain the check done before extending the list _V. For a list of odd length the last round through the loop will leave the variable v empty. > 599: > 600: set val 1 > 601: set ALG 0 > 602: while {$val>0} { > 603: set val [cut $G _U _V [countEdges $G $_U $_V]] > 604: if { $val > $ALG } { > 605: set ALG $val > 606: } > 607: } > 608: return $ALG > 609: } > 610: > 611: #procedure replaces nodes between sets and checks if that change is profitable > 612: proc ::struct::graph::op::cut {G U V param} { > 613: > 614: upvar $U var1 > 615: upvar $V var2 I believe that it is more idiomatic to write proc ::struct::graph::op::cut {G Uvar Vvar param} { upvar $Uvar U upvar $Vvar V I.e. to make explicit that the arguments are variable names, in their names, and to name the variables they are linked to in the scope after them. 'var1' and 'var2' are very non-explanatory names. > 616: set _V {} > 617: set _U {} > 618: set value 0 > 619: > 620: set maxValue $param > 621: set _U [reset $var1] > 622: set _V [reset $var2] See my notes blow on 'reset', leaving mean to wonder why not just set _U $var1 set _V $var2 because that is what is currently happening with the current definition of 'reset'. > 623: > 624: foreach v [$G nodes] { > 625: > 626: if { [lsearch $_U $v] <0 } { As we are using Tcl 8.5 we can use the 'in' and 'ni' operators. The latter means 'not in'. They are applicable because we do not care about the exact position, only about the contained/!contained information. if { $v ni $_U } { ; # <=> v not in U ? > 627: lappend _U $v > 628: lremove _V $v > 629: set value [countEdges $G $_U $_V] > 630: } else { > 631: lappend _V $v > 632: lremove _U $v > 633: set value [countEdges $G $_U $_V] > 634: } > 635: > 636: if { $value > $maxValue } { > 637: set var1 [reset $_U] > 638: set var2 [reset $_V] > 639: set maxValue $value > 640: } else { > 641: set _V [reset $var2] > 642: set _U [reset $var1] > 643: } > 644: } In the above, is it necessary to have a single loop ? If not, we could maybe two loops, one over the nodes in V, the other over the nodes in U, obviating the need to perform the 'lsearch check'. The one trouble I see with my idea is that the changes (moves of nodes to the other set) made by the first loop would affect the second one. Or, to avoid this, more book-keeping is needed to know which nodes moved and can be ignored by the second loop. So at second glance maybe better not, it doesn't feel anymore as if the code would be simpler. Same complexity I guess. Ah well. Even so, some other thing to consider in the longer-term future would be an investigation into optimization of the countEdges calls in the loop. Because, as only one node N has moved, only its edges can affect the count (all outgoing edges of N which were to the same set are now to the other set (+), and all edges to the other set are now to the same set (-)). This would avoid a full recount. And also, actually allow us to compute the new value before moving the node at all. > 645: > 646: set value $maxValue > 647: > 648: if { $value > $param } { > 649: return $value > 650: } else { > 651: return 0 > 652: } > 653: } > 654: #Removing element from the list - auxiliary procedure > 655: proc ::struct::graph::op::lremove {listVariable value} { > 656: upvar 1 $listVariable var > 657: set idx [lsearch -exact $var $value] > 658: set var [lreplace $var $idx $idx] > 659: } Alternative to lremove is to add package require struct::list and then use struct::list delete Doesn't need to be done yet. If we have only one place where we need this it is more effective to stay with the local 'lremove' instead of pulling in another big package we use only a very small part of. > 660: #Auxiliary procedure to set again proper values of two cut sets > 661: proc ::struct::graph::op::reset {U} { > 662: set value {} > 663: foreach u $U { > 664: lappend value $u > 665: } > 666: return $value > 667: } This whole procedure seems to be equivalent to proc ::struct::graph::op::reset {U} { return $U } I see an iteration over the list U which is reconstructed in the variable 'value' and then returned. So value == U, thus we can return it the argument directly. And at this point I have to ask if the procedure is needed at all. > 668: #procedure counts edges that link two sets of nodes > 669: proc ::struct::graph::op::countEdges {G U V} { > 670: > 671: set value 0 > 672: > 673: foreach e [$G arcs] { > 674: set v [$G arc source $e] > 675: set u [$G arc target $e] > 676: > 677: if { [lsearch $U $v] >=0 && [lsearch $V $u] >=0 } { See comments regarding line 626. Operations 'in' and ni'. if {($v in $U) && ($u in $V)} { ; # e is arc U -> V > 678: incr value > 679: } > 680: if { [lsearch $U $u] >=0 && [lsearch $V $v] >=0 } { if {($u in $U) && ($v in $V)} { ; # e is arc V -> U > 681: incr value > 682: } > 683: } > 684: > 685: return $value > 686: } Hm. Lets try an alternate ... foreach u $U { foreach e [$G arcs -out $u] { set v [$G arc target $e] if {$v ni $V} continue incr value } } foreach v $V { foreach e [$G arcs -out $v] { set u [$G arc target $e] if {$u ni $U} continue incr value } } The main point to the alternate is that the iteration over U and V eliminates one lsearch/in/ni query, because we know that the node is in the set, and which ... Time complexity of the original: O(#arcs * #nodes). The #nodes because of the lsearch, which is linear. Time complexity of the alternate: O(#nodes * ?). The ? could be #arcs, as we go over all arcs, just scattered across the nodes. The time constants might be lower (because of the reduced number of lsearch/in ops). Andreas |
From: Andreas K. <and...@ac...> - 2009-06-26 18:04:43
|
Andreas Kupries wrote: > > 812: proc ::struct::graph::op::createTwoSquaredGraph {G} { > > 813: > > 814: set H [struct::graph] > > 815: set copyG [struct::graph] > > 816: > > 817: foreach v [$G nodes] { > > 818: $H node insert $v > > 819: $copyG node insert $v > > 820: } > > 821: > > 822: foreach e [$G arcs] { > > 823: set v [$G arc source $e] > > 824: set u [$G arc target $e] > > 825: > > 826: $copyG arc insert $v $u [list $v $u] > > 827: $copyG arc setweight [list $v $u] 1 > > 828: } > > 829: > > 830: foreach v [$copyG nodes] { > > 831: foreach u [$copyG nodes] { > > 832: if { [distance $copyG $v $u] <= 2 && ![$H arc exists [list $u $v]] > && ($u ne $v)} { I thought about this a bit more, on the way home yesterday, and sleeping over it. Not to be meant as criticism, the code here feels like a direct translation of the mathematical definition of G**2 graphs. It basically works by generating all pairs of nodes and then testing the distance between them, where the distance code is particular inefficient by using a dijkstra which computes the data for all nodes, and then throws 99% of it away. We can do it better ... In essence, generate the node pairs in such a way that a distance check is not required, because the way the generation works ensures that the distance <= 2 condition holds true. For this we have to unravel the definition of distance a bit, i.e. what does distance <= 2 mean ? We can analyze this easier by splitting it into 2 cases, i.e. distance == 1, and distance == 2. So, distance (u,v)== 1 means ? That node u can be reached from v in one hop <=> u and v are connected by an edge in G (a) <=> u is a "neighbour of v in G". (b) (X) In other words, the G**@ graph has all edges of the original graph, and we can generate them either through translating (a) or (b) into code, i.e. (Ad a) for all e edge in G, with end nodes u, v add edge (u,v) to G**2 (Ad b) for all nodes u in G for all nodes v neighbour of u in G add edge (u,v) to G**2, if u != v Of these two fragments I prefer (b), for reasons explained when I am dealing with distance 2. (Ad X) As the input graph is considered unidirectional its neighbourhood is all adjacent nodes, whether conected by incoming or outgoing arcs. Now, distance(u,v) == 2 means ? That node u can be reached from v in two hops <=> It exists a node z in G connected to both u and v with edges in G. (c) <=> u is a neighbour of a neighbour of v in G, but not a neighbour of v, nor v. (d) Definition (c) doesn't translate that well to code, but (d) does. (Ad d) for all nodes u in G for all nodes z neighbour of u in G for all nodes v neighbour of z in G add edge (u,v) to G**2 if u != v and no edge (u,v) existing already. The if condition at the end takes care of the "but not ..." terms in (d). This also looks very similar to the code for (b), indeed the two fragements can be merged into one fragment which takes of both at the same time. Which is why I prefered fragment (b) here. what we are in essence doing is to execute the two steps of dijkstra needed to find the nodes at distances 1 and 2 for all nodes, but no more. As this directly generates the nodes at the distance we are interested in the distance test becomes irrelevant. It is always true, by construction. While it doesn't look like the mathematical definition any longer, and as such needs more comments to make the relation to it clear, it should also be more efficient than the naive code. Now, looking at the overall algorithm,, as described in the other mail, i.e one level higher, about the series of G(i) of G(i)**2 we are apparently searching from G(1), G(2), ... on up one important question to ask is Do we have to compute all the G(i), G(i)**2 from the ground up ? Complementary phrasing: Can we generate the G(i), G(i)**2 incrementally from the previous graph G(i-1), G(i-1)**2 ? In our case the answer is yes, we can do this incrementally. Assuming G(i-1), G(i-1)**2 exist, then G(i) = G(i-1) + edge Ei. G(i)**2 = (G(i-1)**2 + Ei + additional edges. These additional edges can be computed from Ei! Let Ei = (u,v). Then the nodes z in neighbours(u,G)-v are distance 2 to v and z in neighbours(v,G)-u are distance 2 to u Easy to ocmpute and add. And the induction is started with G(0) == G(0)**2 == The graph containing just the nodes of the input graph, and no edges. The upshot of this is, we need only create one graph X in the code, which we initialize to contain G(0)**2, and during the loop iterating over our sorted edges we then extend this graph in each round using the above rules with Ei and distance-2 edges to contain G(i)**2. Instead of createTwoSquaredGraph a better helper procedure will be extendTwoSquaredGraph Andreas |
From: Michał A. <ant...@gm...> - 2009-06-23 16:54:42
|
Just to avoid confusion: >>About TSP: >>One thing this would depend on is whether the route found is required to be a cycle, or can visit vertices more than once. >>I would stick to definition of the problem, which says that we are looking for path (with minimal cost) which visits each node >>exactly once, and ends in the starting node. I meant it considering complete graph, on which algorithm is working. For graphs that are not complete it's possible that algorithm will pick an edge that was not in that graph before ( generally, it shouldn't ... that's why we add huge weights on additional edges ) and there will be a jump from node to node with unexisting arc. That's why at the end those jumps have to be changed into shortest paths between those nodes using existing edges ( total cost can't be worser -> metric ). In that case we have some nodes occur more than once in the final solution. Best Regards, Michał Antoniewski |
From: Andreas K. <and...@ac...> - 2009-06-23 19:30:15
Attachments:
graphops.tcl.numbered
|
Review of changes to the TSP code in revision 19 ... > 462: set TGraph [struct::graph] > 463: > 464: #filling graph TGraph with edges and nodes > 465: set TGraph [createTGraph $G $T 0] We seem to create TGraph twice here. ... Yes, createTGraph also creates the graph. So we still have a leaked graph object, created on line 462. I guess that line 462 can be deleted, was forgotten. > 488: #graph algorithm is working on - spanning tree of graph G > 489: set TGraph [struct::graph] > 490: set TGraph [createTGraph $G $T 1] See above, about lines 462, 465. TGraph created twice, object of line 489 leaked. > 501: #create complete graph > 502: foreach v [$oddTGraph nodes] { > 503: foreach u [$oddTGraph nodes] { > 504: if { ![$oddTGraph arc exists $u$v] & !($u eq $v) } { Ah, I missed something in my first review. I didn't see the ! operator, I guess due to the nested expr nearby. The complement to the eq'ual operator is 'ne' for 'not equal. if { ![$oddTGraph arc exists $u$v] & ($u ne $v) } { Also a reminder about the $u$v problem, nodes a/bc vs ab/c giving us identical arc names. > 534: #Finds Hamilton cycle in given graph G > 535: #Procedure used by Metric TSP Algorithms: > 536: #Christofides and Metric TSP 2-approximation algorithm > 537: > 538: proc ::struct::graph::op::findHamiltonCycle {G} { > 539: isEulerian? $G tourvar > 540: > 541: lappend result [$G arc source [lindex $tourvar 0]] > 542: > 543: foreach i $tourvar { > 544: set u [$G arc target $i] > 545: > 546: if { [lsearch $result $u] < 0 } { 'ni' operator. > 547: lappend result $u > 548: } > 549: } > 550: lappend result [$G arc source [lindex $tourvar 0]] > 551: return $result > 552: } > 553: > 554: #Creating graph from sets of given nodes and edges > 555: #In option param we decide if we want edges to be > 556: #duplicated or not > 557: #0 - duplicated (Metric TSP 2-approximation algorithm) > 558: #1 - single (Christofides Alogrithm) > 559: > 560: proc ::struct::graph::op::createTGraph {G Edges param} { Naming. param => 'doublearcs' or similar. > 561: set TGraph [graph] Does that (graph) work? Shouldn't it be 'struct::graph' ? Because we are in the struct::graph::op namespace, which doesn't have a struct::graph::op::graph command. Nor do we have a ::graph command. Well, we should not. We might have one in the testsuite, masking the problem. Andreas. |
From: Michał A. <ant...@gm...> - 2009-06-24 16:28:08
|
> > > Actually, the whole block (lines 591-598) can be made simpler, by using an > advanced form of the foreach command. We can iterate using multiple > variables. Of the two possible forms we will want > > foreach {u v} [lsort -dict [$G nodes]] { > lappend _U $u > if {$v eq ""} continue > lappend _V $v > } > > Oh, thanks. Corrected. Now it looks simpler indeed. > 611: #procedure replaces nodes between sets and checks if that change is profitable > 612: proc ::struct::graph::op::cut {G U V param} { > 613: > 614: upvar $U var1 > 615: upvar $V var2 I believe that it is more idiomatic to write proc ::struct::graph::op::cut {G Uvar Vvar param} { upvar $Uvar U upvar $Vvar V I.e. to make explicit that the arguments are variable names, in their names, and to name the variables they are linked to in the scope after them. 'var1' and 'var2' are very non-explanatory names. Corrected. Sometimes, I tend to come up with a lot of names, which at that moment I recon as not well suiting and then I choose something as bad as "param", which says nothing to reader :) > 623: > 624: foreach v [$G nodes] { > 625: > 626: if { [lsearch $_U $v] <0 } { As we are using Tcl 8.5 we can use the 'in' and 'ni' operators. The latter means 'not in'. They are applicable because we do not care about the exact position, only about the contained/!contained information. if { $v ni $_U } { ; # <=> v not in U ? Ok. Corrected for all cases. >>Even so, some other thing to consider in the longer-term future would be an investigation into optimization of the countEdges >>calls in the loop. Because, as only one node N has moved, only its edges can affect the count (all outgoing edges of N which >>were to the same set are now to the other set (+), and all edges to the other set are now to the same set (-)). This would >>avoid a full recount. And also, actually allow us to compute the new value before moving the node at all. I think good idea, would be storing all optimizations and other tasks planned somewhere in the future, in a document. I'm going to create such doc and add to SVN, so going back to those tasks will be easier. > 654: #Removing element from the list - auxiliary procedure > 655: proc ::struct::graph::op::lremove {listVariable value} { > 656: upvar 1 $listVariable var > 657: set idx [lsearch -exact $var $value] > 658: set var [lreplace $var $idx $idx] > 659: } >>Alternative to lremove is to add >> package require struct::list >>and then use >> struct::list delete >>Doesn't need to be done yet. If we have only one place where we need this it is more effective to stay with the local 'lremove' >>instead of pulling in another big package we use only a very small part of. Okey, so I am not changing it for now. > 660: #Auxiliary procedure to set again proper values of two cut sets > 661: proc ::struct::graph::op::reset {U} { > 662: set value {} > 663: foreach u $U { > 664: lappend value $u > 665: } > 666: return $value > 667: } >>I see an iteration over the list U which is reconstructed in the variable 'value' and then returned. So value == U, thus we can >>return it the argument directly. And at this point I have to ask if the procedure is needed at all. Actually, before this procedure was doing more stuff... After some cuts, now it's useless, so good point:) I'm erasing it, didn't notice it before.... if {($v in $U) && ($u in $V)} { ; # e is arc U -> V > 678: incr value > 679: } > 680: if { [lsearch $U $u] >=0 && [lsearch $V $v] >=0 } { if {($u in $U) && ($v in $V)} { ; # e is arc V -> U > 681: incr value > 682: } > 683: } > 684: > 685: return $value > 686: } >>Hm. Lets try an alternate ... foreach u $U { foreach e [$G arcs -out $u] { set v [$G arc target $e] if {$v ni $V} continue incr value } } foreach v $V { foreach e [$G arcs -out $v] { > > set u [$G arc target $e] > if {$u ni $U} continue > incr value > } > } > > The main point to the alternate is that the iteration over U and V > eliminates one lsearch/in/ni query, because we know that the node is in the > set, and which ... > > Time complexity of the original: O(#arcs * #nodes). The #nodes because of > the lsearch, which is linear. > Time complexity of the alternate: O(#nodes * ?). The ? could be #arcs, as > we go over all arcs, just scattered across the nodes. The time constants > might be lower (because of the reduced number of lsearch/in ops). > > Right. Corrected. About tests, I created some tests showing reaching max approximation factor, and also graphs which show when the algorithm makes mistakes (for example same graph but forced to choose different sets U, V at start). TSP algorithms corrected too, regarding your remarks. > 561: set TGraph [graph] >>Does that (graph) work? Shouldn't it be 'struct::graph' ? I changed back to struct::graph just to be sure, but when I was running tests everything worked well... Next updates: improved TSP, and some more tests for Max-Cut. I'm starting to work on k-center problems too. Best Regards, Michał Antoniewski |
From: Andreas K. <and...@ac...> - 2009-06-24 19:48:45
|
Michał Antoniewski wrote: > > Actually, the whole block (lines 591-598) can be made simpler, by > using an advanced form of the foreach command. We can iterate using > multiple variables. Of the two possible forms we will want > > foreach {u v} [lsort -dict [$G nodes]] { > lappend _U $u > if {$v eq ""} continue > lappend _V $v > } > > Oh, thanks. Corrected. Now it looks simpler indeed. You are welcome. > > 623: > > 624: foreach v [$G nodes] { > > 625: > > 626: if { [lsearch $_U $v] <0 } { > > As we are using Tcl 8.5 we can use the 'in' and 'ni' operators. The > latter means 'not in'. They are applicable because we do not care about > the exact position, only about the contained/!contained information. > > if { $v ni $_U } { ; # <=> v not in U ? > > Ok. Corrected for all cases. I wonder if I should try to compare in/ni (lists) struct::set contains (sets) and explicit list + array combination, or only an array They are all about checking if a list/set contains some element, but with different tradeoffs regarding time complexity O(.), code complexity, readbility, code size, memory use ... Which one to use is partly dependent on the size of lists we expect, but also if we are treating one thing sometimes as set, sometimes as list (which affects internal shimmering), and then comes in the judgment calls on whether to 'package require' some supporting package or not. Yes, I think so, I should. WEll, not immediately. This could become lengthy, like the essay I wrote regarding testing. A general overview not only on what options we have available for a reasonable small task, but also to show how to evaluate the differences ... Now I am reminded of RosettaCode, a nice site for exmaple tasks implemented by lots of languages for comparison. Tcl's page is http://www.rosettacode.org/wiki/Category:Tcl And http://www.rosettacode.org/wiki/Tasks_not_implemented_in_Tcl shows that Tcl managed to implement all 316 tasks. > > 660: #Auxiliary procedure to set again proper values of two cut sets > > 661: proc ::struct::graph::op::reset {U} { > > 662: set value {} > > 663: foreach u $U { > > 664: lappend value $u > > 665: } > > 666: return $value > > 667: } > > >>I see an iteration over the list U which is reconstructed in the > variable 'value' and then returned. So value == U, thus we can >>return > it the argument directly. And at this point I have to ask if the > procedure is needed at all. > > Actually, before this procedure was doing more stuff... After some cuts, > now it's useless, so good point:) I'm erasing it, didn't notice it > before.... Good to have reviews then ... More eyes ... > About tests, I created some tests showing reaching max approximation > factor, and also graphs which show when the algorithm makes mistakes > (for example same graph but forced to choose different sets U, V at start). These tests are maxcut-...-1.0/1.1 ? Yes ... > TSP algorithms corrected too, regarding your remarks. > > > 561: set TGraph [graph] > >>Does that (graph) work? Shouldn't it be 'struct::graph' ? > > I changed back to struct::graph just to be sure, but when I was running > tests everything worked well... Hm. I'll see what I can find out about this. > Next updates: improved TSP, and some more tests for Max-Cut. > I'm starting to work on k-center problems too. Ok. Revision 22 reviewed. Andreas. |
From: Michał A. <ant...@gm...> - 2009-07-01 23:16:41
|
Vertices Cover algorithm is availiable - implementation + tests. After implementing standard version, I tested it and it reached max approximation factor unlikely often. So, I came up with a bit of improvement and now it finds far better solutions. Generally, algorithm works by picking an edge, adding source and target nodes to solution and erasing those two nodes ( so, all edges adjecent to them also disappear ). My idea, was to sort at the beginning pairs of nodes (only those between which there is an edge ) with degrees and then do the algorithm steps. This gives nice boost because, at first we add those nodes to solution, that when erased the most of the edges will disappear. I extended also procedure sortEdges into sortGraph, which now takes graph and sort option at input, and returns sorted dictionary ( edges and weights or nodes and degrees, I think can be useful ). Maybe, it will be good idea to put the switch there, and more option parameters can appear to make this general sorting procedure... Just wondering....:) Now I remembered that you've written your implementation for sorting too, in one of previous mails..... I'm gonna review it tomorrow and do the possible change. Moreover, I think it could also be useful to implement procedure "graphName node degrees" (in the same fashion the "graphName arc weights" is constructed). If you're interested in it, we can add it e.g. at the end of timeline ( when it turns out that at the end, there will be other tasks to do, I can do it later, if you like... ). Going back to Vertices Cover, there is also other way to implement it (with the same approximation factor) using Greedy Max Matching (which is implemented already). But I think that version will be less suitable for other improvements that can be possibly done ( like that degree improvement ). Tomorrow, I hope full K-Center and starting with flow algorithms. Best Regards, Michal Antoniewski |
From: Andreas K. <and...@ac...> - 2009-06-24 20:04:46
|
Andreas Kupries wrote: >> > 561: set TGraph [graph] >> >>Does that (graph) work? Shouldn't it be 'struct::graph' ? >> >> I changed back to struct::graph just to be sure, but when I was running >> tests everything worked well... > > Hm. I'll see what I can find out about this. In the testsuite, we run the file XOpsControl which sources the file XSetup which runs namespace import -force struct::graph as its first command. This command makes the command ::struct::graph available as ::graph The test environment masked the bug. As reminder for myself to get rid of this https://sourceforge.net/tracker/?func=detail&aid=2811747&group_id=12883&atid=112883 Andreas. |