From: Andreas K. <and...@ac...> - 2009-05-25 16:13:34
|
Michał Antoniewski wrote: > Hello. > > Today, I started my work and I've got some questions. Ok. I will answer to the best of my abilities. I may explicitly draw in other people and their expertise, depending on the context. As of now I will copy our conversation to tcllib-devel, allowing the community to chime in as well. > As it is in > timeline I started with Bellman-Ford and Johnson's. > > 1. What about infinity values... In my implementation of Bellman Fords I > just added string "inf", as a value of infinity ( in initialization we > assign value of infinity on each distance we are searching for). Is it > okay, or you've got other way to assign those values? Let me check something ... I believe that in Tcl 8.5 we have standard strings for the infinities already. Matching these would be a good thing, even if we are working with Tcl 8.4 ... % tclsh % info patchlevel 8.5.7 % expr log(0) -Inf Yes. The standard strings are "Inf" and "-Inf". Kevin, do you have anything to add here, as one of the guys knowing the math stuff in the Tcl core better than most ? > > 2. Documentation. Should I just extend those html files in tcllib ( > graphops.html ) or write my documantation? As we talked it over before - > documentation will be made at the end. Looking over the Tcllib sources I have here I do not see any file named "graphops.html". Can you tell me where you found it ? The documentation format for Tcllib modules and packages is 'doctools'. The packages needed to handle this format are in Tcllib itself. All the files ending in '.man' in Tcllib are in this format. The 'sak' tool has some commands to validate this format, and to convert from doctools to HTML, nroff, etc. > 3. Generally, I'm implementing algorithms as normal procedures. Then > after writing tests and checking if those procedures work well, they > should be placed in tcllib ( just like hierarchy at SVN says). Do you > accept this approach? Let me have a look at what you committed so far and then I will come back to you on that. In other words, I would like to defer an answer on this one until I had a look at the code and have a seen a more concrete example of what you are talking about here. > 4. Tests ( with tcltest package ). I'm going to construct some special > cases and build some graphs, where especially algorithm could fail. For > approximation algorithms, surely will be needed Tight Examples ( cases > where algorithm gets it's maximum approximation factor ). If you can > give me some remarks about tests I would appreciate. I will definitely try. This one is also a point where I have to think a bit more before writing, thus not an immediate answer in this mail. Andreas. |
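For illustration (this sketch is not from the original mail): under Tcl 8.5 the string "Inf" is accepted wherever a floating-point number is expected, so it can serve directly as the initial distance value and in the relaxation test; the node names here are made up.

    # Tcl 8.5: "Inf" converts to an IEEE infinity inside [expr]
    set distances [dict create]
    foreach n {a b c} { dict set distances $n Inf }
    dict set distances a 0

    # any finite distance compares smaller than Inf, so relaxation works
    if {[dict get $distances b] > 3} {
        dict set distances b 3
    }
    puts $distances          ;# a 0 b 3 c Inf

Under Tcl 8.4 this conversion is not guaranteed to work, so the string would have to be special-cased before any arithmetic.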
From: Andreas K. <and...@ac...> - 2009-05-25 19:54:34
|
Michał Antoniewski wrote: > Hello. > 3. Generally, I'm implementing algorithms as normal procedures. Then > after writing tests and checking if those procedures work well, they > should be placed in tcllib ( just like hierarchy at SVN says). Do you > accept this approach? Having seen the code now I believe I begin to understand what you are doing and asking here. In essence you are writing each algorithm X as a set of global procedures in a file X.tcl, including basic test cases at the bottom of file. Then, when you are satisfied with the implementation you plan to transfer the set of procedures to the struct/graphops.tcl file, and rename them so that they are in the proper namespace. The test code you further plan to transform into a set of regular test cases which are then put into a .test file under struct/graph/tests directory. Is this understanding of your planning correct ? The question you should ask yourself here is What are the advantages, if any, of this approach over the alternative, i.e. of working directly within the struct/graphops.tcl file, and the tests files under struct/graph/tests ? Conversely, what are the disadvantages of this approach ? One disadvantage, in my opinion, is that you are doing something in two steps (code, and transfer) where one step would be sufficient (code). This is a disadvantage in my eyes because the transfer step is simply one more place where bugs can be introduced. Does that make sense to you ? Andreas. |
From: Andreas K. <and...@ac...> - 2009-05-27 21:28:33
|
Michał Antoniewski wrote: > Hello. > > At first, I would like to thank you for your responses. They helped me a > lot and made a lot of things clear. > I will send new files at SVN tomorrow (28 May), containing updated > Tcllib version and some test cases. Ok. > (0) Main Plan of the work > > You understood it correctly. And of course you're right. I've also > wanted to go with implementing code directly to graphops.tcl file and > test files but for first week I decided to go with that plan. > > In fact, I'm now working like you said - straight on tcllib files. > Bellman-Ford and Johnson's implementations are now in graphops.tcl ( and > they work ). > > Thanks a lot for README.developer - it was very helpful. This file is actually in Tcllib, in the toplevel directory, although the CVS head is still a bit older than what I gave you. > More about that matter in (5). > (1) Johnson's - return value. > > Like you said, setting [list $node1 $node2] was solution to this > problem. I've already corrected it. > > I've got one more concern about returning values in both : Johnson's and > Bellman-Ford's. Thanks for info about Infinity value, I changed it. But > what about situation when there is no connection between two nodes? > There are two options: > A) return Inf value for each pair of nodes between which path doesn't exist. > B) return only pairs of nodes (or in case of Bellman-Ford nodes) between > which path does exist > > Latter rule (B) cuts the return value only to connections in graph that > we are interested in (only existing ones). On the other hand > ::struct::graph::op::dijkstra is made according to A rule. What solution > do you prefer? Hm ... I would prefer to stay consistent with 'dijkstra', that way the three commands are (roughly) interchangeable from the API point of view. So, (A). The other form, i.e. (B), can be generated from that, by dropping all pairs with distance 'Inf'. Should we run into scaling trouble (i.e. large graphs), we can still switch to (B). > (2) Bug with GraphTransformation proc > > It is now corrected. Procedure now returns node that was added, so > it can be deleted from the graph when it is needed ( without a risk that > node names will overlap ). Ok, so you are going for modification of the input in-place, with reversal of the changes at the end, so the input looks to be unchanged by the algorithm. That works too. > I arranged it as auxiliary-procedure in graph::op ( it is not contained > in namespace ). Hm. That I will have to see, I believe, before I comment. > As I've noticed that in graph::op, there aren't many procedures like > this, and it shouldn't be useful also for other algorithms, maybe > dividing Johnsons procedure isn't good idea. I think I'm going to place > GraphTransformation into Johnson's proc - this will be more intuitive > and algorithm will be in one proc, which we can analyze step by step. It depends on the complexity of the actions a bit. Even if the code is not used by anything but Johnson's it can make sense to place the commands into (a) procedure(s), to make/keep the main function (more) readable/short. Otherwise complex setup commands may distract from the main algorithm. This is a bit of a judgment call, and may depend on personal aesthetics. > (3) TclChecker > > Thanks for that remark - it is now corrected. Moreover, > http://wiki.tcl.tk/10225 was very interesting. Our Wiki is a wonderful and invaluable resource in general, IMHO. > About TclDevKit license - of course I'm interested in it. 
Thank you very > much for that - I'm sure it will be very helpful. Ok. The Tcl Dev Kit can be retrieved at http://www.activestate.com/tcl_dev_kit/ (Use the 'Try It' link). and I will start the process for the license here. > (4) Tcl 8.5 - integration > > Yes, it's very important case. As in first week, there is a lot to > discuss, I would like to get back to it, straight after I will be sure > about some other things. Sure. > > (5) Tests > Some remarks: > > As I see in other implementations, each algorithm implementation should > firstly check if graph given at input is valid. > So, surely it should be added to algorithm code and there should be > created tests for it - e.g. checking if graph has weights set on all edges. IIRC you handled that case by forcing weights on the edges without them (set allunweighted ...). I have some thoughts about tests circulating in my mind ever since your Monday mail, which I really should write up. They should be mostly coherent by now, I believe (I usually try to not think consciously about it, but let my subconscious work things out before starting to write). > As I said before tests I created will be availible tomorrow at SVN - I > will try to comment them well to set clearly, what each of them is for. Ok. Looking forward to it. Andreas. |
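For illustration (not from the original mail): variant (B) discussed above can always be derived from an (A)-style result by dropping the unreachable pairs. A hypothetical helper, with an invented name:

    # drop all pairs whose distance is Inf from an (A)-style result
    proc FilterReachable {distances} {
        set reachable [dict create]
        dict for {pair dist} $distances {
            if {$dist ne "Inf"} {
                dict set reachable $pair $dist
            }
        }
        return $reachable
    }

Callers who want form (B) would then write something like [FilterReachable [struct::graph::op::Johnsons $g]].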
From: Andreas K. <and...@ac...> - 2009-05-28 16:13:50
|
Michał Antoniewski wrote: > Update at Assembla is done. > > Check : "Tcllib/ver2" folder for updates I enumarated last time. > > Some tests are done for smaller graphs just to check some single cases, > some of them base on more complex graphs and offer more complex tests as > well ( all those graphs were checked by me, algorithms were firstly done > on paper step by step, then I debugged my implementations step by step > to see if they make each operation properly ). > > So, this is how it looks for now:) Ok and thank you. I will look through this over the day and then come back to you. In the mean time I also send out my treatise on writing test cases, a minute ago, I guess it crossed paths with your mail. Andreas. |
From: Andreas K. <and...@ac...> - 2009-05-28 18:05:05
|
Michał Antoniewski wrote: > Update at Assembla is done. > > Check : "Tcllib/ver2" folder for updates I enumarated last time. > > Some tests are done for smaller graphs just to check some single cases, > some of them base on more complex graphs and offer more complex tests as > well ( all those graphs were checked by me, algorithms were firstly done > on paper step by step, then I debugged my implementations step by step > to see if they make each operation properly ). Let's go through the code. > 106: #Bellman's Ford Algorithm > 107: # > 108: #Input: > 109: #Graph - G > 110: #Edge weights - w > 111: #Start node - s > 112: # > 113: #Output: > 114: #Array d[u] of distances of each node from start node > 115: #Returns 0 if there is a cycle with negative sum of weights Do you have web references about the algorithm, at Wikipedia or another website ? If yes, adding them to the documentation would be appreciated, as it makes looking it up much easier. > 116: > 117: proc ::struct::graph::op::BellmanFord { G startnode } { > 118: > 119: #checking if all edges have their weights set > 120: VerifyWeightsAreOk $G > 121: > 122: #checking if the startnode exists in given graph G > 123: if {![$G node exists $startnode]} { > 124: return -code error "node \"$startnode\" does not exist in graph \"$G\"" > 125: } The indentation doesn't look too nice; this however is a nit, i.e. it can be fixed later. Even so, it looks nicer if it is properly indented from the beginning. May I ask what editor you use ? If it is emacs your buffer mode should be Tcl, and the command ESC C-\, aka 'indent-region', can be used to indent a whole block (marker to cursor) nicely. Other editors may have similar commands. > 126: > 127: # nodes of graph G > 128: set V [$G nodes] > 129: set E [$G arcs] > 130: > 131: #initialization > 132: foreach i $V { > 133: dict set distances $i Inf > 134: } > 135: > 136: dict set distances $startnode 0 > 137: > 138: #main loop > 139: for { set i 1 } { $i <= [ expr { [dict size $distances]-1 } ] } { incr i } { The second argument of 'for' is an expression already, making the nested 'expr' command superfluous; it only makes the code less readable. I.e. the { $i <= [ expr { [dict size $distances]-1 } ] } should be written as { $i <= ([dict size $distances]-1) } > 140: > 141: foreach j $E { > 142: set u [$G arc source $j] ;# start node of edge j > 143: set v [$G arc target $j] ;# end node of edge j > 144: > 145: if { [dict get $distances $v] > [expr [ dict get $distances $u ] + [$G arc getweight $j]] } { Ditto here, the first argument of 'if' is an expression, and nested expr commands are superfluous. > 146: dict set distances $v [expr [dict get $distances $u] + [$G arc getweight $j]] Brace your expressions. > 147: } > 148: } > 149: } > 150: > 151: #checking if there exists cycle with negative sum of weights > 152: foreach i $E { > 153: set u [$G arc source $i] ;# start node of edge i > 154: set v [$G arc target $i] ;# end node of edge i > 155: > 156: if { [dict get $distances $v] > [expr [ dict get $distances $u ] + [$G arc getweight $i]] } { > 157: return 0 Uh. Compare to line 161. Your command has a varying return type: 0 for this case, versus a dictionary. Looking back I see it documented in lines 114 and 115. Even so, this is not a good API design. The caller will have to go through contortions to handle this type of thing ... I.e. check the length of the result; if 1 then it is likely this case, so we check for the 0. Otherwise it is a dict.
Now, is this case of a 'cycle with negative sum of weights' an error case of the algorithm ? Then it is likely better to actually throw an error, i.e. 'return -code error ...'. Even returning an empty dictionary, i.e. {}, is better, as we can then be sure that the return-type is always a dictionary. > 158: } > 159: } > 160: > 161: return $distances > 162: > 163: } > 166: #Johnsons Algorithm > 167: #------------------------------------------------------------------------- > 168: #Input: > 169: # graph G, possible options: > 170: # -cutdisplay ( returns only existing distances, cuts all Inf values ) > 171: # > 172: #Output: > 173: # dictionary containing distances from each pair of vertices > 174: # returns 0 if cycle with negative sum of weights at edges is found Same thing as I mentioned for BellmanFord: either throw an error or at least use the empty dictionary {} as the result for the exceptional case, to have a single return-type, not a mix. > 175: > 176: proc ::struct::graph::op::Johnsons { G args} { > 177: > 178: #options for procedure The procedure argument 'args' is always a list, and should be processed with list commands. > 179: if { $args eq "-cutdisplay" } { > 180: set param 1 > 181: } elseif { $args eq "" } { > 182: set param 0 > 183: } else { > 184: return -code error "Bad option given. Available options: -cutdisplay" > 185: } More canonical would be

    set param 0 ; # default
    foreach option $args {
        switch -exact -- $option {
            -cutdisplay { set param 1 }
            default {
                return -code error "Bad option \"$option\". Expected -cutdisplay"
            }
        }
    }

Other notes:
* The name of the variable for an option should reflect a connection to the option. When I read 'param' later I will not think of '-cutdisplay'.
* The name -cutdisplay does not evoke (for me) a connection to the removal of infinities either.

> 186: > 187: #checking if all edges have their weights set > 188: VerifyWeightsAreOk $G > 189: > 190: #Transformation of graph G - adding one more node connected with > 191: #each existing node with an edge, which weight is 0 > 192: set V [$G nodes] > 193: set s [$G node insert] > 194: > 195: foreach i $V { > 196: if { $i ne $s } { > 197: $G arc insert $s $i > 198: } > 199: } > 200: > 201: $G arc setunweighted > 202: > 203: #potential values > 204: set h [BellmanFord $G $s] > 205: > 206: #are there any negative cycles? > 207: if { $h eq 0 } { > 208: return 0 > 209: } else { Remember my note regarding throwing an error for the 'cycle with negative weight'. Using the empty dict the check changes to 'if { ![llength $h] } {'. When an error is thrown this error has to be caught and re-thrown, to allow us to undo the addition of the special start node $s (see my note below about that). > 210: > 211: $G node delete $s Should this not happen for the error case above as well ? I.e. right now the graph is left modified if the input had a negative cycle. > 212: > 213: #setting new weights for edges in graph G > 214: foreach i [$G arcs] { > 215: set u [$G arc source $i] > 216: set v [$G arc target $i] > 217: > 218: $G arc setweight $i [ expr [$G arc getweight $i] + [dict get $h $u] - [dict get $h $v] ] Brace your expressions. We should document that the weights on the graph are changed ... Hm, reading further I see that you undo the re-weighting in the distances you are pulling out of dijkstra. Does that mean we should undo this change to the graph as well ? If this re-weighting is like the special node we added, then yes.
> 219: } > 220: > 221: #finding distances between all pair of nodes > 222: foreach i [$G nodes] { > 223: set dijkstra [::struct::graph::op::dijkstra $G $i -arcmode directed -outputformat distances] As Johnson is defined in the same namespace as dijkstra we can use just 'dijkstra' as the command name. See also line 204 where this is done. > 224: > 225: foreach j [$G nodes] { > 226: if { $i ne $j } { > 227: if { $param eq 1 } { As 'param' is a boolean (values 0, 1) a comparison is not required. Writing if { $param } { ... is sufficient and more readable as well. Having the variable now named to evoke a connection to -cutdisplay and the code is a bit more self-documenting as well, no need to remember what it stood for. > 228: if { [dict get $dijkstra $j] ne "Inf" } { > 229: dict set values [list $i $j] [ expr [ dict get $dijkstra $j] - [dict get $h $i] + [dict get $h $j] ] Brace your expressions. > 230: } > 231: } else { > 232: dict set values [list $i $j] [ expr [ dict get $dijkstra $j] - [dict get $h $i] + [dict get $h $j] ] Brace your expressions. > 233: } > 234: } > 235: } > 236: > 237: } > 238: } > 239: > 240: return $values > 241: } > 242: Next up, the test cases ... |
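For illustration (not from the original mail), the relaxation step of BellmanFord with the conventions suggested in the review applied (braced expressions, and an error instead of the bare 0) might look like this; it is a sketch of the suggestions, not the code as later committed:

    # inside ::struct::graph::op::BellmanFord, after initialization of
    # the distances dict: relax all arcs, using braced expressions
    foreach arc $E {
        set u [$G arc source $arc]
        set v [$G arc target $arc]
        set w [$G arc getweight $arc]
        if {[dict get $distances $v] > [dict get $distances $u] + $w} {
            dict set distances $v [expr {[dict get $distances $u] + $w}]
        }
    }

    # negative-cycle detection: throw instead of returning 0, so the
    # command always returns a dictionary on success
    foreach arc $E {
        set u [$G arc source $arc]
        set v [$G arc target $arc]
        if {[dict get $distances $v] > [dict get $distances $u] + [$G arc getweight $arc]} {
            return -code error "graph \"$G\" contains a cycle with a negative sum of weights"
        }
    }
    return $distances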
From: Andreas K. <and...@ac...> - 2009-05-28 18:24:33
|
Andreas Kupries wrote: > Next up, the test cases ... I found only a quibble and a typo ... '...UNCOHERENT...' in XOpsSetup ... How about '...PARTIALLYCONNECTED...' instead. That's the quibble. Running the tests through SAK using a Tcl 8.5.7 shell ...

    % ./sak.tcl test run -l X -s ~/Abuild/bin/t modules/struct
    [ ] Starting ...
    [ ] [8.5.7] struct ~~ FAILS T 2568 P 2541 S 26 F 1
    % grep FAILED X.log
    ==== graphop-gtcl-stcl-sttcl-qtcl-Johnsons-3.1 Johnsons, bad option used FAILED
    ==== graphop-gtcl-stcl-sttcl-qtcl-Johnsons-3.1 FAILED

Looking into X.log

    ---- Result was:
    Bad option given. Available options: -cutdisplay
    ---- Result should have been (exact matching):
    Bad option given. Availible options: -cutdisplay
    ==== graphop-gtcl-stcl-sttcl-qtcl-Johnsons-3.1 FAILED

And that's the typo, in the test case, in the expected result. Andreas. |
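For reference (not from the original mail), an exact-match error test of this kind in tcltest 2 style looks roughly as follows; the test name and setup are illustrative, the expected string is the message actually produced by the implementation:

    test graphop-Johnsons-badoption-sketch {Johnsons, bad option used} -setup {
        struct::graph g
    } -body {
        struct::graph::op::Johnsons g -badoption
    } -cleanup {
        g destroy
    } -returnCodes error \
      -result {Bad option given. Available options: -cutdisplay}

Because -result is compared exactly by default, a one-letter typo on either side (code or test) makes the case fail, which is what happened here.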
From: Kevin K. <ke...@ac...> - 2009-05-25 22:24:18
|
Andreas Kupries wrote: > 1. What about infinity values... In my implementation of Bellman Fords I >> just added string "inf", as a value of infinity ( in initialization we >> assign value of infinity on each distance we are searching for). Is it >> okay, or you've got other way to assign those values? > > Let me check something ... I believe that in Tcl 8.5 we have standard strings > for the infinities already. Matching these would be a good thing, even if we > are working with Tcl 8.4 ... > > % tclsh > % info patchlevel > 8.5.7 > % expr log(0) > -Inf > > Yes. The standard strings are "Inf" and "-Inf". > > Kevin, do you have anything to add here, as one of the guys knowing the math > stuff in the Tcl core better than most ? Nothing you didn't already say. Inf and -Inf are parsed as floating point numbers. -- 73 de ke9tv/2, Kevin |
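A quick interactive check of that behaviour, for illustration (Tcl 8.5):

    % set d -Inf
    -Inf
    % expr {$d < -1e300}
    1
    % expr {$d + 1}
    -Inf
    % expr {double($d)}
    -Inf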
From: Andreas K. <and...@ac...> - 2009-05-25 22:32:14
|
Kevin Kenny wrote: > Andreas Kupries wrote: >> Yes. The standard strings are "Inf" and "-Inf". >> >> Kevin, do you have anything to add here, as one of the guys knowing the math >> stuff in the Tcl core better than most ? > > Nothing you didn't already say. Inf and -Inf are parsed as floating > point numbers. Ok. Thank you for your time. Andreas. |
From: Andreas K. <and...@ac...> - 2009-05-28 16:10:52
|
Andreas Kupries wrote: >> 4. Tests ( with tcltest package ). I'm going to construct some special >> cases and build some graphs, where especially algorithm could fail. For >> approximation algorithms, surely will be needed Tight Examples ( cases >> where algorithm gets it's maximum approximation factor ). If you can >> give me some remarks about tests I would appreciate. > > I will definitely try. This one is also a point where I have to think a bit > more before writing, thus not an immediate answer in this mail.

The promised treatise on writing test cases ... Thinking about past tests and testsuites I am currently seeing three levels of tests ...

First, "drudgery" tests. Written to check absolutely basic assumptions which should never fail. Example: For a command FOO taking two arguments, three tests calling it with zero, one, and three arguments. These are the basic checks that the command fails if it has not enough arguments, or too many.

After that come the tests checking things based on our knowledge of the command, about its properties and assumptions. Some examples:

** The BellmanFord command takes a _startnode_ as argument, and this node should be a node of the graph. That equals one test case checking the behavior when the specified node is not a node of the graph. This often gives rise to code in the implementation which explicitly checks the assumption and throws a nice error, instead of letting the algorithm fail later in some weird non-deterministic way. Such checks cannot always be done. The graph argument for example is just a command in itself, and while we expect it to exhibit a certain interface, i.e. a set of sub-commands aka methods, we cannot check that it has them, except by actually trying to use them. That is done by the algorithm anyway, so an explicit check is just overhead we can get by without.

** IIRC one of the distinguishing characteristics of either BellmanFord and/or Johnson is that they are able to handle negative weights, whereas Dijkstra requires positive weights. This makes for three test cases ... Graph with all positive weights, all negative, and a mix of positive and negative weights. Thinking about it, do the algorithms handle weight '0' as well ? Another test case, or several, if we mix zero with positive and negative weights.

** The two algorithms we are currently thinking about are about distances between nodes, and the distance can be 'Inf'inity, i.e. nodes may not be connected. This means that good test cases are (1) Strongly connected graph (2) Connected graph (3) Disconnected graph. At the extremes of (1) and (3) we have the fully connected graphs and graphs without edges, only nodes, i.e. completely disconnected.

** IIRC both of the algorithms take weighted arcs, and fill in a default if arcs are left unweighted in the input graph. This also induces three test cases: (1) Graph with all arcs carrying explicit weights. (2) Graph without weights at all. (3) Graph with a mixture of weighted and unweighted arcs.

What I have described above via examples is called 'black-box' testing. Test cases are designed and written based on our knowledge of the properties of the algorithm and its inputs, without referencing a particular implementation. Your query about tests for 'maximum approximation factor' falls into this category. Your algorithm has extrema, so we should have tests which force the algorithm to these extrema, if we know (how to construct) inputs which do so.

Going further, a complement to 'black-box' testing is 'white-box'.
For this we know the implementation of the algorithm; we look at it and design our test cases so that they force the code through all possible paths in the implementation. Wherever a decision is made we have a test case forcing a specific direction of the decision, for all possible directions. In practice I often hope that the black-box tests I have made are enough to cover all the paths, obviating the need for white-box tests.

So, if you now believe that writing tests for an algorithm takes at least as much time as coding the algorithm, and often more time, then you are completely right. It does.

An interesting connection is to documentation. In one direction, the properties you are checking with black-box testing are properties which should be documented in the algorithm's man page. And conversely, if you have documentation of the properties of an algorithm then this is a good reference to base black-box tests on. In practice test cases and documentation often get written together, cross-influencing each other. And the actual writing of test cases is a mix of black and white box, possibly influencing the implementation while writing the tests. Like writing the test for 'startnode not in input graph' serving as a reminder to put a check for this into the code. Andreas. |
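To make two of the black-box cases above concrete (not from the original mail; graph setup and test names are invented, while the error message and the Inf result follow the code quoted earlier in the thread):

    package require struct::graph
    package require struct::graph::op
    package require tcltest
    namespace import ::tcltest::test

    # startnode not present in the input graph
    test bellmanford-sketch-1 {BellmanFord, startnode not in graph} -setup {
        struct::graph g
        g node insert a b
        g arc insert a b e1
        g arc setweight e1 1
    } -body {
        struct::graph::op::BellmanFord g x
    } -cleanup {
        g destroy
    } -returnCodes error -result {node "x" does not exist in graph "g"}

    # disconnected graph: the unreachable node keeps distance Inf
    test bellmanford-sketch-2 {BellmanFord, disconnected node} -setup {
        struct::graph g
        g node insert a b c
        g arc insert a b e1
        g arc setweight e1 1
    } -body {
        dict get [struct::graph::op::BellmanFord g a] c
    } -cleanup {
        g destroy
    } -result Inf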
From: Lars H. <Lar...@re...> - 2009-05-30 15:40:12
|
Andreas Kupries skrev: > Michał Antoniewski wrote: >> Hello. >> >> Today, I started my work and I've got some questions. > > Ok. I will answer to the best of my abilities. I may explicitly draw in other > people and their expertise, depending on the context. As of now I will copy our > conversation to tcllib-devel, allowing the community to chime in as well. Chiming in (perhaps a bit late), then... Is there a reason this is titled graph /manipulations/, when most of it is about implementing algorithms that take a graph as input but don't aim to change it in any way? Not that I mind (I'm more comfortable modelling graphs as immutable data than as mutable objects), but often one wants higher level operations for modifying graphs than just adding/removing one edge/vertex. As an example, one kind of manipulation that seems useful (especially when you get to the flow algorithms, in a couple of weeks) would be an operation that splits every vertex in two, where one part receives the incoming edges and the other gets the outbound edges, with a new edge from in-part to out-part. (This is a classical trick for reducing flow problems with capacities on vertices to flow problems with capacities only on the edges.) > > As it is in >> timeline I started with Bellman-Ford and Johnson's. In general, I think time is better spent implementing one algorithm each for doing A, B, and C, than three different algorithms for doing A, but YMMV. Also, I find the rule applied in struct::graph::op about naming commands after algorithms a bit curious. There's certainly a point in doing so when you've got many algorithms for doing the same thing, but on the other hand it renders code employing these commands more obscure. >> 2. Documentation. Should I just extend those html files in tcllib ( >> graphops.html ) or write my documantation? As we talked it over before - >> documentation will be made at the end. > > Looking over the Tcllib sources I have here I do not see any file named > "graphops.html". Can you tell me where you found it ? Perhaps http://tcllib.sourceforge.net/doc/graphops.html? That one does not seem to be listed in http://tcllib.sourceforge.net/doc/index.html, though... ;-) Regarding that previously existing manpage, I notice it documents both [struct::graph::op::kruskal] and [struct::graph::op::prim] as computing a minimum spanning tree (I presume that should be a minimum /weight/ spanning tree) for a graph, but fails to give any distinguishing trait, other than the names of the algorithms. While this may be sufficient for a reader with access to the literature, I think the average tcllib user would appreciate a bit more information; in particular, I think the respective complexities of the commands should be given in the manpage, especially since the results vary depending on how cleverly an algorithm has been implemented. (E.g., looking in a textbook I see a naive O(mn) implementation of Kruskal, but also a more efficient O(m \log n) implementation, and a note that O(m \alpha(m,n)) is possible.) IOW, when documenting this kind of command, please give (a good estimate of) its complexity as well. Also, regarding the API provided by struct::graph, I find it a bit curious that "weight" is singled out as a special property of arcs, with its own methods for setting, getting, and so on. I suspect a more practical approach for a higher level package like struct::graph::op would be that the weight was some attribute given to the arcs, and that a command for computing e.g.
a shortest path would take as an argument (or option value) the name of the attribute to use as "weight" in this particular calculation. Lars Hellström |
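For illustration (not from the original mail), the splitting operation Lars describes could be sketched on top of the existing struct::graph API roughly like this; the procedure name is invented, and arc attributes/weights as well as self-loops are deliberately ignored to keep the sketch short:

    # split every node v into an in-part and an out-part, joined by one
    # internal arc that can later carry the former vertex capacity
    proc SplitAllNodes {G} {
        foreach v [$G nodes] {
            set vin  [$G node insert]
            set vout [$G node insert]
            # incoming arcs are re-attached to the in-part
            foreach a [$G arcs -in $v] {
                $G arc insert [$G arc source $a] $vin
                $G arc delete $a
            }
            # outgoing arcs are re-attached to the out-part
            foreach a [$G arcs -out $v] {
                $G arc insert $vout [$G arc target $a]
                $G arc delete $a
            }
            $G arc insert $vin $vout
            $G node delete $v
        }
        return $G
    }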
From: Andreas K. <and...@ac...> - 2009-06-01 18:36:39
|
Lars Hellström wrote: > Andreas Kupries skrev: >> Michał Antoniewski wrote: >>> Hello. Hi Lars. >>> Today, I started my work and I've got some questions. >> Ok. I will answer to the best of my abilities. I may explicitly draw in other >> people and their expertise, depending on the context. As of now I will copy our >> conversation to tcllib-devel, allowing the community to chime in as well. > > Chiming in (perhaps a bit late), then... > > Is there a reason this is titled graph /manipulations/, when most of it > is about implementing algorithms that take a graph as input but don't > aim to change it in any way? Not that I mind (I'm more comfortable > modelling graphs as immutable data than as mutable objects), but often > one wants higher level operations that modify graphs than just > adding/removing one edge/vertex. It is a bit of a misnomer, and definitely my fault, I named the idea that way ... Maybe I had the graph query and transform idea in my mind at the time. > As an example, one kind of manipulation that seems useful (especially > when you get to the flow algorithms, in a couple of weeks) would be an > operation that splits every vertex in two, where one part receives the > incoming edges and the other gets the outbound edges, with a new edge > from in-part to out-part. (This is a classical trick for reducing flow > problems with capacities on vertices to flow problems with capacities > only on the edges.) I would put such an operation into a graph::ops procedure, and export it from the ops package. I.e. a helper, but valuable enough to be a regular public operation. Which of such helpers we then move to the foundation class, i.e. struct::graph, can then be discussed separately. >> > As it is in >>> timeline I started with Bellman-Ford and Johnson's. > > In general, I think time is better spent implementing one algorith each > for doing A, B, and C, than three different algorithms for doing A, but > YMMV. The three distances algorithms (dijkstra, bellman-ford, johnson) have different time complexities and limitations. As such also different areas of application, depending on the graphs to be expected as inputs. And automatic selection based on the input is often not possible. Information like this, i.e. raw time complexities and limitations, and then at the higher level which algorithm is better in what situation is definitely something we have to put into the documentation. > Also, I find the rule applied in struct::graph::op about naming > commands after algorithms that is a bit curious. There's certainly a > point in doing so when you've got many algorithms for doing the same > thing, but on the other hand it renders code employing these commands > more obscure. Maybe. It depends a bit on the audience. Assuming that the reader knows graph theory I see now problem. Another thing using code can do is to define aliases mapping from, for example, 'distance', to 'dijkstra'. That location would then also be a good place for the user to comment on why a specific algorithm was chosen, see above about that. >>> 2. Documentation. Should I just extend those html files in tcllib ( >>> graphops.html ) or write my documantation? As we talked it over before - >>> documentation will be made at the end. >> Looking over the Tcllib sources I have here I do not see any file named >> "graphops.html". Can you tell me where you found it ? > > Perhaps http://tcllib.sourceforge.net/doc/graphops.html? That one does > not seem to be listed in http://tcllib.sourceforge.net/doc/index.html, > though... 
;-) Interesting that it is not in the index. I will ping Joe English, he maintains that part of the tcllib site. (This index is maintained manually so far (*)) If this is indeed the graphops.html Michal was talking about, then to clarify, this HTML is generated from the file 'graphops.man' in the modules/struct/ directory. (*) There are some very new sak commands to help him, with intention to make it fully automatic in the future. Alas, afaik we are not there yet. > > Regarding that previously existing manpage, I notice it documents both > [struct::graph:op::kruskal] and [struct::graph:op::prim] as computing a > minimum spanning tree (I presume that should be a minimum /weight/ > spanning tree) for a graph, but fails to give any distinguishing trait, > other than the names of the algorithms. While this may be sufficient, > for a reader with access to the literature, I think the average tcllib > user would appreciate a bit more information; in particular, I think > the respective complexities of the commands should be given the > manpage, especially since the results varies depending on how cleverly > an algorithm has been implemented. (E.g., looking in a textbook I see a > naive O(mn) implementation of Kruskal, but also a more efficient O(m > \log n) implementation, and a note that O(m \alpha(m,n)) is possible.) > > IOW, when documenting this kind of command, please give (a good > estimate of) its complexity as well. Fully agreed. See also my comments above re multiple algorithm for the same. The kruskal and prim implementations are from last year's GSoC. Have to check who wrote the docs. Might have been me, as part of the integration into tcllib (The '08 operations were written outside of Tcllib first). As such this is not something to be done by Michal. Not during the GSoC period. I am fine if he wishes to do that after. By then however, I might have fixed things already. Can you (Lars) please go over the docs and file regular Tcllib bugs for where you believe information is missing ? > Also, regarding the API provided by struct::graph, I find it a bit > curious that "weight" is singled out as a special property of arcs, > with its own methods for setting, getting, and so on. I suspect a more > practical approach for a higher level package like struct::graph::op > would be that the weight was some attribute given to the arcs, and that > a command for computing e.g. a shortest path would take as an argument > (or option value) the name of the attribute to use as "weight" in this > particular calculation. This was also a decision made during GSoC 2008. I am not recalling details at the moment. There should be something in my mail archives however as I was involved, although not as primary mentor. For this I however should have some mails. Andreas. |
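The aliasing idea could look like this in application code, for illustration (the target namespace and the choice of algorithm are of course up to the application):

    # pick one concrete distance algorithm in a single place, and
    # document here why it was chosen (sparse graph, non-negative weights)
    interp alias {} ::myapp::distances {} ::struct::graph::op::dijkstra
    # switch the alias to the Johnsons command if negative weights show up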
From: Lars H. <Lar...@re...> - 2009-06-03 14:09:07
|
Andreas Kupries skrev: > Lars Hellström wrote: >> As an example, one kind of manipulation that seems useful (especially >> when you get to the flow algorithms, in a couple of weeks) would be an >> operation that splits every vertex in two, where one part receives the >> incoming edges and the other gets the outbound edges, with a new edge >> from in-part to out-part. (This is a classical trick for reducing flow >> problems with capacities on vertices to flow problems with capacities >> only on the edges.) > > I would put such an operation into a graph::ops procedure, and export it > from the ops package. I.e. a helper, but valuable enough to be a regular > public operation. Which of such helpers we then move to the foundation > class, i.e. struct::graph, can then be discussed separately. Well, although the operation is useful, I'd have to add that an operation which does exactly this feels very much like a lame special case of some more general "insert gadget" operation, and it is really this general operation that should be implemented. The catch is that there's no generally accepted formalism (that I know of) for describing the very intuitive operation of replacing every vertex by some gadget. (Actually, I have done some work on such matters -- see http://abel.math.umu.se/~lars/diamond/paper-gr.pdf, Section 4 -- but that approach is far from immediately applicable to struct::graph.) >>> > As it is in >>>> timeline I started with Bellman-Ford and Johnson's. >> >> In general, I think time is better spent implementing one algorith >> each for doing A, B, and C, than three different algorithms for doing >> A, but YMMV. > > The three distances algorithms (dijkstra, bellman-ford, johnson) have > different time complexities and limitations. As such also different > areas of application, depending on the graphs to be expected as inputs. Yes, but each of them solves maybe 80% of all graph distance problems users will pose. The question is then whether one prioritises the remaining 20% of this problem class A, or the first 80% of some other problem class B... > And automatic selection based on the input is often not possible. And usually a Bad Idea, even when it is possible. > Information like this, i.e. raw time complexities and limitations, and > then at the higher level which algorithm is better in what situation is > definitely something we have to put into the documentation. > > >> Also, I find the rule applied in struct::graph::op about naming >> commands after algorithms that is a bit curious. There's certainly a >> point in doing so when you've got many algorithms for doing the same >> thing, but on the other hand it renders code employing these commands >> more obscure. > > Maybe. It depends a bit on the audience. Assuming that the reader knows > graph theory I see now problem. Well, if one is regularly teaching a graph theory course that covers these algorithms, then I suppose one immediately knows Prim from Kruskal, but otherwise I'm not so sure. Often one remembers algorithms in vaguer terms, such as the edge- or vertex-oriented respectively algorithm for solving the problem in question -- when one remembers individual algorithms at all. A lot of the time, it is sufficient to recall for what problems a polynomial algorithm exists; sorting out the details can typically be delayed until one starts coding. There is also the issue that case conventions can make the names harder to recognise. 
I recognise "kruskal" and "dijkstra" as names even in all-lower-case, but found "prim" and "tarjan" far more mysterious on first sight. Another problem is that some persons have discovered several algorithms, and for those this naming scheme isn't unique anymore. > Another thing using code can do is to > define aliases mapping from, for example, 'distance', to 'dijkstra'. > That location would then also be a good place for the user to comment on > why a specific algorithm was chosen, see above about that. Yes. Furthermore it could be a good idea to have an overview section /before/ the long list of command descriptions. This section would be organised by topic (e.g.: paths, trees, network flows, connectivity, etc.) and mention what algorithms are available (preferably with hyperlinks to command descriptions). > The kruskal and prim implementations are from last year's GSoC. Have to > check who wrote the docs. Might have been me, as part of the integration > into tcllib (The '08 operations were written outside of Tcllib first). > > As such this is not something to be done by Michal. > Not during the GSoC period. I am fine if he wishes > to do that after. By then however, I might have fixed > things already. > > Can you (Lars) please go over the docs and file regular Tcllib bugs for > where you believe information is missing ? Have filed some earlier today. Could be more on the way. >> Also, regarding the API provided by struct::graph, I find it a bit >> curious that "weight" is singled out as a special property of arcs, >> with its own methods for setting, getting, and so on. I suspect a more >> practical approach for a higher level package like struct::graph::op >> would be that the weight was some attribute given to the arcs, and >> that a command for computing e.g. a shortest path would take as an >> argument (or option value) the name of the attribute to use as >> "weight" in this particular calculation. > > This was also a decision made during GSoC 2008. I am not recalling > details at the moment. There should be something in my mail archives > however as I was involved, although not as primary mentor. For this I > however should have some mails. Thinking a bit more about this lead me to the idea of dividing the struct::graph API into smaller functional groups, but that is getting so long that I'll write a separate mail about it. Lars Hellström |