Re: [Tcllib-devel] gsoc'09 - Re: Graph Manipulations

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Lars Hellström wrote:
> Andreas Kupries skrev:
>> Michał Antoniewski wrote:
>>> Hello.

Hi Lars.

>>> Today, I started my work and I've got some questions.
>> Ok. I will answer to the best of my abilities. I may explicitly draw in other 
>> people and their expertise, depending on the context. As of now I will copy our 
>> conversation to tcllib-devel, allowing the community to chime in as well.
> 
> Chiming in (perhaps a bit late), then...
> 
> Is there a reason this is titled graph /manipulations/, when most of it 
>   is about implementing algorithms that take a graph as input but don't 
> aim to change it in any way? Not that I mind (I'm more comfortable 
> modelling graphs as immutable data than as mutable objects), but often 
> one wants higher level operations that modify graphs than just 
> adding/removing one edge/vertex.

It is a bit of a misnomer, and definitely my fault, I named the idea that way 
... Maybe I had the graph query and transform idea in my mind at the time.

> As an example, one kind of manipulation that seems useful (especially 
> when you get to the flow algorithms, in a couple of weeks) would be an 
> operation that splits every vertex in two, where one part receives the 
> incoming edges and the other gets the outbound edges, with a new edge 
> from in-part to out-part. (This is a classical trick for reducing flow 
> problems with capacities on vertices to flow problems with capacities 
> only on the edges.)

I would put such an operation into a graph::ops procedure, and export it from 
the ops package. I.e. a helper, but valuable enough to be a regular public 
operation. Which of such helpers we then move to the foundation class, i.e. 
struct::graph, can then be discussed separately.

>>  > As it is in
>>> timeline I started with Bellman-Ford and Johnson's.
> 
> In general, I think time is better spent implementing one algorith each 
> for doing A, B, and C, than three different algorithms for doing A, but 
> YMMV.

The three distances algorithms (dijkstra, bellman-ford, johnson) have different 
time complexities and limitations. As such also different areas of application, 
depending on the graphs to be expected as inputs. And automatic selection based 
on the input is often not possible.

Information like this, i.e. raw time complexities and limitations, and then at 
the higher level which algorithm is better in what situation is definitely 
something we have to put into the documentation.

> Also, I find the rule applied in struct::graph::op about naming 
> commands after algorithms that is a bit curious. There's certainly a 
> point in doing so when you've got many algorithms for doing the same 
> thing, but on the other hand it renders code employing these commands 
> more obscure.

Maybe. It depends a bit on the audience. Assuming that the reader knows graph 
theory I see now problem. Another thing using code can do is to define aliases 
mapping from, for example, 'distance', to 'dijkstra'. That location would then 
also be a good place for the user to comment on why a specific algorithm was 
chosen, see above about that.

>>> 2. Documentation. Should I just extend those html files in tcllib ( 
>>> graphops.html ) or write my documantation? As we talked it over before - 
>>> documentation will be made at the end.
>> Looking over the Tcllib sources I have here I do not see any file named 
>> "graphops.html". Can you tell me where you found it ?
> 
> Perhaps http://tcllib.sourceforge.net/doc/graphops.html? That one does 
> not seem to be listed in http://tcllib.sourceforge.net/doc/index.html, 
> though... ;-)

Interesting that it is not in the index. I will ping Joe English, he maintains 
that part of the tcllib site. (This index is maintained manually so far (*))

If this is indeed the graphops.html Michal was talking about, then to clarify, 
this HTML is generated from the file 'graphops.man' in the modules/struct/ 
directory.

(*) There are some very new sak commands to help him, with intention to make it 
fully automatic in the future. Alas, afaik we are not there yet.

> 
> Regarding that previously existing manpage, I notice it documents both 
> [struct::graph:op::kruskal] and [struct::graph:op::prim] as computing a 
> minimum spanning tree (I presume that should be a minimum /weight/ 
> spanning tree) for a graph, but fails to give any distinguishing trait, 
> other than the names of the algorithms. While this may be sufficient, 
> for a reader with access to the literature, I think the average tcllib 
> user would appreciate a bit more information; in particular, I think 
> the respective complexities of the commands should be given the 
> manpage, especially since the results varies depending on how cleverly 
> an algorithm has been implemented. (E.g., looking in a textbook I see a 
> naive O(mn) implementation of Kruskal, but also a more efficient O(m 
> \log n) implementation, and a note that O(m \alpha(m,n)) is possible.)
> 
> IOW, when documenting this kind of command, please give (a good 
> estimate of) its complexity as well.

Fully agreed. See also my comments above re multiple algorithm for the same.

The kruskal and prim implementations are from last year's GSoC. Have to check 
who wrote the docs. Might have been me, as part of the integration into tcllib 
(The '08 operations were written outside of Tcllib first).

	As such this is not something to be done by Michal.
	Not during the GSoC period. I am fine if he wishes
	to do that after. By then however, I might have fixed
	things already.

Can you (Lars) please go over the docs and file regular Tcllib bugs for where 
you believe information is missing ?

> Also, regarding the API provided by struct::graph, I find it a bit 
> curious that "weight" is singled out as a special property of arcs, 
> with its own methods for setting, getting, and so on. I suspect a more 
> practical approach for a higher level package like struct::graph::op 
> would be that the weight was some attribute given to the arcs, and that 
> a command for computing e.g. a shortest path would take as an argument 
> (or option value) the name of the attribute to use as "weight" in this 
> particular calculation.

This was also a decision made during GSoC 2008. I am not recalling details at 
the moment. There should be something in my mail archives however as I was 
involved, although not as primary mentor. For this I however should have some 
mails.

Andreas.