Re: [Tcllib-devel] gsoc'09 - Re: Graph Manipulations

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Andreas Kupries skrev:
> Lars Hellström wrote:
>> As an example, one kind of manipulation that seems useful (especially 
>> when you get to the flow algorithms, in a couple of weeks) would be an 
>> operation that splits every vertex in two, where one part receives the 
>> incoming edges and the other gets the outbound edges, with a new edge 
>> from in-part to out-part. (This is a classical trick for reducing flow 
>> problems with capacities on vertices to flow problems with capacities 
>> only on the edges.)
> 
> I would put such an operation into a graph::ops procedure, and export it 
> from the ops package. I.e. a helper, but valuable enough to be a regular 
> public operation. Which of such helpers we then move to the foundation 
> class, i.e. struct::graph, can then be discussed separately.

Well, although the operation is useful, I'd have to add that an 
operation which does exactly this feels very much like a lame special 
case of some more general "insert gadget" operation, and it is really 
this general operation that should be implemented. The catch is that 
there's no generally accepted formalism (that I know of) for describing 
the very intuitive operation of replacing every vertex by some gadget. 
(Actually, I have done some work on such matters -- see 
http://abel.math.umu.se/~lars/diamond/paper-gr.pdf, Section 4 -- but 
that approach is far from immediately applicable to struct::graph.)

>>>  > As it is in
>>>> timeline I started with Bellman-Ford and Johnson's.
>>
>> In general, I think time is better spent implementing one algorith 
>> each for doing A, B, and C, than three different algorithms for doing 
>> A, but YMMV.
> 
> The three distances algorithms (dijkstra, bellman-ford, johnson) have 
> different time complexities and limitations. As such also different 
> areas of application, depending on the graphs to be expected as inputs. 

Yes, but each of them solves maybe 80% of all graph distance problems 
users will pose. The question is then whether one prioritises the 
remaining 20% of this problem class A, or the first 80% of some other 
problem class B...

> And automatic selection based on the input is often not possible.

And usually a Bad Idea, even when it is possible.

> Information like this, i.e. raw time complexities and limitations, and 
> then at the higher level which algorithm is better in what situation is 
> definitely something we have to put into the documentation.
> 
> 
>> Also, I find the rule applied in struct::graph::op about naming 
>> commands after algorithms that is a bit curious. There's certainly a 
>> point in doing so when you've got many algorithms for doing the same 
>> thing, but on the other hand it renders code employing these commands 
>> more obscure.
> 
> Maybe. It depends a bit on the audience. Assuming that the reader knows 
> graph theory I see now problem.

Well, if one is regularly teaching a graph theory course that covers 
these algorithms, then I suppose one immediately knows Prim from 
Kruskal, but otherwise I'm not so sure. Often one remembers algorithms 
in vaguer terms, such as the edge- or vertex-oriented respectively 
algorithm for solving the problem in question -- when one remembers 
individual algorithms at all. A lot of the time, it is sufficient to 
recall for what problems a polynomial algorithm exists; sorting out the 
details can typically be delayed until one starts coding.

There is also the issue that case conventions can make the names harder 
to recognise. I recognise "kruskal" and "dijkstra" as names even in 
all-lower-case, but found "prim" and "tarjan" far more mysterious on 
first sight.

Another problem is that some persons have discovered several 
algorithms, and for those this naming scheme isn't unique anymore.

> Another thing using code can do is to 
> define aliases mapping from, for example, 'distance', to 'dijkstra'. 
> That location would then also be a good place for the user to comment on 
> why a specific algorithm was chosen, see above about that.

Yes. Furthermore it could be a good idea to have an overview section 
/before/ the long list of command descriptions. This section would be 
organised by topic (e.g.: paths, trees, network flows, connectivity, 
etc.) and mention what algorithms are available (preferably with 
hyperlinks to command descriptions).

> The kruskal and prim implementations are from last year's GSoC. Have to 
> check who wrote the docs. Might have been me, as part of the integration 
> into tcllib (The '08 operations were written outside of Tcllib first).
> 
>     As such this is not something to be done by Michal.
>     Not during the GSoC period. I am fine if he wishes
>     to do that after. By then however, I might have fixed
>     things already.
> 
> Can you (Lars) please go over the docs and file regular Tcllib bugs for 
> where you believe information is missing ?

Have filed some earlier today. Could be more on the way.

>> Also, regarding the API provided by struct::graph, I find it a bit 
>> curious that "weight" is singled out as a special property of arcs, 
>> with its own methods for setting, getting, and so on. I suspect a more 
>> practical approach for a higher level package like struct::graph::op 
>> would be that the weight was some attribute given to the arcs, and 
>> that a command for computing e.g. a shortest path would take as an 
>> argument (or option value) the name of the attribute to use as 
>> "weight" in this particular calculation.
> 
> This was also a decision made during GSoC 2008. I am not recalling 
> details at the moment. There should be something in my mail archives 
> however as I was involved, although not as primary mentor. For this I 
> however should have some mails.

Thinking a bit more about this lead me to the idea of dividing the 
struct::graph API into smaller functional groups, but that is getting 
so long that I'll write a separate mail about it.

Lars Hellström