Re: [Tcllib-devel] Graph Manip

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Michał Antoniewski wrote:
> Hello.
>  
>  >>Using the following graph as example:
>  >>       A -(2)-> B -(-1)-> C
>  >>       A, B, C nodes
>  >>       2, -1 arcs weights
>  >>       no names for the arcs.
>  >>
>  >>       - so now, for graph G with N nodes:
>  >>             1.    "1 A" gives something like this:    { node1 
> .....                        nodeN }  { list of nodes adjacent to node1 
> } ......  >>{ list of nodes                        adjacent to nodeN   }
>  >>
>  >>      result = {A B C} {{B} {A C} {B}}
>  >>Is my understanding of your description correct ?
>  
> Hmmm, I meant something else. It was bad idea to send this description - 
> I should send the code at once.

Sometimes is code is easier to understand, correct. Even so, a good non-code 
description is vital as well, for the documentation.

> Actually, regarding dictionary usage - I wanted to do this just like you 
> said using dicts from the beginning, but on the other hand procedure 
> toAdjacencyMatrix is done using lists and I thought both procedures 
> should have similar return values. It would be more suitable for users 
> and I thought they shouldn't differ a lot.
>  
> In new version I did Adjacency List using dictionaries - it's surely 
> better solution.

Note that toAdjacencyMatrix returns (by definition) a matrix. For that a 
nestetd list of lists is a usable representation.

For adjacency _lists_ the dictionary form of a nested list-representation 
matches our desires better.

While making return values is similar or identical is a worthwhile goal we 
always also have to balance it against the suitability of the representation we 
are copying or mimicking. Using an unsuitable representation just for the sake 
of similarity will hurt us in the long run.

> About options:
> I searched for actual potential usages in algorithms of -edges option I 
> mentioned before and ..... it's hard to find anything important. It 
> could be useful in some cases but I think it's not useful enough.

Ok.

 > So, I
> erased it from the code. Now there are two options -directed ( for 
> directed graphs ) and -weights ( for weighted graphs ).

Ok.

> Tests for Adjacency List:
>      I've created some new graphs and test cases for them (and others 
> created before):
>          - firstly tests considering missing args...
>          - logical tests: e.g. lack of weights, wrong options.
>          - proper returning values: besides standard tests ( proper 
> option usage, cases with special graphs e.g. no edges and others) also 
> some cases directly concerning implementation, e.g. case when we have 2 
> edges between 2 nodes in directed graph, checking proper usage when arcs 
> are auto-named ( it's important case when considering all possible paths 
> that code can go through ) and different edge weight cases.

> So, that's all for now. I've got pretty crazy time at university now, 
> but I'll try to set at SVN Floyd-Warshall's implementation and maybe 
> some new tests tomorrow.

Ok. Note, please do not burn yourself out to force adherence to the schedule. 
Relaxation time is important as well. And a project like GSoC is a good 
real-life experience showing where problems in a project can come from, 
including outside factors.

 > About docs, I would like to catch up with it
> later, if that's okay with you.

Yes.

> You can find new tcllib at ver4 folder and I prepared different file 
> with toAdjacencyList and easy graph example just to run it and look what 
> exactly it returns ( Trunk/ver1/AdjacencyList ).
>  
> I'm using those verX folders because that way it's more clear for me. 
> When having those assembla revisions (they note every single update )  I 
> won't remeber which number exactly was which update and when I need to 
> find something from any version before, I've got it at one place ( on my 
> local repository too ) nicely arranged in my folders:) 

I see. That sounds to me that you are using the folders as a means of 'tagging' 
revisions, and as replacement for log messages.

Please have a look at
	http://svnbook.red-bean.com/

This is an online book about subversion, titled
	'Version Control with Subversion'.

The 2 chapters I believe are most likely relevant to your problem are

(1)	http://svnbook.red-bean.com/en/1.5/svn.tour.history.html
	Examining History
and
(2)	http://svnbook.red-bean.com/en/1.5/svn.branchmerge.tags.html
	Tags

The remainder of the book should be useful however as well.

... Ok, I wanted to ask if you are using 'log messages', but then looked back 
at the notification messages sent by assembla and I now see that it says 'No 
message'. Missing that was an oversight on my side, my apologies.

Not using log messages is a bad idea. The messages entered for each commit will 
help you, and any future maintainer to see what was happening in the repository 
at the high level.

See	svn help commit

for the syntax to enter log messages.

Using these the output of 'svn log' becomes actually useful. Together with tags 
you will have everything to not need to remember what was in each 
update/commit, as you can ask svn for the information.

As examples for log messages see the file 'ChangeLog' in the struct directory, 
and all the other packages in Tcllib. This should give you an idea of what is 
looked for in such messages.

~~~~

On to the code (revisions 10, 11) ... Your implementation of the undirected 
case is a bit complex because you are handling all adjacent edges in one loop, 
and then have to use 'G arc source|target' to switch how to handle the edge.

Consider splitting this loop into two, handling the incoming and outgoing edges 
separately.

Another trick you could use to simplify your life here is 'dict lappend' (See 
http://docs.activestate.com/activetcl/8.5/tcl/TclCmd/dict.htm#M17).

How is that useful ? Instead of collecting the adjnodes in your loop and then 
setting them all in a single 'dict set' at the end you can add them directly in 
the loop.

Actually ... Having this operation it might be possible to generate the whole 
result by iterating over the edges alone, instead of the vertices and then 
edges per vertex, I think.

Roughly:
	foreach e [$G arcs] {
		set va [G arc source e]
		set vb [G arc target e]

		dict lappend AdjacencyList $va $vb
		dict lappend AdjacencyList $vb $va
	}

Hm ... Ok, we need one loop over the nodes to initialize the dict first, so 
that we capture nodes without neighbours as well ...

	foreach v [G nodes] {
		dict set AdjacencyList $v {}
	}

Not sure about multiple edges between two nodes ... Of course, as you provided 
a test case for that any reworking of the implementation can be quickly checked.

Other things I noted. In

    if { [lsearch $adjNodes [$G arc source $e]] eq -1 } {
        lappend adjNodes [$G arc source $e]
    }

the 'eq -1' is bad. 'eq' is string comparison, -1 is a number. For this '==' is 
the better operator, using '< 0'.

Further, the whole searching likely makes things O(V^2) ... A different way of 
achiving the same things:

(1)	Just lappend nodes without care for duplicates.
	At the end run 'lsort -unique' over the adjacency lists in the
	dictionary to remove all duplicates, if any.

(2)	package require struct::set
	struct::set include adjNodes $v

	The 'set add', especially the C version uses a hashtable underneath,
	making duplicate checks asymptotically O(1) (lsearch is O(n)).

Last, if you have lots of repeated uses of the same large expression, say

	[list [$G arc source $e] [$G arc getweight $e]]

then it is sensible to store the value in a variable and to just refer to that 
later on, instead of computing the data multiple times.

~~~~

The shaw.ca address I added is my home address, should you wish to talk to me 
over the weekend.

Andreas