From: Vladimir Y. <vol...@cs...> - 2008-08-19 05:48:12
No, I have not yet finished incorporating it into edmonds-alg - I still have to change the weight accesses to use property maps. I will post the results when ready.

Regarding the O(V^2) complexity, you are right. The number of list merge operations is |V|-1, as each merge decreases the number of remaining vertices (SCCs) by 1.

Btw, Tarjan decreases the weight of all incoming edges by the difference between the cycle edge's weight and the weight of the cheapest edge in the cycle. Subtracting the weight of the cycle edge from the weights of all edges sharing its target should work just as well and saves an additional arithmetic operation per edge. By induction, this does not cause any edge weights to become negative. Moreover, the decrement can be performed when we select a critical edge - without waiting for the cycle-collapse stage - by simply decreasing all the incoming edges by its weight. The weights of edges in weakly connected components then become zero. This is faster than going through all the cycle edges to decrement them, thanks to cache locality.

Regarding the O(V log V) algorithm, it should be faster than O(V^2) even for dense graphs. I'll work on its implementation once we get O(E log V) ready - it's very similar and the constants hidden in the O look low.

Do you have some toy example of creating a sparse graph of V vertices, initially with no edges, so that I can add edges myself and have your implementation run on it? This would help a lot in my understanding of the Boost-related parts of the implementation.

Thanks,
Vlad

On Sun, Aug 17, 2008 at 04:49, Vladimir Yanovsky <vol...@cs...> wrote:

> I modified pending/disjoint_sets.hpp to have a fast and dirty implementation of "disjoint sets with weights", which are used by Tarjan's implementation to update edge weights when collapsing cycles. Currently the implementation updates every incoming edge to every strongly connected component vertex on a cycle being collapsed (this is particularly inefficient when the graph becomes dense and, I think, leads to Omega(VE) complexity).

Actually, I'm quite sure that the time complexity is O(V^2) for the algorithm in the current implementation. You seem to have overlooked the fact that the sizes of the lists containing the incoming edges are at most |V|: each list contains at most one outgoing edge for each strongly connected component. So we are not actually updating _every_ incoming edge, but at most |V| of them. A careful examination also reveals that the total number of list merge operations is O(V) (I think 2*|V| is the actual upper bound). Since each list contains at most |V| edges, the total time of merging the lists is O(V^2). Updating the weights before or during the merges clearly makes no difference in terms of time complexity.

Still, I'm very interested to know if there is any dramatic change in the running time with your implementation of disjoint sets with weights. Have you run any tests?

> Another thing that should be done is to use Boost's Fibonacci or Relaxed heaps as a container for edges so that we get O(E log V) time. Finally, the Fibonacci-heap implementation of the O(E log V) algorithm can later be modified into an O(V log V) one.

Yes, I agree. I'm aware of the article describing this. I just haven't had any use for it myself since all my graphs are dense. I'll need some (continuous) time on my hands to be able to do that, though of course now I have a motivation since someone may actually use it.
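To make the "disjoint sets with weights" idea discussed above concrete, here is a minimal sketch of one way such a structure can work (this is only an illustration, not the actual patch to pending/disjoint_sets.hpp; the class and member names are invented for the example). Each root carries a pending offset, so adding a value to every element of a component costs O(1), and the offsets are pushed down lazily during find with path compression.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Union-find in which every element accumulates a numeric offset.
    // Adding an offset to an entire set is O(1); find/offset queries are
    // amortized near-constant thanks to path compression.
    class weighted_disjoint_sets
    {
        std::vector<std::size_t> parent_, rank_;
        std::vector<double> delta_;  // offset relative to the parent; at a root, the set-wide offset

    public:
        explicit weighted_disjoint_sets(std::size_t n)
            : parent_(n), rank_(n, 0), delta_(n, 0.0)
        {
            for (std::size_t i = 0; i < n; ++i) parent_[i] = i;
        }

        // find with path compression; folds the offsets of bypassed ancestors
        // into delta_[x] so the total offset of x is preserved
        std::size_t find(std::size_t x)
        {
            if (parent_[x] == x) return x;
            std::size_t p = parent_[x];
            std::size_t root = find(p);
            if (p != root) {
                delta_[x] += delta_[p];
                parent_[x] = root;
            }
            return root;
        }

        // total offset accumulated by element x so far
        double offset(std::size_t x)
        {
            std::size_t root = find(x);
            return x == root ? delta_[root] : delta_[x] + delta_[root];
        }

        // add w to the offset of every element in x's set, in O(1)
        void add_to_set(std::size_t x, double w) { delta_[find(x)] += w; }

        // union by rank; the root that is attached below keeps its accumulated offset intact
        void union_sets(std::size_t x, std::size_t y)
        {
            std::size_t a = find(x), b = find(y);
            if (a == b) return;
            if (rank_[a] > rank_[b]) std::swap(a, b);  // attach a under b
            delta_[a] -= delta_[b];                    // so a's total offset is unchanged
            parent_[a] = b;
            if (rank_[a] == rank_[b]) ++rank_[b];
        }
    };

In the branching setting, one would store each edge's original weight and treat its current weight as the original plus the offset of the set containing the edge's target; decrementing all edges that enter a contracted component then becomes a single add_to_set call with a negative value.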
> Any comments and help, especially on correct declaration/interfacing with Boost, are greatly appreciated - my Boost experience is tiny, and writing function bodies is much easier for me than writing the headers.

A great resource is of course the Boost users mailing list, where more knowledgeable people can help with Boost matters. That would be my first choice of resource when dealing with Boost-specific issues.

> One question about the current implementation. Why are all data structures initialized with (max_vertex_idx + 1) elements and the union-finds with 2*(max_vertex_idx + 1)? What are the x2 and the "+1" needed for? Shouldn't max_vertex_idx be enough?

Since the vertex indices range from zero to (and including) max_vertex_idx, there are max_vertex_idx + 1 actual vertices. The disjoint-set data structure in Boost needs to know the maximum number of disjoint sets; in our case, the weakly and strongly connected components. The upper bound on the number of created components is 2*|V|.

Cheers,
/Ali
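To illustrate the sizing discussed in the last question, here is a small, self-contained use of Boost's disjoint_sets (this is a sketch of the convention, not the initialization code from edmonds-alg): the rank and parent arrays get 2*(max_vertex_idx + 1) slots so that, besides the max_vertex_idx + 1 original vertices, there is room for the additional components that may be created later, up to the 2*|V| bound mentioned above.

    #include <boost/pending/disjoint_sets.hpp>
    #include <cstddef>
    #include <vector>

    int main()
    {
        std::size_t max_vertex_idx = 4;                   // vertex indices 0..4, so |V| = 5
        std::size_t capacity = 2 * (max_vertex_idx + 1);  // original vertices plus room for new components

        std::vector<std::size_t> rank(capacity);
        std::vector<std::size_t> parent(capacity);
        boost::disjoint_sets<std::size_t*, std::size_t*> dsets(&rank[0], &parent[0]);

        // one singleton set per original vertex
        for (std::size_t v = 0; v <= max_vertex_idx; ++v)
            dsets.make_set(v);

        // a component created later can use a fresh element index >= |V|,
        // which is what the extra capacity is for
        std::size_t fresh = max_vertex_idx + 1;
        dsets.make_set(fresh);
        dsets.union_set(0, fresh);

        return 0;
    }

Since raw pointers are used as the rank and parent property maps, each element indexes directly into the two arrays, which is why they must be allocated for the largest element index that will ever be passed to make_set.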
From: Vladimir Y. <vol...@cs...> - 2008-08-31 04:56:16
Hi Ali,

1. Yes, certainly one cannot find branchings faster than one can read the graph :-) Having an O(n log n) algorithm will only improve your implementation for dense graphs if you can read a graph significantly faster than you can compute the branchings, and I don't know whether that is the case.

2. My implementation of cycle contractions using union-find failed on your test. I'm now on vacation/conference in Europe for 3 weeks, so it may take some time until I fix and post it.

3. The purpose of decrementing the weights of the edges in a cycle is only to make all the weights equal, so that any branching/arborescence can be replaced by one having a single entry into the cycle. Hence the reduction by the least costly edge is completely unnecessary.

4. The main idea of the O(n log n) algorithm is very similar to that of the O(m log n) one. The main difference is that the O(m log n) paper did not use Fibonacci heaps, which were not available then. In the O(n log n) paper they use Fibonacci heaps to store the sets of edges adjacent to the nodes, and they also notice that with a Fibonacci heap one does not have to delete an item from one heap in order to move it to another; instead, a whole tree can be moved between heaps (nodes) in O(1). Then the only operation taking log n time is finding the critical edges (computing the MIN/MAX of a heap), and there are at most n of those.

5. Thanks a lot for the examples for sparse graphs. They will really help.

Best,
Vlad

On Fri, Aug 22, 2008 at 2:35 PM, Ali Tofigh <ali...@gm...> wrote:

> On Tue, Aug 19, 2008 at 07:48, Vladimir Yanovsky <vol...@cs...> wrote:
>>
>> Btw, Tarjan decreases the weight of all incoming edges by the difference between the cycle's edge weight and the weight of the cheapest edge in the cycle. Subtracting the weight of the cycle edge from the weights of all edges sharing its target should work just as well and saves an additional arithmetic operation per edge. By induction, this does not cause any edge weights to become negative.
>
> This only works when finding maximum spanning arborescences. It does not work for maximum branchings, where the negative edge weights are essential for the algorithm to work. (Also, there is nothing that prevents the input graph from having negative edges in the first place.)
>
>> Moreover, the decrement can be performed when we select a critical edge - without waiting for the cycle-collapse stage - by simply decreasing all the incoming edges by its weight. The weights of edges in weakly connected components then become zero. This is faster than going through all the cycle edges to decrement them, thanks to cache locality.
>
> We could do this optimization by treating the cases of MSAs and maximum branchings separately. But I would rather get the code working for sparse graphs first, and do such optimizations later, when we've gotten most of the bugs out of the way.
>
>> Regarding the O(V log V) algorithm, it should be faster than O(V^2) even for dense graphs.
>
> But dense graphs can never have better time complexity than O(|V|^2), simply because they have O(|V|^2) edges and you always have to at least read the entire input! The time complexity of Tarjan's '86 paper is O(|V| log |V| + |E|).
>
>> I'll work on its implementation once we get O(E log V) ready - it's very similar and the constants hidden in the O look low.
>
> If I remember correctly, the algorithm in the '86 paper isn't just a simple modification of the older algorithm.
> It involves much more than simply using modified F-heaps to keep track of incoming edges. The core data structures used in the book-keeping are very different (again, this is just from memory). I remember thinking, while reading the '86 paper, that the constant in the time complexity may actually be quite large. I don't recall why I thought so; perhaps because there was so much book-keeping needed, or perhaps because I had read that F-heaps are actually not very fast in practice. That, however, is from "Introduction to Algorithms" by Cormen et al., 1989, so the information may be dated.
>
>> Do you have some toy example of creating a sparse graph of V vertices, initially with no edges, so that I can add edges myself and have your implementation run on it? This would help a lot in my understanding of the Boost-related parts of the implementation.
>
> I have created an example of how to create a sparse graph using the adjacency_list class of BGL and how to call edmonds_optimum_branching. It is now committed to the subversion repository at SourceForge. You'll find it in trunk/doc/examples. Let me know if you have problems compiling the code.
>
> Cheers,
> /Ali
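To round off the thread, here is a sketch along the lines of what was asked for: a sparse, directed adjacency_list with a handful of vertices and hand-added weighted edges. It is not the example committed to trunk/doc/examples; the graph layout and weights are made up, and the call to edmonds_optimum_branching is only indicated in a comment, since its exact parameter list should be taken from that example.

    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/graph_traits.hpp>
    #include <iostream>
    #include <utility>

    typedef boost::property<boost::edge_weight_t, double> EdgeWeight;
    typedef boost::adjacency_list<boost::listS,    // out-edge storage, fine for sparse graphs
                                  boost::vecS,     // vertex storage, gives a vertex_index map for free
                                  boost::directedS,
                                  boost::no_property,
                                  EdgeWeight> Graph;

    int main()
    {
        Graph g(5);  // 5 vertices, initially no edges

        // add weighted edges by hand; with vecS the vertex descriptors are simply 0..4
        boost::add_edge(0, 1, EdgeWeight(4.0), g);
        boost::add_edge(0, 2, EdgeWeight(2.0), g);
        boost::add_edge(1, 2, EdgeWeight(5.0), g);
        boost::add_edge(2, 3, EdgeWeight(1.0), g);
        boost::add_edge(3, 4, EdgeWeight(3.0), g);

        // the vertex index and edge weight property maps
        boost::property_map<Graph, boost::vertex_index_t>::type index = boost::get(boost::vertex_index, g);
        boost::property_map<Graph, boost::edge_weight_t>::type weight = boost::get(boost::edge_weight, g);

        // print the edges just to show the property maps in action
        std::pair<boost::graph_traits<Graph>::edge_iterator,
                  boost::graph_traits<Graph>::edge_iterator> es = boost::edges(g);
        for (; es.first != es.second; ++es.first)
            std::cout << boost::get(index, boost::source(*es.first, g)) << " -> "
                      << boost::get(index, boost::target(*es.first, g))
                      << " (weight " << boost::get(weight, *es.first) << ")\n";

        // ... here one would call edmonds_optimum_branching on g, index and weight,
        // following the example committed in trunk/doc/examples

        return 0;
    }

Using vecS for the vertex container keeps the vertex descriptors as plain integers and provides the vertex_index property map without any extra bookkeeping, which is convenient when an algorithm expects a vertex index map.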