In this project we propose a parallel algorithm for representative approximate frequent subgraph mining, based on the REAFUM algorithm. We apply the Fork-Join model to our algorithm and parallelize the generation of the graph mapping distance matrix. Given the same input, our algorithm guarantees results identical to those of REAFUM while achieving significant runtime improvements when run with multiple threads. We also demonstrate the scalability of our algorithm by measuring its runtime with graph databases of various large sizes as inputs. Finally, we provide advice on how to set up our algorithm for datasets of different sizes, as well as some potential directions for improving it in the future.
Here we distribute only the parallel part of the code, which generates the graph mapping distance matrix.
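To illustrate how a Fork-Join decomposition of a pairwise distance matrix can look in Java, here is a minimal sketch. It is not the distributed code: the class `DistanceMatrixSketch`, the task `RowRangeTask`, the `THRESHOLD` value, and the pluggable `distance` function are all hypothetical names chosen for illustration, and the actual graph mapping distance is replaced by whatever function the caller supplies. The sketch assumes the distance of a graph to itself is 0 and does not assume symmetry.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch: none of these names come from the distributed code.
final class DistanceMatrixSketch {

    // Fills rows [lo, hi) of the matrix, splitting the row range recursively.
    static final class RowRangeTask<G> extends RecursiveAction {
        // Row ranges of at most this size are computed sequentially (tuning assumption).
        private static final int THRESHOLD = 4;

        private final List<G> graphs;
        private final double[][] out;
        private final ToDoubleBiFunction<G, G> distance;
        private final int lo;
        private final int hi;

        RowRangeTask(List<G> graphs, double[][] out,
                     ToDoubleBiFunction<G, G> distance, int lo, int hi) {
            this.graphs = graphs;
            this.out = out;
            this.distance = distance;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                // Each task writes only its own rows, so no two tasks touch the same cell.
                for (int i = lo; i < hi; i++) {
                    for (int j = 0; j < graphs.size(); j++) {
                        if (i != j) { // diagonal stays 0 (assumed self-distance)
                            out[i][j] = distance.applyAsDouble(graphs.get(i), graphs.get(j));
                        }
                    }
                }
                return;
            }
            int mid = (lo + hi) >>> 1;
            // Fork both halves and wait for both to finish (the "join" step).
            invokeAll(new RowRangeTask<>(graphs, out, distance, lo, mid),
                      new RowRangeTask<>(graphs, out, distance, mid, hi));
        }
    }

    // Computes the full n x n matrix using a pool with the given number of threads.
    static <G> double[][] compute(List<G> graphs,
                                  ToDoubleBiFunction<G, G> distance, int threads) {
        int n = graphs.size();
        double[][] matrix = new double[n][n];
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            pool.invoke(new RowRangeTask<>(graphs, matrix, distance, 0, n));
        } finally {
            pool.shutdown();
        }
        return matrix;
    }
}
```

Because every cell is written by exactly one task and the entries do not depend on one another, the result does not depend on the thread count, which is consistent with the claim of identical output. If the distance is known to be symmetric, only the upper triangle would need to be computed, at the cost of less even work per row.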
User Reviews
I was able to get this algorithm to run by adding a Main that instantiates SampleCall and passes in run arguments, then saves the results to file. I produced an output graph edit distance file, which I posted to the Wiki. I have run it on my actual graph data, and I'm off to cluster the resulting Graph Edit Distance matrix using 'R'. So far, so good!