Download Latest Version asm-1.0.tar.gz (23.5 kB) Get Updates
Home / asm / 1.0
Name Modified Size InfoDownloads / Week
Parent folder
asm-1.0.tar.gz 2013-04-16 23.5 kB
README.txt 2013-04-16 9.5 kB
Totals: 2 Items   33.0 kB 0
Colorado Computational Pharmacology, University of Colorado School of Medicine April 16, 2013 ASM is a subgraph matching tool for dependency graphs. It is the Java implementation of the approximate subgraph matching algorithm we designed for matching dependency graphs. The subgraph matching problem (subgraph isomorphism) is NP-complete. Previously, we designed an exact subgraph matching (ESM) algorithm for dependency graphs using a backtracking approach (esmalgorithm.sourceforge.net). We further designed an approximate subgraph matching (ASM) algorithm that is capable of detecting approximate subgraph matching based on a subgraph distance. Assume that the graph G and the subgraph Gs have m and n vertices, and km and kn edges respectively, the total worst-case algorithm complexity is O(m^n * n(n-1)/2 * km * log m). We have demonstrated the successful usage of our algorithm in different biomedical information extraction (relation and event) applications: BioNLP 2011 and 2013 shared tasks on event extraction in molecular biology and cancer genetics, and Protein-Residue association detection. The ASM result natually subsumes the ESM result, i.e., approximate subgraph matching includes exact subgraph matching. However, if you look for exact subgraph isomorphism or graph isomorphism, please consider the ESM implementation (esmalgorithm.sourceforge.net), which is more efficient in complexity. This directory contains the ASM tool developed by Haibin Liu <Haibin.Liu@ucdenver.edu>, Larry Hunter <Larry.Hunter@ucdenver.edu>, Vlado Keselj <vlado@cs.dal.ca> and Karin Verspoor <Karin.Verspoor@nicta.com.au>. It is developed in Java and is released as open source software to the NLP and text mining research communities to be used for research purposes only (see section 5 below for copyright information). It can be downloaded via http://asmalgorithm.sourceforge.net. If you make any changes, the authors would appreciate it if you can send the details of what you have done. Note: The ASM code requires Java version 6 or greater. 1. Files and Folders --------------------- README.txt this file asm-1.0.tar.gz the source code, and test cases for the ASM 2. Usage -------- The constructor of ASM takes as input two graphs, one is considered as subgraph (smaller) and the other is considered as a graph (bigger), a given subgraph distance threshold that allows approximate subgraph matching, and a set of subgraph matching weights that balance the graph differences in structure, label and directionality (Equal weights are usually used as default). ASM provides one main public method for detecting approximate subgraph matching results: (1) getApproximateSubgraphMatchingMatches() Retrieve specific matching results of all possible approximate matchings between the input subgraph and the input graph because a subgraph can sometimes match different places of a graph. The implementation of the underlying approximate subgraph matching algorithm is declared as private method. See the source code for the detailed implementation. If you look for exact subgraph isomorphism or graph isomorphism, please consider the ESM implementation (esmalgorithm.sourceforge.net). ESM provides three main public methods for different exact subgraph matching needs: For more details about our ASM algorithm, the analysis on time complexity, and the different applications of the algorithm, see the section "Related Publications" for the complete list of our ASM-related publications. Set the MAVEN_OPTS environment variable to provide the JVM enough memory to load the BioLemmatizer (this command only needs to be executed once): export MAVEN_OPTS="-Xmx1G" 3. Usage Examples --------------------------------------------------------------------------------------------------------------------------- ASMTest.java contains a comprehensive set of unit test cases for the ASM public method. Please see the test cases for specific usage examples before starting to build your own ASM-based applications. Generallly, the following 3 steps are involved: (1) Create two graphs a createGraph() method is provided in the ASMTest.java to help with creating dependency graphs directly from dependency representations. The "DirectedGraph" implementation in the JUNG library is adopted to facilitate the graph generation and operation. However, separate definitions for graph vertex and edge are provided, along with the ASM, to cater to the specific needs of dependency graphs. The definitions of vertex and edge can be modified or extended according to users' special needs. In addition, the BioLemmatizer is used to generate lemmas for graph node tokens for comparing node contents during the graph matching process. (2) Create ASM object The ASM constructor takes as input two graphs, one is considered as subgraph (smaller) and the other is considered as a graph (bigger), a given subgraph distance threshold that allows approximate subgraph matching, and a set of subgraph matching weights that balance the graph differences in structure, label and directionality (Equal weights are usually used as default). (3) Gall getApproximateSubgraphMatchingMatches() method The public approximate graph isomorphism method can be then invoked on the ASM object. 4. Related Publications ------------------------------------ This ASM implementation implements the approximate subgraph matching algorithm proposed and applied in the following papers (in descending chronological order): (1) Haibin Liu, Lawrence Hunter, Vlado Keselj, and Karin Verspoor. Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLOS ONE, 8:4 e60954, 2013. (2) Ravikumar Komandur, Haibin Liu, Judith Cohn, Michael E. Wall and Karin Verspoor. Literature Mining of Protein-Residue Associations with Graph Rules Learned through Distant Supervision. Journal of Biomedical Semantics, 3(Suppl 3):S2, 2012. 5. Copyright and License ------------------------------------ The software is released under the New BSD license (http://www.opensource.org/licenses/bsd-license.php). Copyright (c) 2013, Regents of the University of Colorado All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the University of Colorado nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any documentation, advertising materials, publications and other materials related to redistribution and use must acknowledge that the software was developed by Haibin Liu <Haibin.Liu@ucdenver.edu>, Larry Hunter <Larry.Hunter@ucdenver.edu>, Vlado Keselj <vlado@cs.dal.ca> and Karin Verspoor <Karin.Verspoor@nicta.com.au> and must refer to the following publication: Haibin Liu, Lawrence Hunter, Vlado Keselj, and Karin Verspoor. Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLOS ONE, 8:4 e60954, 2013. 6. Incorporated Library --------------------------------------- This software incorporates the JUNG library (http://jung.sourceforge.net/). JUNG is a Java-based open-source software library designed to support the modeling, analysis, and visualization of data that can be represented as graphs. Its focus is on mathematical and algorithmic graph applications pertaining to the fields of social network analysis, information visualization, knowledge discovery and data mining. JUNG library license: JUNG is licensed and made freely available under the Berkeley Software Distribution (BSD) license. 7. Acknowledgements ------------------------------------ Many thanks to William A Baumgartner Jr for his contribution to the software engineering and the release of the tool, and to Professor Lawrence Hunter and other members of the Colorado Computational Pharmacology group for providing valuable effort and suggestions related to this work.
Source: README.txt, updated 2013-04-16