The aim is to develop a library and a set of tools that make it easy to use semantic similarity measures (e.g. Resnik, SimGIC) over the Gene Ontology.
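To give an idea of what such a measure computes, here is a minimal standalone sketch of the Resnik measure in Python 3. It does not use the fastSemSim API; the toy ontology, the annotation corpus and all function names are assumptions made up for illustration. The Resnik similarity of two terms is taken as the highest information content (IC) among their common ancestors, where the IC of a term is the negative log of the fraction of gene products annotated to it or to one of its descendants.

```python
import math

# Toy ontology (hypothetical term names): term -> set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Toy annotation corpus (made up): gene product -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"A1"},
    "g2": {"A2"},
    "g3": {"B"},
    "g4": {"A1", "B"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Map each annotated term to IC = log(total / count)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        # A gene product counts once for every term it is annotated to,
        # directly or through a descendant.
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Highest IC among the common ancestors of the two terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max((ic[t] for t in common if t in ic), default=0.0)


ic = information_content()
print(resnik("A1", "A2", ic))  # IC of "A", the most informative shared ancestor
print(resnik("A1", "B", ic))   # only "root" is shared, so the similarity is 0
```

In this toy corpus, "A1" and "A2" share the ancestor "A", which covers three of the four gene products, so their similarity is log(4/3). "A1" and "B" share only the root, whose IC is zero.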
A manual will come soon. Meanwhile, I have supplied two example tools (see the README file within the archive).
Comments and suggestions are welcome. I'm actively developing this tool, and if you post your requests I'll take care of them as soon as possible. Thank you.
In general, a Gene Ontology descriptor file and an annotation corpus are required.
Gene Ontology files can be downloaded from http://www.geneontology.org. Currently, fastSemSim can parse xml-obo files.
There are many annotation corpora around the web. One of the most popular is Gene Ontology Annotation (GOA) http://www.ebi.ac.uk/GOA/. fastSemSim can parse the original GOA file format without any effort from the user. fastSemSim also comes with a parser that can filter out uninteresting annotations (e.g. it can select only annotations involving proteins or gene products of a specific organism).
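The kind of filtering described above can be sketched in a few lines of standalone Python 3. This is not fastSemSim's parser: the function names and the sample rows are made up for illustration. The only thing taken from the format itself is the GAF column layout (tab-separated, comment lines starting with "!", object ID in column 2, GO ID in column 5, evidence code in column 7, taxon in column 13).

```python
def filter_gaf(lines, taxon=None, exclude_evidence=()):
    """Yield (object_id, go_id) pairs from GAF lines that pass the filters."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) < 15:
            continue  # not a well-formed annotation row
        object_id, go_id, evidence, taxa = cols[1], cols[4], cols[6], cols[12]
        if evidence in exclude_evidence:
            continue
        # The taxon column may hold several IDs separated by "|".
        if taxon is not None and taxon not in taxa.split("|"):
            continue
        yield object_id, go_id


def make_row(object_id, go_id, evidence, taxon):
    """Build a made-up 17-column GAF row with only the needed fields set."""
    cols = [""] * 17
    cols[0], cols[1], cols[4], cols[6], cols[12] = (
        "UniProtKB", object_id, go_id, evidence, taxon)
    return "\t".join(cols)


# Made-up sample data, not real annotations.
sample = [
    "!gaf-version: 2.0",
    make_row("P1", "GO:0000001", "IEA", "taxon:9606"),
    make_row("P2", "GO:0000002", "IDA", "taxon:9606"),
    make_row("P3", "GO:0000003", "IDA", "taxon:559292"),
]

selected = list(filter_gaf(sample, taxon="taxon:9606", exclude_evidence={"IEA"}))
print(selected)
```

Here only the second row survives: the first is dropped because of its IEA evidence code and the third because it belongs to a different taxon.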
NEWS:
- Bug fix: fastSemSim no longer hangs when using older versions of the Gene Ontology that lack the newest GO Terms.
- fastSemSim command line interface: now provides the --GOTerm flag to evaluate the semantic similarity between GO Terms.
NEWS:
- New Graphical User Interface (GUI)
- The new libraries now allow you to select which GO relationships (is_a, part_of, regulates, has_part) to consider or ignore
- Fixed some minor bugs in data coherence checks
- GUI: Can store current settings in a file and load them.
- An embedded Gene Ontology is now provided (updated to 22-10-2012)
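To illustrate what selecting relationships means in practice, here is a minimal standalone Python 3 sketch. It is not the fastSemSim API: the edge list, term names and function name are hypothetical. It collects the ancestors of a term while following only the relationship types that are allowed.

```python
# Hypothetical ontology edges: (child term, relationship, parent term).
EDGES = [
    ("t3", "is_a", "t2"),
    ("t2", "part_of", "t1"),
    ("t3", "regulates", "t4"),
    ("t4", "is_a", "t1"),
]


def ancestors_via(term, edges, allowed=("is_a", "part_of")):
    """Return all ancestors of term reachable through allowed relationships."""
    # Keep only the edges whose relationship type is allowed.
    parents = {}
    for child, relation, parent in edges:
        if relation in allowed:
            parents.setdefault(child, set()).add(parent)
    # Walk upwards from the term, collecting every parent reached.
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors_via("t3", EDGES, allowed={"is_a"})))
print(sorted(ancestors_via("t3", EDGES)))
print(sorted(ancestors_via("t3", EDGES, allowed={"is_a", "part_of", "regulates"})))
```

Ignoring a relationship shrinks the ancestor set of a term, which in turn changes any similarity score computed from shared ancestors.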
NEWS:
- Introduced a new command line tool to evaluate semantic similarity
- The command line tool now supports enhanced Resnik, which dramatically speeds up the calculation of proteome-wide semantic similarity (Resnik only)
- Fixed a minor bug regarding GO Term Id conversion from string to int
TODO:
- Add support for the embedded Gene Ontology to fastSemSimGui
NEWS:
- Introduced a new command line tool to evaluate semantic similarity
- An up-to-date (2012-02-24) version of the Gene Ontology is now embedded in fastSemSim
- Setup scripts have been further improved
- Added support for gzipped Gene Ontology files. You can now directly use the files downloaded from http://www.geneontology.org, with no need to decompress them
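Transparent handling of gzipped input can be sketched with the Python 3 standard library alone. This is a standalone illustration, not fastSemSim's own loader; the function name and the toy file content are assumptions. The file is recognised by the two-byte gzip magic number rather than by its extension.

```python
import gzip
import os
import tempfile


def open_text(path):
    """Open a text file, decompressing it on the fly if it is gzipped."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


# Usage: write the same toy content plain and gzipped, then read both back.
content = "format-version: 1.2\n"
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "toy.obo")
    gz_path = os.path.join(tmp, "toy.obo.gz")
    with open(plain_path, "w", encoding="utf-8") as out:
        out.write(content)
    with gzip.open(gz_path, "wt", encoding="utf-8") as out:
        out.write(content)
    with open_text(plain_path) as handle:
        plain_read = handle.read()
    with open_text(gz_path) as handle:
        gz_read = handle.read()

print(plain_read == gz_read)
```

Checking the magic number means a compressed file is still handled correctly if it was renamed without the .gz suffix.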
TODO:
- Add support for the embedded Gene Ontology to fastSemSimGui
This release improves the main library and fixes several issues in the GUI.
NEWS:
- Improved example files
- Library files are now better formatted and commented
- Redundant code has been unified
- Several tests have been performed to verify the stability and correctness of the library
- Improved GeneOntology and AnnotationCorpus classes
- Fixed several bugs of the GUI
- Setup files have been rewritten and improved.
- Added support for 64 bit versions of Windows and Linux
TODO:
- Add support for Python 3
NEWS:
- Improved the Wiki
- Built version 0.4.6 available for Windows users (runs without Python)
- A new tool to efficiently evaluate proteome-wide and genome-wide Resnik max semantic similarity. For yeast and fly it takes between 20 minutes and an hour, depending on the architecture you are using. See the corresponding Wiki page for more information (coming soon)
FIXES AND IMPROVEMENTS:
- Added support for regulates, positively regulates and negatively regulates relations between GO terms
- FastSemSim now comes with a setup procedure to install it (see INSTALL file).
- Several example files are provided in the examples folder
- Example Gene Ontology and annotation corpora are now included
- Better interface for loading the GO and annotation corpora, with more options to filter IEA or taxonomy-specific annotations
- Corrected some bugs of the GUI
TODO:
- FastSemSimGui runs smoothly on several Linux and OS X distributions, but it might slow down during calculations in Windows-based environments. I'm still trying to resolve this issue.
- fastSemSimGui might not work under Python 2.6 or earlier versions. The fastSemSim library itself works well.
- Soon there will be a fastSemSim version compatible with Python 3.
NEWS:
- New Wiki pages explaining how to use fastSemSim and fastSemSimGui are available
FIXES:
- Fixed some issues regarding some groupwise semantic similarity measures.
- FastSemSim is now able to parse annotation files in GAF-2.0 format.
- The executable version of fastSemSimGui should work now.
TODO:
- FastSemSimGui runs smoothly on several Linux and OS X distributions, but it might slow down during calculations in Windows-based environments. I'm still trying to resolve this issue.
- fastSemSimGui might not work under Python 2.6 or earlier versions. The fastSemSim library itself works well.
- Soon there will be a fastSemSim version compatible with Python 3.
An improved version of release 0.4. The exit procedure is fixed (it now kills the background process too) and the log messages are improved.
In this version I fixed some minor issues regarding the parsing of Gene Ontology files.
I also improved the graphical user interface. Data are now loaded directly from the background process, resulting in a lighter application. It is now possible to load huge annotation corpora, such as all the GOA annotations for human or yeast, without special physical memory requirements.
These changes will allow you to filter annotations on the basis of their Evidence Code, or exclude proteins from a specific organism.
This version should be fully functional. However, it is still an alpha version. Improvements will come as soon as possible.
- Added configuration files: you can save and restore your favourite configuration, so you won't need to reconfigure the program manually.
- Bug resolved: no input from query field
- Bug resolved: no input from file
- Included sample files in the examples directory. You can use these files to test fastSemSim functionality.
- New measures available: Dice, Jaccard, GSESAME, TO, NTO, Czekanowski-Dice, Cosine. Other measures will be available soon.
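As an illustration of the simplest of these measures, the following standalone Python 3 sketch computes Jaccard, Dice and Cosine scores between two sets of GO terms. It uses the textbook set-based definitions and is not taken from the fastSemSim code, which may apply them differently (for example to ancestor-extended term sets). The function names and term sets are hypothetical.

```python
import math


def jaccard(a, b):
    """|A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 * |A & B| / (|A| + |B|), or 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """|A & B| / sqrt(|A| * |B|), or 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical term sets annotating two gene products.
terms_a = {"t1", "t2", "t3"}
terms_b = {"t2", "t3", "t4", "t5"}

print(jaccard(terms_a, terms_b))  # 2 shared terms out of 5 distinct terms
print(dice(terms_a, terms_b))     # 2 * 2 shared terms over 3 + 4 terms
print(cosine(terms_a, terms_b))   # 2 shared terms over sqrt(3 * 4)
```

All three scores lie between 0 (no shared terms) and 1 (identical sets). They differ only in how the overlap is normalised.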