
Marco Mina

FastSemSim project.

The aim is to develop a library and a set of tools that make it easy to use semantic similarity measures (e.g. Resnik, SimGIC) over the Gene Ontology.
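To make the two measures named above concrete, here is a minimal, self-contained Python sketch. It is not fastSemSim's API. It assumes a tiny hypothetical ontology with made-up GO IDs and precomputed information content (IC) values, and it uses only child-to-parent links. It follows the usual definitions: Resnik is the IC of the most informative common ancestor of two terms, and SimGIC is the summed IC of the shared ancestor-extended terms divided by the summed IC of their union.

```python
# Toy sketch: Resnik and SimGIC over a tiny, hypothetical GO-like DAG.
# Term IDs, parent links and IC values are invented for illustration.

# child -> set of direct parents
PARENTS = {
    "GO:A": set(),            # root
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B", "GO:C"},
}

# Hypothetical precomputed information content, IC(t) = -log p(t)
IC = {"GO:A": 0.0, "GO:B": 0.7, "GO:C": 1.2, "GO:D": 2.3, "GO:E": 1.9}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common)


def simgic(terms1, terms2):
    """SimGIC: IC-weighted Jaccard over the ancestor-extended term sets."""
    set1 = set().union(*(ancestors(t) for t in terms1))
    set2 = set().union(*(ancestors(t) for t in terms2))
    union_ic = sum(IC[t] for t in set1 | set2)
    if union_ic == 0:
        return 0.0
    return sum(IC[t] for t in set1 & set2) / union_ic


print(resnik("GO:D", "GO:E"))  # common ancestors {GO:A, GO:B} -> 0.7
print(round(simgic({"GO:D"}, {"GO:E"}), 3))
```

Real tools compute IC from an annotation corpus rather than hard-coding it, and usually apply these definitions within one GO namespace at a time.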


A manual is coming soon. In the meantime, I have supplied two example tools (see the README file in the archive).
Comments and suggestions are welcome. I'm actively developing this tool; if you post your needs, I'll address them as soon as possible. Thank you.

In general, a Gene Ontology descriptor file and an annotation corpus are required.
Gene Ontology files can be downloaded from http://www.geneontology.org. Currently, fastSemSim can fully parse OBO-XML files.
Many annotation corpora are available on the web. One of the most popular is the Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA/). fastSemSim parses the original GOA file format directly, with no extra effort required from the user. It also comes with a smart parser that can filter out unwanted annotations (e.g. keep only annotations of proteins or gene products from a specific organism).
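The kind of filtering described above can be illustrated with a minimal, independent Python sketch. It is not fastSemSim's parser, whose interface is not documented here. It assumes input in GAF 2.0, the tab-separated format GOA distributes annotations in. The helpers `parse_gaf` and `gaf_row`, the sample lines and the gene product IDs are all hypothetical.

```python
import io
from collections import defaultdict


def parse_gaf(lines, taxon_id=None, exclude_iea=False):
    """Map each gene product ID to the set of GO IDs it is annotated with.

    Column positions follow the GAF 2.0 layout (0-based here):
    1 = DB Object ID, 3 = Qualifier, 4 = GO ID, 6 = Evidence Code, 12 = Taxon.
    """
    annotations = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if "NOT" in cols[3].split("|"):
            continue  # skip negated annotations
        if exclude_iea and cols[6] == "IEA":
            continue  # skip Inferred from Electronic Annotation
        if taxon_id is not None:
            # the taxon column may hold two entries ("taxon:A|taxon:B"); keep the first
            if cols[12].split("|")[0] != f"taxon:{taxon_id}":
                continue
        annotations[cols[1]].add(cols[4])
    return dict(annotations)


def gaf_row(obj_id, go_id, evidence, taxon, qualifier=""):
    """Build one made-up GAF 2.0 line (17 tab-separated columns)."""
    cols = ["UniProtKB", obj_id, "SYM", qualifier, go_id, "PMID:0", evidence,
            "", "P", "", "", "protein", taxon, "20120101", "UniProt", "", ""]
    return "\t".join(cols) + "\n"


SAMPLE = (
    "!gaf-version: 2.0\n"
    + gaf_row("P1", "GO:0000001", "IDA", "taxon:9606")
    + gaf_row("P1", "GO:0000002", "IEA", "taxon:9606")
    + gaf_row("P2", "GO:0000003", "IDA", "taxon:10090")
    + gaf_row("P3", "GO:0000004", "IDA", "taxon:9606", qualifier="NOT")
)

# Human (NCBI taxon 9606) annotations only, without IEA
print(parse_gaf(io.StringIO(SAMPLE), taxon_id=9606, exclude_iea=True))
```

With a real annotation file you would pass an open text handle instead of `io.StringIO`; for a gzipped file, `gzip.open(path, "rt")` works the same way.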

This work is released under the GNU GPL license. However, this software is currently unpublished work. You must contact us before using it, its results, or any work or application built on top of it in a published work.
Corresponding author: Marco Mina. Email: marco.mina.85@gmail.com

News

[03/04/2013] News: fastSemSim 0.7.2 is about to be released.
Minor bug fixes, the ability to extract the Information Content (IC) of GO Terms, and the ability to load the IC of GO Terms from an external file.
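As a rough illustration of what extracting and reloading IC values involves, here is a Python sketch under stated assumptions. It is not fastSemSim's implementation or file format. It uses one common convention, IC(t) = -log p(t), where p(t) is the fraction of annotated gene products annotated to t or to any of its descendants. Some tools count annotations instead of gene products. The ontology, the annotations and the tab-separated `GO_ID<TAB>IC` layout used by the hypothetical `save_ic`/`load_ic` helpers are assumptions.

```python
import io
import math
from collections import Counter

# Hypothetical toy ontology: child -> direct parents
PARENTS = {"GO:A": set(), "GO:B": {"GO:A"}, "GO:C": {"GO:A"}, "GO:D": {"GO:B"}}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = -log p(t), where p(t) is the fraction of annotated gene
    products annotated to t or to any of its descendants."""
    counts = Counter()
    for terms in annotations.values():
        # propagate annotations upwards; count each gene product once per term
        counts.update(set().union(*(ancestors(t) for t in terms)))
    total = len(annotations)
    # log(total / n) equals -log(n / total) but avoids -0.0 for the root
    return {t: math.log(total / n) for t, n in counts.items()}


def save_ic(ic, handle):
    """Write one 'GO_ID<TAB>IC' line per term (hypothetical file layout)."""
    for term, value in sorted(ic.items()):
        handle.write(f"{term}\t{value:.6f}\n")


def load_ic(handle):
    """Read back a file written by save_ic."""
    ic = {}
    for line in handle:
        if line.strip():
            term, value = line.rstrip("\n").split("\t")
            ic[term] = float(value)
    return ic


ANNOTATIONS = {"p1": {"GO:D"}, "p2": {"GO:C"}, "p3": {"GO:B"}, "p4": {"GO:C", "GO:D"}}
ic = information_content(ANNOTATIONS)

buffer = io.StringIO()
save_ic(ic, buffer)
buffer.seek(0)
reloaded = load_ic(buffer)
print(reloaded)
```

Loading IC from a file is useful when the same corpus-derived values should be reused across runs, or when IC is taken from a larger corpus than the one being compared.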
[28/11/2012] News: fastSemSim 0.7.1 has been released.
Minor bug fixes; GO Term semantic similarity can now be evaluated with the command line interface.

NEWS:
- Bug fix: fastSemSim no longer hangs when using older versions of the Gene Ontology that lack the newest GO Terms.
- fastSemSim command line interface: now provides the --GOTerm flag to evaluate the semantic similarity between GO Terms.

Please check the Wiki for details

Log

[22/10/2012] News: fastSemSim 0.7 has been released.
Updated core libraries, updated Gene Ontology, new Graphical User Interface

NEWS:
- New Graphical User Interface (GUI)
- The new libraries now allow you to select which GO relationships (is_a, part_of, regulates, has_part) to consider or ignore
- Fixed some minor bugs in data consistency checks
- GUI: current settings can be saved to a file and loaded later.
- An embedded Gene Ontology is now provided (updated to 22-10-2012)
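Choosing which relationships to follow changes the set of ancestors of each term, and therefore every IC-based and set-based measure built on top of it. The Python sketch below shows the idea on a few hypothetical edges; it is not fastSemSim's code. It treats every kept edge uniformly as pointing from child to parent, which is a simplification (for instance, has_part edges run in the opposite direction and usually need special handling).

```python
from collections import defaultdict

# Hypothetical edges: (child, relationship, parent)
EDGES = [
    ("GO:B", "is_a", "GO:A"),
    ("GO:C", "part_of", "GO:B"),
    ("GO:D", "regulates", "GO:C"),
    ("GO:E", "is_a", "GO:D"),
]


def build_parents(edges, allowed):
    """Keep only the edges whose relationship type is in `allowed`."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation in allowed:
            parents[child].add(parent)
    return parents


def ancestors(term, parents):
    """All terms reachable upwards from `term` through the kept edges."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


only_is_a = build_parents(EDGES, {"is_a"})
with_more = build_parents(EDGES, {"is_a", "part_of", "regulates"})
print(sorted(ancestors("GO:E", only_is_a)))   # ['GO:D']
print(sorted(ancestors("GO:E", with_more)))   # ['GO:A', 'GO:B', 'GO:C', 'GO:D']
```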

Please check the Wiki for details
[08/03/2012] News: fastSemSim 0.6 has been released.
Introducing support for enhanced Resnik

NEWS:
- Introduced a new command line tool to evaluate semantic similarity
- The command line tool now supports enhanced Resnik, which dramatically speeds up the calculation of proteome-wide semantic similarity (Resnik only).
- Fixed a minor bug regarding GO Term Id conversion from string to int

TODO:
- Add support for the embedded Gene Ontology to fastSemSimGui

Please check the Wiki for details
[24/02/2012] News: fastSemSim 0.5.1 has been released.
Introducing the FastSemSim command line tool

NEWS:
- Introduced a new command line tool to evaluate semantic similarity
- An up-to-date (2012-02-24) version of the Gene Ontology is now embedded in fastSemSim
- Setup scripts have been further improved
- Added support for gzipped Gene Ontology files. You can now use the files downloaded from http://www.geneontology.org directly, without decompressing them
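One simple way to accept both compressed and uncompressed files is to check for the gzip magic bytes and pick the matching opener. The Python sketch below illustrates this generic technique; it is not fastSemSim's actual code, and the helper name `open_text` and the file names are hypothetical.

```python
import gzip
import os
import tempfile


def open_text(path, encoding="utf-8"):
    """Open a plain or gzip-compressed text file for reading.

    Gzip data is recognised by its two magic bytes (0x1f 0x8b), so the
    file name suffix does not matter.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


# Demo: the same content stored plain and gzipped reads back identically
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "ontology.txt")
    packed = os.path.join(tmp, "ontology.txt.gz")
    with open(plain, "w", encoding="utf-8") as fh:
        fh.write("example line\n")
    with gzip.open(packed, "wt", encoding="utf-8") as fh:
        fh.write("example line\n")
    with open_text(plain) as a, open_text(packed) as b:
        same_content = a.read() == b.read()

print(same_content)  # True
```

Sniffing the header instead of trusting the `.gz` suffix also handles downloads that were renamed or already decompressed.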

TODO:
- Add support for the embedded Gene Ontology to fastSemSimGui

[19/02/2012] News: fastSemSim 0.5 has been released.

This release improves the main library and fixes several issues in the GUI.

NEWS:
- Improved example files
- Library files are now better formatted and commented
- Redundant code has been unified
- Several tests have been performed to verify the stability and correctness of the library
- Improved GeneOntology and AnnotationCorpus classes
- Fixed several bugs of the GUI
- Setup files have been rewritten and improved.
- Added support for 64 bit versions of Windows and Linux

TODO:
- Add support for Python 3

[01/12/2011] News: fastSemSim 0.4.6 has been released.

NEWS:
- Improved the Wiki
- Built version 0.4.6 available for Windows users (runs without Python)

Introducing fastResnik

A new tool to efficiently evaluate proteome-wide and genome-wide Resnik max semantic similarity. For yeast and fly, it takes between 20 minutes and an hour, depending on the architecture you're using. See the corresponding Wiki page for details (coming soon).
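Resnik max between two gene products is usually defined as the highest Resnik similarity over all pairs of their GO terms. The Python sketch below is a generic illustration, not fastResnik's algorithm. It caches term-pair scores with `functools.lru_cache`, so each pair of GO terms is scored only once even when the same terms recur across many proteins. The ontology, IC values and protein annotations are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product

# Hypothetical ontology (child -> parents) with a single root, and made-up IC values
PARENTS = {"GO:A": (), "GO:B": ("GO:A",), "GO:C": ("GO:A",),
           "GO:D": ("GO:B",), "GO:E": ("GO:B", "GO:C")}
IC = {"GO:A": 0.0, "GO:B": 0.7, "GO:C": 1.2, "GO:D": 2.3, "GO:E": 1.9}


@lru_cache(maxsize=None)
def ancestors(term):
    """Term plus all its ancestors, memoised per term."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return frozenset(result)


@lru_cache(maxsize=None)
def resnik(t1, t2):
    """IC of the most informative common ancestor (the shared root guarantees one)."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def resnik_max(terms1, terms2):
    """Best Resnik score over every pair of terms; both sets must be non-empty."""
    # sort each pair so (a, b) and (b, a) share one cache entry
    return max(resnik(*sorted(pair)) for pair in product(terms1, terms2))


def all_pairs(annotations):
    """Resnik max for every unordered pair of gene products."""
    return {(p, q): resnik_max(annotations[p], annotations[q])
            for p, q in combinations(sorted(annotations), 2)}


ANNOTATIONS = {"P1": {"GO:D"}, "P2": {"GO:E"}, "P3": {"GO:C", "GO:D"}}
scores = all_pairs(ANNOTATIONS)
print(scores)
```

The number of distinct GO terms in a proteome is much smaller than the number of protein pairs, which is why caching at the term level pays off; gene products without annotations should be filtered out beforehand.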

FIXES AND IMPROVEMENTS:
- Added support for regulates, positively regulates and negatively regulates relations between GO terms
- FastSemSim now comes with a setup procedure to install it (see INSTALL file).
- Several example files are provided in the examples folder
- Example Gene Ontology and annotation corpora are now included
- Better interface for loading the GO and annotation corpora, with more options to filter IEA (Inferred from Electronic Annotation) or taxonomy-specific annotations
- Fixed some GUI bugs

TODO:
- FastSemSimGui runs smoothly on several Linux distributions and on OS X, but it might slow down during calculations in Windows-based environments. I'm still trying to resolve this issue.
- fastSemSimGui might not work under Python 2.6 or earlier. The fastSemSim library works well.
- Soon there will be a fastSemSim version compatible with Python 3.

[29/10/2011] News: fastSemSim 0.4.3 has been released.

NEWS:
- New Wiki pages explaining how to use fastSemSim and fastSemSimGui are now available

FIXES:
- Fixed some issues with certain groupwise semantic similarity measures.
- FastSemSim is now able to parse annotation files in GAF-2.0 format.
- The executable version of fastSemSimGui should work now.

TODO:
- FastSemSimGui runs smoothly on several Linux distributions and on OS X, but it might slow down during calculations in Windows-based environments. I'm still trying to resolve this issue.
- fastSemSimGui might not work under Python 2.6 or earlier. The fastSemSim library works well.
- Soon there will be a fastSemSim version compatible with Python 3.

[03/10/2011] News: fastSemSim 0.4.1 has been released.

An improved version of release 0.4: fixed the exit procedure (it now kills the background process too) and improved log messages.

[26/09/2011] News: fastSemSim 0.4 has been released.

In this version I fixed some minor issues regarding the parsing of Gene Ontology files.
I also improved the graphical user interface. Data are now loaded directly by the background process, resulting in a lighter application. It is now possible to load huge annotation corpora, such as all the GOA annotations for human or yeast, without special physical memory requirements.

TODO toward the next release
  • develop a more sophisticated interface to load annotation corpora
  • develop a more sophisticated interface to load personalized query files
  • develop a more sophisticated interface to personalize the output file format

These changes will allow you to filter annotations on the basis of their Evidence Code, or exclude proteins from a specific organism.

[06/09/2011] News: fastSemSim 0.3.4 has been released.

This version should be fully functional. However, it is still an alpha version. Improvements will come as soon as possible.

Changelog
  • added configuration files: you can save and restore your favourite configuration, so you won't need to reconfigure the program manually.

  • bug resolved: no input from query field

  • bug resolved: no input from file

  • included sample files in examples directory. You can use these files to test fastSemSim functionalities.

  • new measures available: Dice, Jaccard, GSESAME, TO, NTO, Czekanowski-Dice, Cosine. Other measures will be available soon.
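Several of the measures listed above are simple comparisons between two sets of GO terms. The Python sketch below uses the commonly cited definitions. It does not necessarily reproduce fastSemSim's exact variants; for example, it does not decide whether term sets are first extended with their ancestors, so the example sets are simply assumed to be already extended. GSESAME, which weights ontology edges, is not shown.

```python
import math


def set_measures(terms1, terms2):
    """Set-based similarity of two non-empty GO term sets.

    TO/NTO follow the usual term-overlap definitions; Czekanowski-Dice is a
    distance (0 = identical sets), equal to 1 - Dice.
    """
    a, b = set(terms1), set(terms2)
    inter, union = len(a & b), len(a | b)
    return {
        "TO": inter,
        "NTO": inter / min(len(a), len(b)),
        "Jaccard": inter / union,
        "Dice": 2 * inter / (len(a) + len(b)),
        "Cosine": inter / math.sqrt(len(a) * len(b)),
        "Czekanowski-Dice": len(a ^ b) / (union + inter),
    }


# Hypothetical ancestor-extended term sets of two gene products
m = set_measures({"GO:A", "GO:B", "GO:D"}, {"GO:A", "GO:B", "GO:C", "GO:E"})
print(m["Jaccard"], m["Dice"])
```

Because |A| + |B| = |A ∪ B| + |A ∩ B| and |A Δ B| = |A ∪ B| - |A ∩ B|, the Czekanowski-Dice distance works out to exactly 1 - Dice.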

[03/09/2011] fastSemSim 0.3.1 has been released.