elSOM: SOM-based architectures for large class imbalance prediction in genome-wide
data.
sinc(i) - http://fich.unl.edu.ar/sinc/
L.A. Bugnon, C. Yones, G. Stegmayer and D.H. Milone
lbugnon@sinc.unl.edu.ar
--------------------------------------------------------------------------------

This is a distribution of the source code used in: 

L.A. Bugnon, C. Yones, G. Stegmayer and D.H. Milone, "Novel SOM Architectures for
Large Class Imbalance in Genome-wide Pre-miRNA Prediction", under review (2018).

In this work we present two novel self-organizing map (SOM) architectures for
classification in the context of large class imbalance problems. The methods
automatically build several SOM layers. At each level the data is clustered, and
samples that are unlikely to belong to the positive class are discarded.

The elastic-deepSOM (elasticSOM) is a deep architecture of SOM layers in which the
map size of each layer is set automatically according to the data filtered by the
previous map. The ensemble-elasticSOM (eeSOM) uses several SOMs in ensemble layers
to cope with high class imbalance. These new models are particularly suited to
problems where a labeled class of interest (the positive class) is significantly
under-represented with respect to a much larger amount of unlabeled data. The
methods are tested for pre-miRNA prediction in genome-wide data, using several
model species.
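
The layer-wise filtering described above can be illustrated with a toy sketch.
This is not the authors' implementation: a plain k-means stands in for each SOM,
the rule that sets the number of units (square root of the remaining samples) is
a hypothetical choice, and the names kmeans_labels and cascade_filter are
invented for this example. It only shows the idea of clustering, keeping the
units that captured labeled positives, and resizing the next layer to the data
that is left.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iters=10, seed=0):
    """Toy k-means; stands in for training one SOM and mapping samples to units."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(v) / float(len(members))
                                   for v in zip(*members))
    return labels


def cascade_filter(points, is_positive, n_layers=3):
    """Return the indices of the samples that survive every layer."""
    kept = list(range(len(points)))
    for layer in range(n_layers):
        if len(kept) < 4:
            break
        # Hypothetical size rule: the number of units follows the data left.
        k = int(math.sqrt(len(kept)))
        labels = kmeans_labels([points[i] for i in kept], k, seed=layer)
        # Units that captured at least one labeled positive are retained.
        positive_units = set(lab for i, lab in zip(kept, labels)
                             if is_positive[i])
        kept = [i for i, lab in zip(kept, labels) if lab in positive_units]
    return kept
```

By construction every labeled positive survives all layers, while unlabeled
samples remain only if they share a unit with a positive one. How many unlabeled
samples are discarded depends on the data and on the clustering.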

This code can be used, modified or distributed for academic purposes under the GNU
GPL. Please feel free to contact us with any issue, comment or suggestion.

Setup and run ==================================================================

This source code requires a few standard libraries, which are also open and free to
use. Setup instructions for the following packages can be found here:
https://wiki.python.org/moin/BeginnersGuide/Download. The recommended package versions 
are in brackets.


Requirements:
- Python [2.7.13 or 3.5.4]
- sompy: SOM library for Python. Install the modified version of this library
  contained in the SOMPY folder (using, for example, "python SOMPY/setup.py install"
  or "pip install --user SOMPY"). Please visit https://github.com/sevamoo/SOMPY for
  additional information.
Python packages:
- matplotlib [2.0]
- pandas [0.19.2]
- scipy [0.18.1]
- sklearn (scikit-learn) [0.19]
- orange/orange3
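
To compare an existing environment with the recommended versions, a short check
such as the following may help. It is only a sketch: installed_versions is a
hypothetical helper, not part of this distribution, and it relies on each
package exposing a __version__ attribute, which the four packages below do.

```python
import importlib

# Recommended versions copied from the list above, keyed by import name.
RECOMMENDED = {
    "matplotlib": "2.0",
    "pandas": "0.19.2",
    "scipy": "0.18.1",
    "sklearn": "0.19",
}


def installed_versions(names):
    """Map each module name to its version, 'unknown', or None if missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None  # package is not installed
            continue
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in sorted(installed_versions(RECOMMENDED).items()):
        print("{:<12} installed: {!s:<10} recommended: {}".format(
            name, version, RECOMMENDED[name]))
```

A value of None means the package could not be imported. The script only
reports versions and does not judge whether a newer release is compatible.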

By default, the code trains and tests the models (elasticSOM and eeSOM) on one
dataset (A. thaliana). Download the feature set "ath"
(https://sourceforge.net/projects/sourcesinc/files/elSOM/ath.zip), unzip it into the
"data/" folder and run the main file:

"python main.py"

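The download and unzip step can also be scripted. The sketch below is an
illustration, not part of this distribution: fetch_features is a hypothetical
helper, and it assumes that the SourceForge link serves the zip file directly.
If the link redirects to an HTML page, zipfile will reject the file; in that
case download the archive manually, place it in "data/" and call the function
again.

```python
import os
import zipfile

try:  # Python 3
    from urllib.request import urlretrieve
except ImportError:  # Python 2
    from urllib import urlretrieve

# Base address taken from the links in this README.
BASE_URL = "https://sourceforge.net/projects/sourcesinc/files/elSOM/"


def fetch_features(name, data_dir="data", url=None):
    """Download <name>.zip (unless already present) and unpack it in data_dir."""
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    zip_path = os.path.join(data_dir, name + ".zip")
    if not os.path.exists(zip_path):
        urlretrieve(url or BASE_URL + name + ".zip", zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
        return archive.namelist()

# Example (performs a network download): fetch_features("ath")
```

The function returns the list of extracted file names, which makes it easy to
see where the features ended up inside "data/".
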
For each dataset, the script trains and tests the proposed methods on eight
different partitions. On an AMD Ryzen 7 1700 processor, the running time is
approximately 40 min per fold per classifier (for the ath dataset).
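
As a rough guide, the figures above can be combined into a total for the default
run. The sketch assumes that each of the eight partitions counts as one fold,
that the two classifiers (elasticSOM and eeSOM) run one after the other, and
that every run takes the quoted 40 min; total_hours is a hypothetical helper.

```python
def total_hours(n_partitions=8, n_classifiers=2, minutes_per_run=40):
    """Estimated wall-clock hours for one dataset, run sequentially."""
    return n_partitions * n_classifiers * minutes_per_run / 60.0


print(round(total_hours(), 1))  # 10.7
```

Under these assumptions the default ath run takes a little under 11 hours on
comparable hardware. Other datasets may differ, since the 40 min figure was
measured on ath only.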

To reproduce the complete set of results, download the feature sets "cel"
(https://sourceforge.net/projects/sourcesinc/files/elSOM/cel.zip), "aga"
(https://sourceforge.net/projects/sourcesinc/files/elSOM/aga.zip) and "hsa"
(https://sourceforge.net/projects/sourcesinc/files/elSOM/hsa.zip) and uncomment
line 48 in main.py:

"#datasets=['cel','ath','aga','hsa']"

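Before starting the full run, it may help to confirm that every feature set is
in place. The sketch below assumes (this README does not state it) that each
archive unpacks into a folder named after the dataset inside "data/";
missing_datasets is a hypothetical helper and not part of main.py.

```python
import os


def missing_datasets(names, data_dir="data"):
    """Return the dataset names that have no folder inside data_dir."""
    return [name for name in names
            if not os.path.isdir(os.path.join(data_dir, name))]


if __name__ == "__main__":
    absent = missing_datasets(["cel", "ath", "aga", "hsa"])
    print("missing: " + (", ".join(absent) if absent else "none"))
```

If the archives unpack to a different layout, adjust the path check accordingly.
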
Please keep in mind that the whole procedure can take several hours or even days.
Source: README.txt, updated 2018-08-21