Name | Modified | Size |
---|---|---|
data | 2018-08-21 | |
elsom-0.3.zip | 2018-08-24 | 44.5 kB |
README.txt | 2018-08-21 | 3.3 kB |
Totals: 3 items | | 47.8 kB |
elSOM: SOM-based architectures for large class imbalance prediction in genome-wide data.

sinc(i) - http://fich.unl.edu.ar/sinc/
L.A. Bugnon, C. Yones, G. Stegmayer and D.H. Milone
lbugnon@sinc.unl.edu.ar

--------------------------------------------------------------------------------

This is a distribution of the source code used in:

L.A. Bugnon, C. Yones, G. Stegmayer and D.H. Milone, "Novel SOM Architectures for Large Class Imbalance in Genome-wide Pre-miRNA Prediction", under review (2018).

In this work we present two novel self-organizing map (SOM) architectures for classification problems with large class imbalance. The methods automatically build several layers of SOMs. At each level, the data are clustered and the samples that are unlikely to belong to the positive class are discarded. The elastic-deepSOM (elasticSOM) is a deep architecture of SOM layers in which the size of each map is set automatically according to the data that passed the filter of the previous map. The ensemble-elasticSOM (eeSOM) uses several SOMs in ensemble layers to face the challenges of high imbalance. These new models are particularly suited to problems where a labeled class of interest (the positive class) is significantly under-represented with respect to a much larger amount of unlabeled data. The methods are tested for pre-miRNA prediction in genome-wide data from several model species.

This code can be used, modified or distributed for academic purposes under the GNU GPL. Please feel free to get in touch with any issue, comment or suggestion.

Setup and run
==================================================================

This source code requires a few standard libraries, which are also open and free to use. Setup instructions for the following packages can be found here: https://wiki.python.org/moin/BeginnersGuide/Download. The recommended package versions are given in brackets.

Requirements:
- Python [2.7.13 or 3.5.4]
- sompy: SOM library for Python.
  Install the modified version of this library contained in the folder SOMPY (for example, with "python SOMPY/setup.py install" or "pip install --user SOMPY"). Please visit https://github.com/sevamoo/SOMPY for additional information.

Python packages:
- matplotlib [2.0]
- pandas [0.19.2]
- scipy [0.18.1]
- sklearn [0.19]
- orange/orange3

By default, the code trains and tests the models (elasticSOM and eeSOM) on one dataset (A. thaliana). Download and unzip the feature set "ath" (https://sourceforge.net/projects/sourcesinc/files/elSOM/ath.zip) into the "data/" folder and run the main file: "python main.py"

For each dataset, the script trains and tests the proposed methods on eight different partitions. On an AMD Ryzen 7 1700 processor, the approximate time cost is about 40 min per fold per classifier (for the ath dataset).

If you want to reproduce the complete set of results, download the feature sets "cel" (https://sourceforge.net/projects/sourcesinc/files/elSOM/cel.zip), "aga" (https://sourceforge.net/projects/sourcesinc/files/elSOM/aga.zip) and "hsa" (https://sourceforge.net/projects/sourcesinc/files/elSOM/hsa.zip) and uncomment line 48 in main.py: "#datasets=['cel','ath','aga','hsa']"

Please keep in mind that the whole procedure can take several hours or days.
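The layer-wise filtering idea described in the introduction (cluster with a SOM, discard samples unlikely to be positive, resize the next map to the surviving data) can be illustrated with a small toy sketch. This is NOT the authors' elasticSOM implementation, which relies on the modified SOMPY library shipped in the SOMPY folder. It is a from-scratch approximation under explicit assumptions: a plain online SOM, a hypothetical sizing rule of roughly 5*sqrt(N) units per layer, and a hypothetical filtering rule that keeps only the samples whose best-matching unit also holds at least one labeled positive. The helper names (map_side, train_som, best_matching_unit, elastic_filter) are illustrative only.

```python
import math
import random


def map_side(n_samples):
    # Hypothetical sizing rule (assumption, not taken from the elSOM code):
    # about 5*sqrt(N) units arranged on a square grid.
    units = max(4, int(round(5 * math.sqrt(n_samples))))
    return max(2, int(round(math.sqrt(units))))


def best_matching_unit(weights, x):
    # Index of the unit whose weight vector is closest to x (squared Euclidean distance).
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, side, epochs=10, seed=0):
    # Plain online SOM on a side x side grid with a Gaussian neighbourhood;
    # learning rate and neighbourhood radius shrink linearly during training.
    rng = random.Random(seed)
    n_units = side * side
    coords = [(u // side, u % side) for u in range(n_units)]
    # Initialise every unit with a copy of a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / float(total_steps)
            lr = 0.5 * (1.0 - frac) + 0.01
            sigma = max(0.5 * side * (1.0 - frac), 0.5)
            br, bc = coords[best_matching_unit(weights, x)]
            for u, (r, c) in enumerate(coords):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (v - w) for w, v in zip(weights[u], x)]
            step += 1
    return weights


def elastic_filter(data, labels, max_layers=5, min_size=10, seed=0):
    # Toy cascade of SOM layers (hypothetical filtering rule): at each layer the
    # surviving samples are mapped, and only those whose best-matching unit also
    # holds a labelled positive (label 1) are passed on to a resized next map.
    idx = list(range(len(data)))
    for layer in range(max_layers):
        subset = [data[i] for i in idx]
        weights = train_som(subset, map_side(len(subset)), seed=seed + layer)
        hits = [best_matching_unit(weights, x) for x in subset]
        positive_units = set(h for h, i in zip(hits, idx) if labels[i] == 1)
        kept = [i for h, i in zip(hits, idx) if h in positive_units]
        no_progress = len(kept) == len(idx)
        idx = kept
        # Stop when a layer discards nothing or too few samples remain.
        if no_progress or len(idx) < min_size:
            break
    return idx  # indices of the remaining candidates (all positives included)
```

For example, elastic_filter(features, labels), with labels set to 1 for known positives and 0 for unlabeled samples, returns the indices of the candidates that survive every layer. Because each positive always shares its best-matching unit with itself, positives are never discarded in this sketch; each layer can only shrink the pool of unlabeled candidates. The ensemble layers of eeSOM are not modeled here.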
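The quoted timing of about 40 min per fold per classifier can be turned into a rough budget for the default run. The sketch below simply multiplies that figure by the eight partitions and the two classifiers (elasticSOM and eeSOM). It assumes that folds and classifiers run one after another, which this README does not state, and the figure only applies to the ath dataset on the quoted hardware. The function name is hypothetical.

```python
def estimated_runtime_hours(minutes_per_fold=40, folds=8, classifiers=2):
    """Rough budget: minutes per fold per classifier x folds x classifiers."""
    # Assumption: folds and classifiers are run sequentially, not in parallel.
    total_minutes = minutes_per_fold * folds * classifiers
    return total_minutes / 60.0


# Default run: "ath" only, 8 partitions, elasticSOM + eeSOM.
hours = estimated_runtime_hours()
print("ath: about %.1f hours" % hours)  # prints: ath: about 10.7 hours
```

No timings are given for "cel", "aga" or "hsa", so their cost cannot be derived this way; the estimate only shows that even the default run takes hours rather than minutes.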