Installation
MLDSP-GUI requires MATLAB Runtime 9.6 (R2019a), which is freely available for Windows, Linux, and Mac at: https://www.mathworks.com/products/compiler/matlab-runtime.html
The installation can be done in two ways:
- If you don't have MATLAB Runtime 9.6 installed:
  - Download MLDSPGUIweb.exe from the "InstallOnline" directory.
  - Run the file as administrator and follow the instructions.
  - It is important that you replace the desktop shortcut created during installation. Once installed, go to the installation directory (default is ~/Program Files/MLDSPGUI/Application) and create a desktop shortcut for MLDSPGUI.exe (right-click and select "send to desktop").
- If you already have MATLAB Runtime 9.6 installed:
  - Download all the files from the "InstallOffline" directory and run the application using MLDSPGUI.exe.
Using your own dataset
Run the MLDSP-GUI app, select "Browse" under dataset, and choose the parent folder of the dataset.
The dataset should be created using the following format:
- Create a parent folder.
- Make subfolders (each subfolder represents a cluster).
- Each subfolder should contain the .fasta files (one sequence per file) that belong to that cluster.
- Refer to the provided datasets in the "ProvidedDatasetFastaZip" directory for more details/examples.
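
The folder layout described above can be checked before loading it into the app. The following is a minimal Python sketch, not part of MLDSP-GUI: the function names `count_fasta_records` and `check_dataset` are hypothetical, and it assumes that files use the `.fasta` extension and that each sequence starts with a `>` header line.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA header lines (lines starting with '>') in a file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_dataset(parent):
    """Return a list of problems found in a parent/cluster/*.fasta layout."""
    parent = Path(parent)
    problems = []
    # Every subfolder of the parent folder is treated as one cluster.
    clusters = sorted(p for p in parent.iterdir() if p.is_dir())
    if not clusters:
        problems.append(f"{parent.name}: no cluster subfolders found")
    for cluster in clusters:
        fasta_files = sorted(cluster.glob("*.fasta"))
        if not fasta_files:
            problems.append(f"{cluster.name}: no .fasta files")
        for fasta in fasta_files:
            # The expected format is exactly one sequence per file.
            n = count_fasta_records(fasta)
            if n != 1:
                problems.append(
                    f"{cluster.name}/{fasta.name}: {n} sequences, expected 1"
                )
    return problems
```

Calling `check_dataset("path/to/parent")` returns an empty list when every subfolder contains only single-sequence .fasta files.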
Provided datasets
Besides the datasets bundled with the executable file (Primates mtDNA, Influenza virus subtypes, Flaviviridae viruses, mitochondrial disease genomes), MLDSP-GUI provides additional datasets that can be downloaded separately and imported into an existing installation.
The datasets can be downloaded and imported in two ways:
- Using .mat files:
  - Download the .mat file of the required dataset from the "ProvidedDatasetMat" directory and copy it to the database folder of the installation directory (default is ~/Program Files/MLDSPGUI/Application/Database).
  - Run the MLDSP-GUI app. It will automatically read the file and add the dataset to the list of available datasets.
- Using .fasta files (raw sequences):
  - Download the .zip file of the required dataset from the "ProvidedDatasetFastaZip" directory.
  - Unzip it to extract the folders containing the .fasta files.
  - Run the MLDSP-GUI app, select "Browse" under dataset, and choose the parent folder of the downloaded dataset when prompted.
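
The unzip step for the .fasta route can also be scripted. This is a small sketch using Python's standard `zipfile` module; the function name `extract_dataset` and the destination path are assumptions made for illustration, not something MLDSP-GUI ships.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract a dataset archive and return the top-level folders it created."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Zip member names always use '/' as the separator.
        top_level = {
            name.split("/")[0] for name in archive.namelist() if name.strip("/")
        }
    return sorted(dest / name for name in top_level if (dest / name).is_dir())
```

The returned folder (or folders) is a candidate for the parent folder to choose under "Browse"; which one is correct depends on how the downloaded archive is laid out.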
The provided datasets are listed below.
Dataset | #Sequences |
---|---|
3classes | 3,200 sequences |
Amphibians | 264 sequences |
Birds-Fish-Mammals | 4,565 sequences |
ClassToSubclass(Actinopterygii) | 2,566 sequences |
Dengue | 4,721 sequences |
Disease-Classification | 102 sequences |
DomainToKingdom(Eukaryota) | 9,727 sequences |
DomainToKingdom(EukaryotanoProtists) | 9,483 sequences |
FamilyToGenus(Cyprinidae) | 92 sequences |
Flavivirus | 7,881 sequences |
Fungi | 340 sequences |
Human haplogroups | 1,150 sequences |
Human haplogroups subgroup1 | 350 sequences |
Human haplogroups subgroup2 | 250 sequences |
Influenza | 38 sequences |
Insects | 1,636 sequences |
KingdomToPhylum(Animalia) | 8,792 sequences |
Mammalia | 1,075 sequences |
OrderToFamily(Cypriniformes) | 756 sequences |
PhylumToSubphylum(Chordata) | 5,224 sequences |
Plants | 265 sequences |
Primates | 211 sequences |
Protists | 222 sequences |
SubclassToSuperorder(Neopterygii) | 1,759 sequences |
SubfamilyToGenus(Acheilognathinae) | 26 sequences |
SubphylumToClass(Vertebrata) | 5,176 sequences |
SuperorderToOrder(Ostariophysi) | 942 sequences |
Important Notes:
- The exported UPGMA tree (.tree file in Newick format) can be viewed using any program that supports the format. We recommend the online tool iTOL, available at: https://itol.embl.de/upload.cgi
- Please note that the first run of MLDSP-GUI can seem a bit slow because it takes around a minute to start the parallel pool.
- Though distance computation is very fast, classification using 10-fold cross-validation can be slower for larger datasets.
- Reported accuracies are the result of 10-fold cross-validation, averaged over the 10 runs of each classifier.
- The FATHOM toolbox is used for distance computations: Jones D.L. (2017) Fathom Toolbox for MATLAB: software for multivariate ecological and oceanographic data analysis, University of South Florida. Available from: https://www.marine.usf.edu/research/matlab-resources/
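
To illustrate how an accuracy averaged over 10-fold cross-validation is obtained, here is a minimal pure-Python sketch. It is not MLDSP-GUI's implementation: the one-dimensional nearest-centroid classifier is a toy stand-in for the tool's classifiers, all function names are hypothetical, and the code assumes there are at least `k` samples.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, x):
    """Toy classifier: predict the label whose training mean is closest to x."""
    sums, counts = {}, {}
    for value, label in zip(train_x, train_y):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return min(sums, key=lambda label: abs(x - sums[label] / counts[label]))


def cross_validated_accuracy(xs, ys, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    folds = k_fold_indices(len(xs), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        correct = sum(
            nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each sample is used for testing exactly once, so the runtime grows with both the dataset size and the number of folds, which is consistent with classification being slower than distance computation on larger datasets.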