ProvidedDatasetFastaZip
Name Modified Size
Influenza.zip 2019-08-28 30.5 kB
Dengue.zip 2019-08-28 17.7 MB
SuperorderToOrder(Ostariophysi).zip 2019-08-28 5.3 MB
SubphylumToClass(Vertebrata).zip 2019-08-28 29.2 MB
SubfamilyToGenus(Acheilognathinae).zip 2019-08-28 147.7 kB
SubclassToSuperorder(Neopterygii).zip 2019-08-28 10.0 MB
Protists.zip 2019-08-28 2.2 MB
Plants.zip 2019-08-28 24.4 MB
PhylumToSubphylum(Chordata).zip 2019-08-28 29.5 MB
Primates.zip 2019-08-28 1.2 MB
OrderToFamily(Cypriniformes).zip 2019-08-28 4.3 MB
Mammalia.zip 2019-08-28 6.0 MB
KingdomToPhylum(Animalia).zip 2019-08-28 47.6 MB
Insects.zip 2019-08-28 8.1 MB
HumanHaplogroupsSubgroup2.zip 2019-08-28 1.4 MB
HumanHaplogroupsSubgroup1.zip 2019-08-28 1.9 MB
HumanHaplogroups.zip 2019-08-28 6.4 MB
Fungi.zip 2019-08-28 5.8 MB
Flavivirus.zip 2019-08-28 29.7 MB
FamilyToGenus(Cyprinidae).zip 2019-08-28 521.3 kB
DomainToKingdom(Eukaryota_noProtists).zip 2019-08-28 78.7 MB
DomainToKingdom(Eukaryota).zip 2019-08-28 81.1 MB
Disease-Classification.zip 2019-08-28 569.0 kB
ClassToSubclass(Actinopterygii).zip 2019-08-28 14.5 MB
Birds-Fish-Mammals.zip 2019-08-28 25.7 MB
Amphibians.zip 2019-08-28 1.5 MB
3classes.zip 2019-08-28 16.7 MB
Totals: 27 Items, 450.2 MB

Installation

MLDSP-GUI requires MATLAB runtime 9.6 (R2019a), which is freely available for Windows, Linux, and Mac at: https://www.mathworks.com/products/compiler/matlab-runtime.html

The installation can be done in two ways:

  1. If you don't have MATLAB runtime 9.6 installed:

    • Download MLDSPGUIweb.exe from the "InstallOnline" directory.
    • Run the file as administrator and follow the instructions.
    • It is important that you replace the desktop shortcut created during installation. Once the installation has finished, go to the installation directory (default is ~/Program Files/MLDSPGUI/Application) and create a new desktop shortcut for MLDSPGUI.exe (right-click and select "send to desktop").
  2. If you have MATLAB runtime 9.6 installed already:

    • Download all the files from the "InstallOffline" directory and run the application using MLDSPGUI.exe.

Using your own dataset

Run the MLDSP-GUI app, select "Browse" under dataset, and choose the parent folder of the dataset.

The dataset should be created using the following format:

  • Create a parent folder.
  • Make subfolders (each subfolder represents a cluster).
  • Each subfolder should contain the .fasta files (one sequence per file) that belong to that cluster.
  • Refer to the provided datasets in the "ProvidedDatasetFastaZip" directory for more details/examples.
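
The layout described above can be checked before loading a dataset into the app. The following is a minimal sketch in Python, not part of MLDSP-GUI; the function names `count_fasta_records` and `check_dataset` are hypothetical, and it assumes that a FASTA record starts with a line beginning with `>`.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_dataset(parent_dir):
    """Return (clusters, problems) for a dataset folder.

    clusters maps each subfolder (cluster) name to its number of .fasta files;
    problems is a list of human-readable warnings.
    """
    parent = Path(parent_dir)
    clusters = {}
    problems = []
    # Every subfolder of the parent folder is treated as one cluster.
    for sub in sorted(p for p in parent.iterdir() if p.is_dir()):
        fasta_files = sorted(sub.glob("*.fasta"))
        clusters[sub.name] = len(fasta_files)
        if not fasta_files:
            problems.append(f"{sub.name}: no .fasta files found")
        # Each file is expected to hold exactly one sequence.
        for fasta in fasta_files:
            n = count_fasta_records(fasta)
            if n != 1:
                problems.append(f"{sub.name}/{fasta.name}: {n} sequences, expected 1")
    if not clusters:
        problems.append("no cluster subfolders found")
    return clusters, problems
```

Calling `check_dataset` on the parent folder returns the number of files per cluster together with any warnings, so an empty `problems` list suggests the folder matches the format listed above.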

Provided datasets

Besides the datasets bundled with the executable file (Primates mtDNA, Influenza virus subtypes, Flaviviridae viruses, mitochondrial disease genomes), MLDSP-GUI provides additional datasets that can be downloaded separately and imported into the already installed tool.

The datasets can be downloaded and imported in two ways:

  1. Using .mat files:

    • Download the .mat file of the required dataset from the "ProvidedDatasetMat" directory and copy it to the Database folder of the installation directory (default is ~/Program Files/MLDSPGUI/Application/Database).
    • Run the MLDSP-GUI app; it will automatically read the .mat files and add the datasets to the list of available datasets.
  2. Using .fasta files (raw sequences):

    • Download the .zip file of the required dataset from the "ProvidedDatasetFastaZip" directory.
    • Unzip to extract the folders containing .fasta files.
    • Run the MLDSP-GUI app, select "Browse" under dataset, and choose the parent folder of the downloaded dataset when prompted.
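
The unzip step for the .fasta route can also be scripted. The sketch below is an illustration only and is not shipped with MLDSP-GUI; the function name `extract_dataset` is hypothetical, and the internal folder layout of the archives is an assumption (cluster subfolders holding .fasta files somewhere inside the archive). It extracts the archive and reports which folder(s) look like the parent folder to choose under "Browse".

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract a dataset archive and return candidate parent folders.

    A candidate parent folder is any folder whose subfolders directly
    contain .fasta files (assumed layout: parent/cluster/*.fasta).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Folders that directly hold .fasta files are the cluster folders.
    cluster_dirs = {fasta.parent for fasta in dest.rglob("*.fasta")}
    # Their parents are the folders to select in the app.
    return sorted({cluster.parent for cluster in cluster_dirs})
```

If the function returns more than one folder, the archive does not follow the assumed layout and the extracted files are worth inspecting by hand.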

The provided datasets are listed below.

Dataset #Sequences
3classes 3,200 sequences
Amphibians 264 sequences
Birds-Fish-Mammals 4,565 sequences
ClassToSubclass(Actinopterygii) 2,566 sequences
Dengue 4,721 sequences
Disease-Classification 102 sequences
DomainToKingdom(Eukaryota) 9,727 sequences
DomainToKingdom(Eukaryota_noProtists) 9,483 sequences
FamilyToGenus(Cyprinidae) 92 sequences
Flavivirus 7,881 sequences
Fungi 340 sequences
Human haplogroups 1,150 sequences
Human haplogroups subgroup1 350 sequences
Human haplogroups subgroup2 250 sequences
Influenza 38 sequences
Insects 1,636 sequences
KingdomToPhylum(Animalia) 8,792 sequences
Mammalia 1,075 sequences
OrderToFamily(Cypriniformes) 756 sequences
PhylumToSubphylum(Chordata) 5,224 sequences
Plants 265 sequences
Primates 211 sequences
Protists 222 sequences
SubclassToSuperorder(Neopterygii) 1,759 sequences
SubfamilyToGenus(Acheilognathinae) 26 sequences
SubphylumToClass(Vertebrata) 5,176 sequences
SuperorderToOrder(Ostariophysi) 942 sequences

Important Notes:

  • The exported UPGMA tree (.tree file in Newick format) can be viewed using any program that supports the format. We recommend the online tool iTOL, available at: https://itol.embl.de/upload.cgi
  • Please note that the first run of MLDSP-GUI can seem a bit slow, because it takes around a minute to start the parallel pool.
  • Although distance computation is very fast, classification using 10-fold cross-validation can be slow for larger datasets.
  • Reported accuracies are the result of 10-fold cross-validation, which takes the average of 10 runs of each classifier.
  • For distance computations, the FATHOM toolbox is used: Jones D.L. (2017) Fathom Toolbox for MATLAB: software for multivariate ecological and oceanographic data analysis, University of South Florida. Available from: https://www.marine.usf.edu/research/matlab-resources/
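
To illustrate why 10-fold cross-validation gets slower as datasets grow, here is a generic sketch of the procedure in Python. It is not the MLDSP-GUI implementation (which is a MATLAB application); the names `k_fold_indices`, `cross_validated_accuracy`, and the `train_and_predict` callback are hypothetical, and the split is a plain shuffled split without stratification. The sketch assumes there are at least as many samples as folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(samples, labels, train_and_predict, k=10, seed=0):
    """Average the test accuracy over k folds.

    train_and_predict(train_x, train_y, test_x) is a hypothetical callback
    that fits any classifier and returns predicted labels for test_x.
    """
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        # The classifier is retrained once per fold, hence k trainings in total.
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in held_out],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

Each classifier is trained k times on roughly (k-1)/k of the data, so the total cost grows with both the dataset size and the number of classifiers being evaluated.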
Source: README.md, updated 2019-08-28