Genome-wide hairpins datasets.
sinc(i) - http://sinc.unl.edu.ar/ lbugnon@sinc.unl.edu.ar
This is a public distribution of the dataset detailed in: "Genome-wide hairpins datasets of animals and plants for novel miRNA prediction", by Leandro A. Bugnon, Cristian Yones, Diego H. Milone and Georgina Stegmayer, under review (2019).
We make available to the research community the hairpin sequences and the features extracted from genome-wide data of the following species, which can be very useful for computational microRNA (miRNA) prediction:
- Homo sapiens (hsa)
- Arabidopsis thaliana (ath)
- Anopheles gambiae (aga)
- Caenorhabditis elegans (cel)
- Drosophila melanogaster (dme)
These datasets are organized in the following folders:
- sequences/: Contains the sequences that form hairpin structures. Each genome is zipped individually, with unlabeled sequences kept separate from well-known miRNA sequences.
- features/: Contains sets of relevant features computed for miRNA prediction. Each zip file contains a comma-separated file with all the features and labels for each extracted sequence. As the human dataset is very large, it was split into three zip files: "hsa_miRNAs.zip" (positive samples), "hsa_unlabeled_part1.zip" and "hsa_unlabaled_part2.zip".
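Because each features zip holds a comma-separated file, the rows can also be read without extracting the archive first. The following is a minimal sketch using only the Python standard library. It assumes the CSV files have a header row and are UTF-8 encoded; the functions `read_features` and `count_labels` are hypothetical helpers, and the name of the label column is not stated in this README, so it is passed in as a parameter.

```python
import csv
import io
import zipfile
from collections import Counter


def read_features(zip_path):
    """Yield one dict per row from every .csv member of a features zip file."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a comma-separated file
            with zf.open(name) as raw:
                # zf.open returns a binary stream; wrap it so csv receives text.
                # Assumption: the files are UTF-8 (plain ASCII also works).
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)


def count_labels(zip_path, label_column):
    """Count rows per label value.

    label_column is an assumed name: check the CSV header for the real one.
    """
    return Counter(row[label_column] for row in read_features(zip_path))
```

For example, `count_labels("features/aga.zip", "<label column>")` would report how many sequences carry each label, where both the archive name and the column name are placeholders to be replaced with the actual ones from the distributed files.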
In addition, the script "dataset_stats.py" is provided to give an overview of the datasets. This script reproduces the figures shown in the manuscript. To run it, unzip the feature files and execute "python dataset_stats.py".
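The unzip step can be scripted when several archives are involved (for instance, the three human files). Below is a minimal sketch with the standard `zipfile` and `pathlib` modules; `unzip_all` is a hypothetical helper. This README does not say where "dataset_stats.py" expects the extracted CSV files, so the destination folder is left as a parameter: one plausible assumption is the folder containing the script.

```python
import zipfile
from pathlib import Path


def unzip_all(src_dir, dest_dir):
    """Extract every .zip archive found directly in src_dir into dest_dir.

    Returns the names of all extracted members, in archive order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # sorted() gives a deterministic order, e.g. part1 before part2.
    for archive in sorted(Path(src_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted
```

Under the assumption above, `unzip_all("features", ".")` followed by `python dataset_stats.py` would mirror the manual steps; if the script looks for its inputs elsewhere, only `dest_dir` needs to change.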
This data can be used and distributed for academic purposes. Please feel free to get in touch with any issue, comment or suggestion.