README
#Requires Python 2.6/2.7
#Requires Biopython
#Example walk through
#The inputs were taken from Vipie results from job: https://binf.uta.fi/vipie/results.html?key=BweTu1Pk
#We are grateful to the MetaShot authors for the use of their simulated data.
#Please see their paper and the description of the simulation data: https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx036/2959848/MetaShot-an-accurate-workflow-for-taxon
#This validation script should work with any Vipie job/result, provided that the read titles (the lines starting with @)
#in the simulation data (*R1.fastq, *R2.fastq) are valid, which means each title must include a reference id.
#To simulate such reads we recommend ART: Huang W. et al. (2012) ART: a next-generation sequencing read simulator. Bioinformatics, 28, 593–594.
#For example, the reads below from R1 and R2 carry NC_012932.1, the reference id of a Flavivirus genome:
@gi|254688376|ref|NC_012932.1|-1286/1
CAGGGTAATAGCACTGGCCAAATGCTAGAAACGCGTTATGGAAGGATGGAGTACCGACCAATGTATGTGGATGACCGGTTCGAGAACATAGAGTGGGACCAGCGCCGTTTATCTATAGAAATGTACATTAACACACGCAGTTCAGCTAGT
+
<??A?BBBDDDDDDB@AFGGEGFEIIIIHFFHHIHGHFI@IHIHI@HIHFHHIHHIGHIIIIFFIHH>IECHHIIFIGIFHFIHIHF?IIIHIIHHHGHFHHHIHGDHH@HCFGG>HHBGGHCFHH,FEGGGECGGFGEAG@DDGGEEFE
@gi|254688376|ref|NC_012932.1|-1286/2
AATAATGACACAAAATCCCAGGGGCACTCCAAAGTAATGCAGAACTCCAGGGGACGTTAGGAAAAAGCCACCTCCTAGTGGATCAGAGACGGGCATTGGCTCAAAAGAGTTTTTTCCCGACCGGGAATATGACATCAGGCAAGTAAAAAC
+
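#The reference id can be pulled out of a read title like the ones above with a few lines of Python.
#This is only a minimal sketch: it assumes the "gi|...|ref|<reference id>|..." title layout shown in the example,
#and the function name reference_id is hypothetical. How assess_precision_recall.py itself parses titles is not shown here.

```python
def reference_id(title):
    """Return the field that follows 'ref' in a '|'-separated read title, or None.

    Assumes the layout of the example titles above; this is not necessarily
    how assess_precision_recall.py parses them.
    """
    fields = title.strip().lstrip("@").split("|")
    if "ref" in fields:
        idx = fields.index("ref")
        # The reference id is expected directly after the 'ref' field.
        if idx + 1 < len(fields) and fields[idx + 1]:
            return fields[idx + 1]
    return None


print(reference_id("@gi|254688376|ref|NC_012932.1|-1286/1"))  # NC_012932.1
print(reference_id("@read_without_reference/1"))              # None
```

#A title for which the function returns None carries no reference id in this layout.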
#From the SourceForge folder /vipie/files/validation, please download RBH612U1342Z.tar.gz, references.tar.gz
#and assess_precision_recall.py.
#Then, in the same directory where you saved assess_precision_recall.py, extract the archives using
#tar -xzvf RBH612U1342Z.tar.gz
#tar -xzvf references.tar.gz
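#If the tar command is not available, the same extraction can be done with Python's standard tarfile module.
#This is a minimal sketch; the helper name extract_archive is hypothetical and not part of the Vipie scripts.
#Only extract archives from a source you trust.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its member names."""
    archive = tarfile.open(archive_path, "r:gz")
    try:
        # Unpack every member, like 'tar -xzf' would.
        archive.extractall(dest_dir)
        return archive.getnames()
    finally:
        archive.close()


# Usage, run from the directory holding the downloads:
#   extract_archive("RBH612U1342Z.tar.gz")
#   extract_archive("references.tar.gz")
```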
#Run
#With Python 2.6/2.7 in your path, issue the following command:
python -W ignore assess_precision_recall.py RBH612U1342Z
#The summary and stats are recorded in job_label.summary, i.e., RBH612U1342Z.summary.
#Your output should be similar to the one below. Precision, recall and f-measure stats are printed to the console.
#In addition, several files are created that capture the viral genus and species comparisons between
#the simulated input and the Vipie assignments; these files form the basis for the scores below.
('False positive and false negative validation started for RBH612U1342Z', '2017-03-23 10:20:51')
UNKNOWN INPUT SEQUENCE_3569_length_150 004 NC_007580.2 GTTTTGAAAGGAGCCTCACGACTCCGTTCACCATCGAGCTAGCTGATCCAGTTGCTTTCACTTCATAACTTCCATGATAGTTCCACGTTCTATATGGGTTGTTGTGGTCATACGTCCACGTTTGTTGGTATTCTTCCCTCAATCTCCGTA
{"perfect": 1194, "species_match": 4922, "false_positive": 38, "genus_match": 5202, "human_retrovirus": 292, "negative_genus": 53, "in_assigned_not_all_reads": 1, "negative_group": 0, "human_retro_positive": 292, "assigned_seqs": 958535, "true_positive": 5255, "human_positives": 938471, "human_all_reads": 945367, "negative_species": 333, "human_false_positives": 0, "all_reads_seqs": 966460, "false_negative": 0}
human all reads 945367 assigned 938471 false_positive 0
false negatives genus 53 species 333 in virus assignments
virus group assigned:99.98
virus genus assigned:98.99
virus species assigned:93.66
virus group correctly assigned:93.38
virus genus correctly assigned:93.32
virus species correctly assigned:92.97
human_assigned:99.27
human_correctly_assigned:99.27
human_unassigned: 0 virus_unassigned 37200
virus group precision:99.28
virus genus precision:98.28
virus species precision:92.99
overall_precision_virus:96.85
virus_group_recall:100.0
virus_genus_recall:98.00
virus_species_recall:88.08
overall_recall_virus95.36
virus_group_fscore:99.63
virus_genus_fscore:98.14
virus_species_fscore:90.46
overall_fscore_virus:96.08
human precision:1.0
human recall:99.96
human fscore:99.98
('Vipie assignment precision and recall assessment completed for job RBH612U1342Z', '2017-03-23 10:22:47')
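#The scores above can be sanity-checked by hand. The sketch below assumes the textbook definitions,
#precision = TP / (TP + FP) and f-measure = harmonic mean of precision and recall.
#Whether assess_precision_recall.py uses exactly these formulas and this rounding is an assumption,
#as is pairing the true_positive/false_positive counts with the virus group level. The function names are illustrative.

```python
import json


def precision(true_positive, false_positive):
    """Precision in percent: TP / (TP + FP)."""
    total = true_positive + false_positive
    return 100.0 * true_positive / total if total else 0.0


def f_measure(precision_pct, recall_pct):
    """Harmonic mean of precision and recall (both given in percent)."""
    total = precision_pct + recall_pct
    return 2.0 * precision_pct * recall_pct / total if total else 0.0


# Two of the counts from the JSON line printed in the console output above.
counts = json.loads('{"true_positive": 5255, "false_positive": 38}')

group_precision = precision(counts["true_positive"], counts["false_positive"])
# Genus precision and recall as reported on the console.
genus_fscore = f_measure(98.28, 98.00)

print("precision from counts: %.2f" % group_precision)  # 99.28
print("genus f-measure: %.2f" % genus_fscore)            # 98.14
```

#Under these assumptions the counts reproduce the reported virus group precision (99.28) and genus f-score (98.14).
#Recomputing from the rounded console values can differ from other reported scores in the last digits.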
###Files produced
-rw-r--r-- 1 jakelin staff 1.6M Mar 23 12:22 RBH612U1342Z_vipie_assigned_validation.tsv
-rw-r--r-- 1 jakelin staff 8.7K Mar 23 12:22 RBH612U1342Z_vipie_assigned_false_positive.tsv
-rw-r--r-- 1 jakelin staff 68K Mar 23 12:22 RBH612U1342Z_false_negatives.tsv
-rw-r--r-- 1 jakelin staff 415B Mar 23 12:22 RBH612U1342Z.summary