Name Modified Size
README.txt 2017-03-27 4.2 kB
XQB816483N1R.tar.gz 2017-03-25 193.6 MB
XQB816483N1R.summary 2017-03-25 1.2 kB
assess_precision_recall.py 2017-03-24 18.8 kB
RBH612U1342Z.summary 2017-03-24 1.2 kB
EWMCIY2Q1NNZ.tar.gz 2017-03-24 171.3 MB
RBH612U1342Z_vipie_assigned_validation.tsv 2017-03-24 1.7 MB
RBH612U1342Z_false_negatives.tsv 2017-03-24 69.3 kB
RBH612U1342Z_vipie_assigned_false_positive.tsv 2017-03-24 9.0 kB
references.tar.gz 2017-03-23 123.0 MB
RBH612U1342Z.tar.gz 2017-03-23 193.6 MB
fasta_seq_dic.py 2017-03-21 6.9 kB
Totals: 12 Items, 683.3 MB
README
#Requires Python 2.6/2.7 and Biopython

#Example walkthrough
#The inputs were taken from the Vipie results of job: https://binf.uta.fi/vipie/results.html?key=BweTu1Pk

#We are grateful to the MetaShot authors for the use of their simulated data.
#Please see their paper and simulation data description. https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx036/2959848/MetaShot-an-accurate-workflow-for-taxon
 
#This validation script should work with any Vipie job/result, provided that the read titles (the lines starting with @)
#in the simulation data (*R1.fastq, *R2.fastq) include a reference id. To simulate such reads we recommend ART:
#Huang W. et al. (2012) ART: a next-generation sequencing read simulator. Bioinformatics, 28, 593–594.
#For example, the R1 and R2 reads below carry the reference id NC_012932.1, a Flavivirus genome:

@gi|254688376|ref|NC_012932.1|-1286/1
CAGGGTAATAGCACTGGCCAAATGCTAGAAACGCGTTATGGAAGGATGGAGTACCGACCAATGTATGTGGATGACCGGTTCGAGAACATAGAGTGGGACCAGCGCCGTTTATCTATAGAAATGTACATTAACACACGCAGTTCAGCTAGT
+
<??A?BBBDDDDDDB@AFGGEGFEIIIIHFFHHIHGHFI@IHIHI@HIHFHHIHHIGHIIIIFFIHH>IECHHIIFIGIFHFIHIHF?IIIHIIHHHGHFHHHIHGDHH@HCFGG>HHBGGHCFHH,FEGGGECGGFGEAG@DDGGEEFE
@gi|254688376|ref|NC_012932.1|-1286/2
AATAATGACACAAAATCCCAGGGGCACTCCAAAGTAATGCAGAACTCCAGGGGACGTTAGGAAAAAGCCACCTCCTAGTGGATCAGAGACGGGCATTGGCTCAAAAGAGTTTTTTCCCGACCGGGAATATGACATCAGGCAAGTAAAAAC
+
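
#The reference id can be pulled out of a read title with a few lines of Python. This is only a sketch: it assumes
#titles follow the gi|...|ref|ACCESSION|... layout shown above, and the helper name reference_id_from_title is
#hypothetical. assess_precision_recall.py may parse titles differently.

```python
def reference_id_from_title(title):
    """Return the accession that follows the 'ref' tag in a read title.

    Assumes a pipe-delimited title such as
    '@gi|254688376|ref|NC_012932.1|-1286/1'; returns None otherwise.
    """
    fields = title.strip().lstrip("@").split("|")
    if "ref" in fields:
        idx = fields.index("ref")
        # The accession is the field right after the 'ref' tag.
        if idx + 1 < len(fields) and fields[idx + 1]:
            return fields[idx + 1]
    return None


# The two titles from the example reads above.
r1_title = "@gi|254688376|ref|NC_012932.1|-1286/1"
r2_title = "@gi|254688376|ref|NC_012932.1|-1286/2"
print(reference_id_from_title(r1_title))
print(reference_id_from_title(r2_title))
```

#A title without a ref field makes the helper return None, which is a quick way to spot reads that lack a
#usable reference id before starting a validation run.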


#From the SourceForge folder /vipie/files/validation, download RBH612U1342Z.tar.gz and references.tar.gz,
#then assess_precision_recall.py.
#In the same directory where you saved assess_precision_recall.py, unpack the archives using
#tar -xzvf RBH612U1342Z.tar.gz 
#tar -xzvf references.tar.gz
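
#Where the tar command is not available, the same unpacking can be sketched with Python's standard tarfile module.
#The helper name unpack is hypothetical, and the with-statement form needs Python 2.7 or later.

```python
import tarfile


def unpack(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)


# Usage, run from the directory holding assess_precision_recall.py:
# unpack("RBH612U1342Z.tar.gz")
# unpack("references.tar.gz")
```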

#Run
#With Python 2.6/2.7 on your PATH, issue the following command.

python -W ignore assess_precision_recall.py RBH612U1342Z

#The summary and stats are recorded in job_label.summary, i.e., RBH612U1342Z.summary

#Your output should be similar to the example below. Precision, recall and f-measure stats are printed to the console.
#In addition, several files are created that store the viral genus and species matches between the simulated input
#and the Vipie assignments; these comparisons form the basis for the scores below.

('False positive and false negative validation started for RBH612U1342Z', '2017-03-23 10:20:51')
UNKNOWN INPUT  SEQUENCE_3569_length_150	004	         NC_007580.2	GTTTTGAAAGGAGCCTCACGACTCCGTTCACCATCGAGCTAGCTGATCCAGTTGCTTTCACTTCATAACTTCCATGATAGTTCCACGTTCTATATGGGTTGTTGTGGTCATACGTCCACGTTTGTTGGTATTCTTCCCTCAATCTCCGTA

{"perfect": 1194, "species_match": 4922, "false_positive": 38, "genus_match": 5202, "human_retrovirus": 292, "negative_genus": 53, "in_assigned_not_all_reads": 1, "negative_group": 0, "human_retro_positive": 292, "assigned_seqs": 958535, "true_positive": 5255, "human_positives": 938471, "human_all_reads": 945367, "negative_species": 333, "human_false_positives": 0, "all_reads_seqs": 966460, "false_negative": 0}
human all reads 945367 assigned 938471 false_positive 0
false negatives genus 53 species 333 in virus assignments
virus group assigned:99.98
virus genus assigned:98.99
virus species assigned:93.66
virus group correctly assigned:93.38
virus genus correctly assigned:93.32
virus species correctly assigned:92.97
human_assigned:99.27
human_correctly_assigned:99.27
human_unassigned: 0 virus_unassigned 37200
virus group precision:99.28
virus genus precision:98.28
virus species precision:92.99
overall_precision_virus:96.85
virus_group_recall:100.0
virus_genus_recall:98.00
virus_species_recall:88.08
overall_recall_virus95.36
virus_group_fscore:99.63
virus_genus_fscore:98.14
virus_species_fscore:90.46
overall_fscore_virus:96.08
human precision:1.0
human recall:99.96
human fscore:99.98
('Vipie assignment precision and recall assessment completed for job RBH612U1342Z', '2017-03-23 10:22:47')
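
#As a cross-check of the scores above, the sketch below uses the standard f-measure definition (the harmonic mean of
#precision and recall). The fscore lines in the example output agree with it to within about 0.01, and the overall_*
#lines agree with the arithmetic mean of the group, genus and species values. This is an assumption drawn from the
#printed numbers, not from the code of assess_precision_recall.py; the names f_measure and mean are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def mean(values):
    """Arithmetic mean of a collection of numbers."""
    values = list(values)
    return sum(values) / float(len(values))


# Percent values copied from the example console output above.
precision = {"group": 99.28, "genus": 98.28, "species": 92.99}
recall = {"group": 100.0, "genus": 98.00, "species": 88.08}

for level in ("group", "genus", "species"):
    score = f_measure(precision[level], recall[level])
    print("virus_%s_fscore:%.2f" % (level, score))

print("overall_precision_virus:%.2f" % mean(precision.values()))
print("overall_recall_virus:%.2f" % mean(recall.values()))
```

#Small differences in the last digit are expected, because the script presumably works from unrounded values
#while this sketch starts from the rounded percentages.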

###Files produced
-rw-r--r--     1 jakelin  staff   1.6M Mar 23 12:22 RBH612U1342Z_vipie_assigned_validation.tsv
-rw-r--r--     1 jakelin  staff   8.7K Mar 23 12:22 RBH612U1342Z_vipie_assigned_false_positive.tsv
-rw-r--r--     1 jakelin  staff    68K Mar 23 12:22 RBH612U1342Z_false_negatives.tsv
-rw-r--r--     1 jakelin  staff   415B Mar 23 12:22 RBH612U1342Z.summary
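
#The dictionary of counts printed in the example console output is valid JSON, so it can be loaded for further checks.
#The sketch below uses a subset of those counts and assumes that human_assigned is human_positives divided by
#human_all_reads; under that assumption it gives 99.27, matching the human_assigned line. The helper name percent
#is hypothetical.

```python
import json

# Subset of the count dictionary copied from the example console output.
counts_text = (
    '{"human_positives": 938471, "human_all_reads": 945367, '
    '"human_false_positives": 0}'
)
counts = json.loads(counts_text)


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


human_assigned = percent(counts["human_positives"], counts["human_all_reads"])
print("human_assigned:%.2f" % human_assigned)
```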

Source: README.txt, updated 2017-03-27