Dear authors,
I'm fond of your pipeline.
I'm going to report all the bugs I have found, to help you improve it.
The hg19 ANNOVAR option (at the very beginning) does not work; only hg38 does.
It would be nice to have a note in the README about unzipping the downloaded genome fasta/GTF
( In addition, you should download genome fasta/GTF:
• human genome sequence ( ftp://ftp.ensembl.org/pub/release-80/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz )
• and GTF model ( ftp://ftp.ensembl.org/pub/release-80/gtf/homo_sapiens/Homo_sapiens.GRCh38.80.gtf.gz )
then GUNZIP them
and indicate path to the files in config.txt)
because Python's errors about undecodable UTF-8 characters are not very informative.
Last edit: avkitex 2015-11-03
By the way, there is no link to the dbSNP database:
ftp://ftp.ncbi.nih.gov/snp/organisms/ (find human_version and download All_(date).vcf)
During VCF file reordering, the Perl script consumed more than 57 GB of memory and finally failed. I'll try the common VCF file instead (it is 0.8 GB zipped, versus 3.0 GB for All.vcf).
HTSeq-0.6.1 ( https://pypi.python.org/pypi/HTSeq ) has to be installed and the path to its HTSeq-0.6.1/scripts/htseq-count specified in the config;
otherwise RuntimeErrors with a Java stack trace and FAILED are printed and PPLine stops.
All these bugs are for PPLine.0.9.7.
TopHat uses the system bowtie2 instead of the one located in the Useful.stuff/ dir:
[2015-11-21 21:48:06] Checking for Bowtie
Bowtie 2 not found, checking for older version..
Error: Bowtie not found on this system.
Last edit: avkitex 2015-11-23
Dear Nikita,
Thank you for the bug report! I have fixed all the mentioned bugs except the last one. PPLine overrides the PATH environment variable each time it is launched and adds folders like Useful.stuff/bowtie2-2.2.5 to the PATH. Outside the PPLine process and its children (e.g. Tophat2 launched by PPLine), PATH is intact.
What OS do you use? I have tested PPLine on Ubuntu 14.04 and everything worked fine here.
The other bugs were fixed (0.9.8):
- PPLine automatically downloads and unpacks the genome fasta and GTF from UCSC
- PPLine also downloads a reordered, GATK-ready dbSNP vcf from our ftp server. No need to reorder it.
- HTSeq is included in the release now.
- The Annovar DB download issue (hg19) was fixed
- Splicing analysis with novel exon junction discovery is disabled by default. This analysis may take a lot of time and most people do not need it. It can be switched on with --enable-splicing-analysis yes
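For readers following the bowtie2 issue: the PATH behaviour described in the reply above can be sketched roughly as follows. This is only an illustration of the general technique (prepending bundled tool folders to PATH for child processes), not PPLine's actual code; the function names are made up for this example.

```python
import os
import shutil
import subprocess


def build_child_env(tool_dirs, base_env=None):
    """Return a copy of the environment with tool_dirs prepended to PATH.

    The caller's own environment is left untouched; only processes started
    with the returned mapping see the extra folders.
    """
    env = dict(os.environ if base_env is None else base_env)
    parts = [os.path.abspath(d) for d in tool_dirs]
    existing = env.get("PATH", "")
    if existing:
        parts.append(existing)
    env["PATH"] = os.pathsep.join(parts)
    return env


def which_in_env(program, env):
    """Report which executable a child process would pick up, or None."""
    return shutil.which(program, path=env["PATH"])


def run_with_bundled_tools(cmd, tool_dirs):
    """Run cmd so that it and its children find the bundled tools first."""
    return subprocess.run(cmd, env=build_child_env(tool_dirs), check=True)
```

Calling which_in_env("bowtie2", env) before starting TopHat is a quick way to check whether the bundled or the system-wide copy would be used.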
Thank you for the improvements.
I'm running PPLine on Ubuntu 14 (on Rackspace).
I'll try the new version.
Here is a log file with some errors I ran into, and also my installation log. I believe it could contribute to the README file.
Last edit: avkitex 2015-11-24
Also, it would be a great idea to have the pipeline run in "steps",
so that when some step fails we can restart PPLine from that step next time (after fixing the problem).
I've run PPLine 3 times. The 1st time it failed before the TopHat run (with the error posted above about bowtie2 not being installed).
The 2nd time it failed because I forgot to chmod +x htseq-count.
And the 3rd time it failed because of GATK (see nohup.out, top post).
On the 3rd run, although I already had all the previous results, I had to wait about 8 hours (as long as the full pipeline takes).
Yes, that's a really good idea.
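A minimal sketch of how such restartable steps could work, assuming each step is a plain Python callable and a finished step is recorded with a marker file. The names run_steps and the ".done" markers are hypothetical and are not part of PPLine.

```python
import os


def run_steps(steps, checkpoint_dir):
    """Run (name, func) steps in order, skipping those already completed.

    A step counts as completed when '<name>.done' exists in checkpoint_dir.
    Returns the names of the steps that were executed in this call.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    executed = []
    for name, func in steps:
        marker = os.path.join(checkpoint_dir, name + ".done")
        if os.path.exists(marker):
            continue  # finished in a previous run, do not repeat it
        func()  # an exception here aborts the run and leaves no marker
        with open(marker, "w") as handle:
            handle.write("ok\n")
        executed.append(name)
    return executed
```

With this layout, a run that fails at the GATK step would skip the alignment and counting steps on the next launch, and deleting a marker file forces that step to run again.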
I'll try PPLine on a clean Ubuntu VM to uncover the bug with bowtie2. My PPLine uses the bowtie2 located in the Useful.stuff folder.
HTSeq-count is included in the release (0.9.8) and should work now.
The GATK failure is related to an inappropriate quality encoding. I'll add automatic quality rescoring for the input FASTQ files in the next release.
I'm not familiar with GATK.
I am testing PPLine on SRR1609982.
I believe it is encoded as Phred+33 (does GATK require something else, or am I talking about something completely different?)
What should I do if I would like PPLine to finish successfully?
Should I somehow do the quality rescoring manually, or is it better to wait for the next PPLine release?
I will add the quality rescoring feature in the next release (when I have time; I hope in a few days).
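While waiting for a release, the encoding of a FASTQ file can be guessed from the range of its quality characters. The sketch below uses a common heuristic and assumes plain four-line FASTQ records; the function names are made up here, and an ambiguous range is reported as None rather than guessed.

```python
def iter_quality_lines(handle, max_records=10000):
    """Yield the quality line of each 4-line FASTQ record read from handle."""
    for index, line in enumerate(handle):
        if index >= 4 * max_records:
            break
        if index % 4 == 3:
            yield line.rstrip("\n")


def guess_quality_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Characters below ';' (ASCII 59) cannot occur in the +64 encodings,
    and characters above 'J' (ASCII 74) are unusual for Phred+33 data.
    Returns None when the observed range fits both.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None
```

For example, guess_quality_offset(iter_quality_lines(open("reads.fastq"))) inspects the first 10000 reads of a file named reads.fastq (a placeholder name).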
PPLine 0.9.8.1 is released
- PPLine automatically downloads all the databases at the first run
- added quality score rescaling feature (Illumina.1.3/1.5/Solexa > Phred33/Sanger)
- PPLine now accepts SRA accessions (e.g. SRR1609982, etc.) and downloads the SRA data
- fixed bug with downloading incorrect dbSNP vcf
- fixed bug with HTSeq-count python2 location
- other bugs were fixed
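For reference, the kind of rescaling named in the changelog can be sketched as below. This is an illustration of the standard conversions, not the code shipped in PPLine: Illumina 1.3/1.5 strings only need an offset shift from 64 to 33, while Solexa scores use a different scale and are mapped to Phred with Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).

```python
import math


def phred64_to_phred33(quality):
    """Shift a Phred+64 (Illumina 1.3/1.5) quality string to Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError("character %r is below the +64 offset" % ch)
        converted.append(chr(score + 33))
    return "".join(converted)


def solexa_to_phred(q_solexa):
    """Convert one Solexa quality score to the nearest Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def solexa64_to_phred33(quality):
    """Convert a Solexa+64 quality string to Phred+33."""
    return "".join(chr(solexa_to_phred(ord(ch) - 64) + 33) for ch in quality)
```

The ValueError in phred64_to_phred33 is a cheap safeguard against shifting a file that is already Phred+33.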
Thank you for the great work!
My PPLine run has just finished successfully!
I have some questions left:
1) Is there any way to get information about fusion proteins from the PPLine output (e.g. is TopHat-Fusion a part of PPLine)?
2) What are the best criteria for sorting proteotypic peptides? There are plenty of them, and lots of different scores in the vcf/visual xls. How should I sort this list if I'd like to get the 10 best from the results, without using any other programs?
1) No, TopHat-Fusion is not a part of PPLine. I strongly recommend deFuse, as it is more accurate than TopHat-Fusion. https://bitbucket.org/dranew/defuse
2) Sorting proteotypic peptides is a two-part task. First, SNP/SAP detection should be reliable. The main criteria here are the number of reads with the alternate allele ('Alt.HQ Reads' column in the Excel file) and the Phred quality of the SNP call ('Phread qual' column). Alternatively, you can use the values in the 'SAP score' column as a measure of reliability.
Second, not all proteotypic peptides 'can fly'. Many of them are non-ionizable. It's very difficult to predict this.
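One possible reading of the first criterion, as a small sketch: rank rows by 'Alt.HQ Reads' and break ties by 'Phread qual'. The reply does not say how to weight the two columns, so this ordering is an assumption, as is the idea that the table has been exported to a list of dictionaries keyed by column name (for example with csv.DictReader). It says nothing about whether a peptide will ionize.

```python
def top_candidates(rows, n=10):
    """Return the n rows with the strongest SNP/SAP support.

    Rows are ranked by the number of high-quality alternate-allele reads,
    then by the Phred quality of the call, both in descending order.
    """
    def support(row):
        return (float(row["Alt.HQ Reads"]), float(row["Phread qual"]))

    return sorted(rows, key=support, reverse=True)[:n]
```

Sorting on the 'SAP score' column alone would be the other option mentioned in the reply; only the key function would change.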