Hello,
I am analyzing miRNA-Seq data with the evaluation version of miRPro. Because I don't have all of the other required software (especially Novoalign), I am using individual miRPro modules rather than the full pipeline. As a first step, I want to reproduce the results for the human miRNA-Seq data used in the miRPro paper.
The first step in analyzing miRNA-Seq data is raw read quality filtering with the FASTX-Toolkit (version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html), which filters out poor-quality reads using the following settings:
1) minimum quality score for each base = 20;
2) percentage of bases that must have the minimum quality score ≥ 95%.
I used the following command to perform quality filtering:
fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
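To make the two settings concrete, here is a minimal Python sketch of the rule that -q and -p describe: a read is kept when at least -p percent of its bases have a quality score of at least -q. This is only an illustration, not the FASTX-Toolkit source; the function name passes_quality_filter is hypothetical, Phred+33 encoding is assumed (matching -Q 33), and the real tool may round the percentage differently.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=95, offset=33):
    """Return True if enough bases in the read reach the minimum quality.

    Hypothetical helper that mirrors the meaning of -q (min_quality) and
    -p (min_percent); offset=33 assumes Phred+33 encoded quality strings.
    """
    # Decode each quality character into a numeric Phred score.
    scores = [ord(ch) - offset for ch in quality_string]
    if not scores:
        return False  # treat an empty read as failing the filter
    # Count bases whose score reaches the threshold.
    good = sum(1 for s in scores if s >= min_quality)
    # Keep the read only if the share of good bases reaches min_percent.
    return 100.0 * good / len(scores) >= min_percent


if __name__ == "__main__":
    # 'I' decodes to Q40 and '#' to Q2 under Phred+33.
    read_quality = "I" * 18 + "#" * 2  # 90% of bases are Q >= 20
    print(passes_quality_filter(read_quality, 20, 95))
    print(passes_quality_filter(read_quality, 20, 5))
```

In this sketch a 20-base read with two low-quality bases is rejected at -p 95 but kept at -p 5, which is why the choice of -p has such a large effect on the number of surviving reads.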
The raw human miRNA sequencing data were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292 and were already clean of adapter sequences.
I got the following results:
SampleId     TotalReads   TrimmedReads   %OfGoodQualityReadsWithinTotalReads
SRR1542714   1866654      962422         51.56 %
SRR1542715   1842228      955859         51.89 %
SRR1542716   2777542      1976509        71.16 %
SRR1542717   1324705      318259         24.02 %
SRR1542718   3085962      1830745        59.32 %
SRR1542719   1937831      619794         31.98 %
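As a quick sanity check on the last column, the percentages can be recomputed from the two read counts. The short Python sketch below does this using only the numbers in the table; the variable and function names are illustrative.

```python
# (sample id, total reads, reads passing the filter), copied from the table above
samples = [
    ("SRR1542714", 1866654, 962422),
    ("SRR1542715", 1842228, 955859),
    ("SRR1542716", 2777542, 1976509),
    ("SRR1542717", 1324705, 318259),
    ("SRR1542718", 3085962, 1830745),
    ("SRR1542719", 1937831, 619794),
]


def percent_kept(total_reads, kept_reads):
    """Percentage of reads that survived quality filtering."""
    return 100.0 * kept_reads / total_reads


if __name__ == "__main__":
    for sample_id, total, kept in samples:
        kept_pct = percent_kept(total, kept)
        # The removed share is simply the complement of the kept share.
        print(f"{sample_id}  kept {kept_pct:.2f} %  removed {100.0 - kept_pct:.2f} %")
```

On these counts the share of reads removed ranges from roughly 29% (SRR1542716) to roughly 76% (SRR1542717).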
These results are not consistent with the results reported in the miRPro paper, and they vary quite a bit between samples.
So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason that I am not able to reproduce the results?
Sorry for asking such a basic question. I am a novice in this field and am starting with something challenging.
Thanks,
Hardik
When the paper was published, miRPro had some bugs.
One of them was that quality filtering used "fastq_quality_filter -q 20 -p 5" by default.
In the current version, quality filtering is turned off by default. If you add the "-q 1" option to mirpro, 'fastq_quality_filter -q 20 -p 95' will be used.
So your results are correct here.
Thanks. Two questions:
1) All of the subsequent analysis of the human miRNA data in the paper was done after quality filtering with this criterion, right?
"fastq_quality_filter -q 20 -p 5"
2) 'fastq_quality_filter -q 20 -p 95' removes 40-70% of the reads. Compared to miRPro's data, I got only a third of the differentially expressed miRNAs in control vs. ET-1. It seems this step removes good reads as well! So do you think I should avoid this step?
1) Yes.
2) I suggest you skip this step.
Related
Tickets:
#7