#7 Raw Read Quality Filtering using FASTX-Toolkit of miRNA-Seq data

Milestone: 1.0

Status: closed

Owner: nobody

Labels: None

Updated: 2016-10-23

Created: 2016-08-11

Creator: Hardik Modi

Private: No

Hello,

I am analyzing miRNA-Seq data with the evaluation version of miRPro. Because I don't have all of the other required software (in particular Novoalign), I am using individual miRPro modules rather than the full pipeline. As a first step, I want to reproduce the results for the human miRNA-seq data used in the miRPro paper.

The first step in analyzing miRNA-Seq data is raw read quality filtering with FASTX-Toolkit (version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html), which filters out poor-quality reads using the following settings:
1) minimum quality score for each base = 20;
2) percent of bases that must have the minimum quality score ≥ 95%.

I used the following command to perform quality filtering:

fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
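
For readers unfamiliar with the -q/-p options, the keep/discard rule can be sketched in Python as below. This is only an approximation based on the documented meaning of the options (a read is kept when at least -p percent of its bases have a quality of at least -q), not the FASTX-Toolkit source code. The function name and the example quality string are hypothetical.

```python
def passes_quality_filter(quality, min_quality=20, min_percent=95, offset=33):
    """Return True if at least min_percent of bases have Phred >= min_quality.

    quality is the FASTQ quality string of one read; offset=33 mirrors -Q 33.
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_quality)
    # Integer comparison avoids floating-point rounding near the threshold.
    return good * 100 >= min_percent * len(quality)


# Hypothetical 22-nt read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
read_quality = "I" * 20 + "#" * 2

print(passes_quality_filter(read_quality, 20, 95))  # False: 20/22 is about 90.9%
print(passes_quality_filter(read_quality, 20, 5))   # True
```

With -p 95, two low-quality bases are already enough to discard a 22-nt read, which is one plausible reason so many reads are lost.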

The raw human miRNA sequencing data, which was already free of adapter sequences, was downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292.

I got the following results:

SampleId TotalReads TrimmedReads %OfGoodQualityReadsWithinTotalReads
SRR1542714 1866654 962422 51.56 %
SRR1542715 1842228 955859 51.89 %
SRR1542716 2777542 1976509 71.16 %
SRR1542717 1324705 318259 24.02 %
SRR1542718 3085962 1830745 59.32 %
SRR1542719 1937831 619794 31.98 %
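
The percentages in the table can be rechecked from the read counts with a short Python sketch. The counts are copied from the table above; the variable and function names are only illustrative.

```python
# (total reads, reads kept after filtering), copied from the table above.
counts = {
    "SRR1542714": (1866654, 962422),
    "SRR1542715": (1842228, 955859),
    "SRR1542716": (2777542, 1976509),
    "SRR1542717": (1324705, 318259),
    "SRR1542718": (3085962, 1830745),
    "SRR1542719": (1937831, 619794),
}


def percent_kept(total, kept):
    """Percentage of reads that survived filtering, rounded to two decimals."""
    return round(100.0 * kept / total, 2)


for sample, (total, kept) in counts.items():
    print(f"{sample}\t{total}\t{kept}\t{percent_kept(total, kept):.2f} %")
```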

These results are not consistent with the results reported in the miRPro paper (and show quite a bit of variance).
So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason that I am not able to reproduce the results?

Sorry for asking a really basic question. I am a novice in this field and am starting with something challenging.

Thanks,
Hardik

Jieming Shi - 2016-08-12

When the paper was published, miRPro had some bugs.
One bug was that we used "fastq_quality_filter -q 20 -p 5" for quality filtering by default.

In the current version, we have turned off quality filtering by default. If you add the "-q 1" option in mirpro, 'fastq_quality_filter -q 20 -p 95' will be used.

So your results are correct here.
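
To illustrate how different the two settings are, the following Python sketch computes how many bases below the -q threshold a read may contain and still be kept. It assumes the rule "at least -p percent of bases must reach -q" as documented for fastq_quality_filter. The read lengths are hypothetical examples in the typical miRNA size range, not values measured from GSE60292.

```python
def max_low_quality_bases(read_length, min_percent):
    """Largest number of bases below the quality threshold that still lets
    a read of read_length pass a 'min_percent of bases must pass' rule."""
    return read_length * (100 - min_percent) // 100


# Hypothetical read lengths in the typical miRNA size range.
for length in (18, 22, 30):
    strict = max_low_quality_bases(length, 95)   # -p 95
    lenient = max_low_quality_bases(length, 5)   # -p 5
    print(f"length {length}: -p 95 tolerates {strict}, -p 5 tolerates {lenient}")
```

Under this assumption, -p 95 tolerates at most one low-quality base in reads of these lengths (and none in reads shorter than 20 nt), whereas -p 5 discards only reads in which almost every base is below Q20.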

Hardik Modi - 2016-08-12

Thanks. Two questions:
1) All the subsequent analysis of the human miRNA data in the paper was done after quality filtering with these criteria, right?
"fastq_quality_filter -q 20 -p 5"
2) 'fastq_quality_filter -q 20 -p 95' removes 40-70% of the reads. Compared to miRPro's data, I got only a third of the differentially expressed miRNAs in control vs. ET-1. It seems this step removes good reads as well! So do you think I should avoid this step?

Jieming Shi - 2016-08-13

1) Yes.
2) I suggest you skip this step.
  

Jieming Shi - 2016-10-23

status: open --> closed