
#7 Raw Read Quality Filtering using FASTX-Toolkit of miRNA-Seq data

Milestone: 1.0
Status: closed
Owner: nobody
Labels: None
Updated: 2016-10-23
Created: 2016-08-11
Creator: Hardik Modi
Private: No

Hello,

I am in the process of analyzing miRNA-Seq data using the miRPro evaluation version. Because I don't have all of the other required software (in particular Novoalign), I am using individual miRPro modules rather than the full version. I want to start by reproducing the results for the human miRNA-seq data used in the miRPro paper.

The first step in analyzing miRNA-Seq data is raw read quality filtering with the FASTX-Toolkit (version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html) to filter out poor-quality reads, using the following settings:
1) minimum quality score for each base = 20;
2) percent of bases that must have the minimum quality score ≥ 95%.

I used the following command to perform quality filtering:

fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
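For reference, the documented meaning of these options is that -q sets the minimum quality score and -p sets the minimum percent of bases that must reach it. The following is only a minimal sketch of that test in awk, not the FASTX-Toolkit's actual code; the function name filter_fastq is made up for illustration, and it assumes 4-line FASTQ records with Phred+33 qualities (as selected by -Q 33). The real tool may round the percentage differently at the boundary.

```shell
# Hypothetical helper: filter_fastq MIN_Q MIN_PCT < in.fastq > out.fastq
# Keeps a read only if at least MIN_PCT percent of its bases have quality >= MIN_Q.
filter_fastq() {
  awk -v min_q="$1" -v min_pct="$2" '
    # Build a character -> ASCII code lookup for printable characters.
    BEGIN { for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c }
    # Buffer the current record; the 4th line (NR % 4 == 0) is the quality string.
    { rec[NR % 4] = $0 }
    NR % 4 == 0 {
      n = length($0); good = 0
      for (i = 1; i <= n; i++)
        if (ord[substr($0, i, 1)] - 33 >= min_q) good++
      if (n > 0 && 100 * good / n >= min_pct)
        print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }'
}

# Two made-up 20-base reads: "I" encodes Q40 and "#" encodes Q2 in Phred+33.
seq=ACGTACGTACGTACGTACGT
q_ok=$(printf '%19s#' '' | tr ' ' 'I')    # 19 of 20 bases >= Q20 -> 95%, kept
q_bad=$(printf '%18s##' '' | tr ' ' 'I')  # 18 of 20 bases >= Q20 -> 90%, dropped
printf '@r1\n%s\n+\n%s\n@r2\n%s\n+\n%s\n' "$seq" "$q_ok" "$seq" "$q_bad" |
  filter_fastq 20 95
```

Under these assumptions, a 20-base read is discarded as soon as two of its bases fall below Q20, which gives a sense of how strict -p 95 is for short reads.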

The raw human miRNA sequencing data, which was already clean of adapter sequences, was downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292.

I got the following results:

SampleId    TotalReads  TrimmedReads  %OfGoodQualityReadsWithinTotalReads
SRR1542714  1866654     962422        51.56 %
SRR1542715  1842228     955859        51.89 %
SRR1542716  2777542     1976509       71.16 %
SRR1542717  1324705     318259        24.02 %
SRR1542718  3085962     1830745       59.32 %
SRR1542719  1937831     619794        31.98 %
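
The percentage column is the number of reads left after filtering divided by the total number of reads. As a rough sketch, the same figures could be recomputed from the FASTQ files themselves; the helper names below are made up for illustration, and the read count assumes exactly four lines per record (no wrapped sequence lines).

```shell
# Hypothetical helper: count FASTQ records, assuming four lines per read.
count_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: print "total kept percent" for a raw/filtered FASTQ pair.
retained_summary() {
  total=$(count_reads "$1")
  kept=$(count_reads "$2")
  awk -v t="$total" -v k="$kept" \
    'BEGIN { printf "%d %d %.2f\n", t, k, (t > 0 ? 100 * k / t : 0) }'
}

# Example call (file names are placeholders, not from the original post):
# retained_summary SRR1542714.fastq SRR1542714.filtered.fastq
```

The -v option of fastq_quality_filter also reports input and output read counts, so this is mainly useful as an independent check.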

These results are not consistent with the results reported in the miRPro paper, and they vary considerably between samples.
So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason I am unable to reproduce the results?

Sorry for asking such a basic question. I am a novice in this field and am starting with something challenging.

Thanks,
Hardik


Discussion

  • Jieming Shi

    Jieming Shi - 2016-08-12

    When the paper was published, miRPro had some bugs.
    One of them was that quality filtering used "fastq_quality_filter -q 20 -p 5" by default.

    In the current version, quality filtering is turned off by default. If you add the "-q 1" option in mirpro, 'fastq_quality_filter -q 20 -p 95' will be used.

    So your results are correct here.

     
  • Hardik Modi

    Hardik Modi - 2016-08-12

    Thanks. Two questions:
    1) All of the subsequent analysis of the human miRNA data was done after quality filtering with this criterion, right?
    "fastq_quality_filter -q 20 -p 5"
    2) 'fastq_quality_filter -q 20 -p 95' removes roughly 30-75% of the reads in my samples. Compared to miRPro's data, I got only a third of the differentially expressed miRNAs in control vs. ET-1. It seems this step removes good reads as well! So do you think I should avoid this step?

     
    • Jieming Shi

      Jieming Shi - 2016-08-13

      1) yes
      2) I suggest you skip this step.

  • Jieming Shi

    Jieming Shi - 2016-10-23
    • status: open --> closed
     
