#5 sub get_phred64 bug

Status: New

Owner: nobody

Labels: None

Priority: Medium

Type: Defect

Updated: 2013-01-03

Created: 2012-10-31

Creator: Anonymous

Private: No

Originally created by: nyoun... (code.google.com)@gmail.com

I found a bug in the 'get_phred64' subroutine.
If a 'high quality' read in Phred+33 FASTQ format is provided (i.e. all bases in the read have ASCII values 64-74), the subroutine identifies the FASTQ format as Phred+64. I made the following changes to the subroutine, and it appears to call the correct format in all cases that I tested:

sub get_phred64{
        my $tail_file = shift;
        my $qline = `head -n 1000 $tail_file | tail -n 1`;
        chomp $qline;
        my $phred64 = 1;
        my($b64, $a74) = (0,0);
        for my $q (split(//,$qline)){
                $b64++ if ord($q) < 64;
                $a74++ if ord($q) > 74;
        }
        if($b64 > 0 && $a74 > 0){ die " ERROR: found qual scores specific to Phred+33 & Phred+64 format\n"; }
        elsif($b64 > 0 && $a74 == 0){ $phred64 = 0; } # if Phred+33
        elsif($a74 > 0 && $b64 == 0){ $phred64 = 1; } # if Phred+64
        elsif($b64 == 0 && $a74 == 0){ $phred64 = 0; } # if all between 64 and 74, assuming high Phred+33 scores
        else{ die " Logic error\n"; } # should be unreachable

        return $phred64;
}

Nick
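
To make the ambiguity concrete, here is a minimal sketch (not part of ngopt; the helper name decode_quals is hypothetical) that decodes the same quality string under both offsets. A string whose characters all fall in ASCII 64-74 is valid either way: it reads as Q31-Q41 under Phred+33 and as Q0-Q10 under Phred+64, which is why a single read cannot always settle the format.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a quality string using the given ASCII offset.
sub decode_quals {
    my ($qline, $offset) = @_;
    return map { ord($_) - $offset } split //, $qline;
}

my $qline = '@ABCDEFGHIJ';    # ASCII 64..74, the ambiguous range

my @q33 = decode_quals($qline, 33);
my @q64 = decode_quals($qline, 64);
print "Phred+33: @q33\n";     # 31 through 41
print "Phred+64: @q64\n";     # 0 through 10
```

Both decodings are legal quality values, so the final elsif in the subroutine above has to pick one by assumption.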

Discussion

Anonymous - 2013-01-03

Originally posted by: senanu.p... (code.google.com)@gmail.com

Another way to do this (not mutually exclusive with the above) would be to evaluate more than one short read. This should also help avoid having to assume a high Phred+33 score in the last elsif line of Nick's solution:

sub get_phred64{
    my $tail_file = shift;
    my $qline = `head -n 1000 $tail_file | tail -n 1`;
    chomp $qline;
    my $qline2 = `head -n 2000 $tail_file | tail -n 1`;
    chomp $qline2;
    $qline .= $qline2;
etc. etc...

Senanu
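
One way the multi-read idea could be carried further is sketched below. This is an illustration only, not ngopt code: the name guess_phred64 and the $max_reads parameter are hypothetical, it reads from a filehandle in pure Perl instead of shelling out to head and tail, and it assumes 4-line FASTQ records with no wrapped lines. It keeps the thresholds from this ticket (below 64 means Phred+33, above 74 means Phred+64) and returns undef when every sampled character is in the ambiguous range, so the caller decides what to assume.

```perl
use strict;
use warnings;

# Hypothetical sketch: scan the quality line of up to $max_reads records.
# Assumes 4-line FASTQ records (no wrapped sequence or quality lines).
# Returns 1 for Phred+64, 0 for Phred+33, and undef if every sampled
# character falls in the ambiguous ASCII range 64..74.
sub guess_phred64 {
    my ($fh, $max_reads) = @_;
    $max_reads //= 1000;
    my ($b64, $a74, $reads, $n) = (0, 0, 0, 0);
    while (defined(my $line = <$fh>)) {
        $n++;
        next unless $n % 4 == 0;    # 4th line of each record holds the qualities
        $line =~ s/\s+\z//;         # strip the line ending, including any \r
        for my $c (split //, $line) {
            my $o = ord $c;
            $b64++ if $o < 64;
            $a74++ if $o > 74;
        }
        last if ++$reads >= $max_reads;
    }
    die "ERROR: found qual scores specific to both Phred+33 and Phred+64\n"
        if $b64 && $a74;
    return 0 if $b64;
    return 1 if $a74;
    return;                         # undetermined
}

# Usage: the first read is ambiguous (all 'I', ASCII 73); the second settles it.
my $fastq = "\@r1\nACGT\n+\nIIII\n\@r2\nACGT\n+\n5555\n";
open my $fh, '<', \$fastq or die "open: $!";
my $guess = guess_phred64($fh);
print defined $guess ? ($guess ? "Phred+64\n" : "Phred+33\n") : "undetermined\n";
```

Sampling more reads only narrows the ambiguous case; a file made up entirely of characters in 64..74 would still come back undetermined.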
