Originally created by: nyoun... (code.google.com)@gmail.com
I found a bug with the 'get_phred64' subroutine.
If a 'high quality' read in Phred+33 fastq format is provided (i.e. all bases in read have ACSII 64-74), then the subroutine defines the fastq format as Phred+64. I made the following changes to the subroutine at it appears to call the correct format in all cases that I tested:
sub get_phred64{
my $tail_file = shift;
my $qline = `head -n 1000 $tail_file | tail -n 1`;
chomp $qline;
my $phred64 = 1;
my($b64, $a74) = (0,0);
for my $q (split(//,$qline)){
$b64++ if ord($q) < 64;
$a74++ if ord($q) > 74;
#$phred64 = 0;
#last;
}
if($b64 > 0 && $a74 > 0){ die " ERROR: found qual scores specific to Phred+33 & Phred+64 format"; }
elsif($b64 > 0 && $a74 == 0){ $phred64 = 0; } # if Phred+33
elsif($a74 > 0 && $b64 == 0){ $phred64 = 1; } # if Phred+64
elsif($b64 == 0 && $a74 == 0){ $phred64 = 0; } # if all between 64 to 73, assuming high Phred+33 scores
else{ die " Logic error\n", $!; }
return $phred64;
}
Nick
View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally posted by: senanu.p... (code.google.com)@gmail.com
Another way to do this (that is not mutually exclusive) would be to evaluate more than one short read. This should also help avoid the assumption of high Phred+33 score in the last elsif line in Nicks solution:
sub get_phred64{
my $tail_file = shift;
my $qline = `head -n 1000 $tail_file | tail -n 1`;
chomp $qline;
my $qline2 = `head -n 2000 $tail_file | tail -n 1`;
chomp $qline2;
$qline .= $qline2;
etc. etc...
Senanu