Quality filtering on minimum average quality

BBMap short read aligner, and other bioinformatic tools.

Brought to you by: brian-jgi

Forum: General Discussion

Created: 2022-09-16

Updated: 2022-09-21

Jake - 2022-09-16

I've been using bbduk for quality filtering and pretty much took it for granted. I've now been looking at it more closely: I usually use minavequality=15, which removes about 15-20% of our reads, whereas minavequality=30 removes almost 100% of them.

However, when I do the quality averaging by hand, almost all of my reads have an average quality above 30. I've been taking the ASCII values of the quality characters, subtracting 33, summing them up, and dividing by the read length.

How is bbduk computing the average quality?

Jake - 2022-09-21

Seems like it is done with

-10*math.log(sum([10**(-(ord(c)-33)/10) for c in line4])/len(line4),10)

rather than just

sum([ord(c)-33 for c in line4])/len(line4)

So it's the quality score of the average error probability, not the average of the quality scores.
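To see how far the two averages can diverge, here is a minimal sketch that wraps the two one-liners above in functions and runs them on a made-up quality string. It assumes Phred+33 encoding, the function names and the example read are hypothetical, and whether bbduk computes exactly this is only what the observed filtering behaviour suggests, not something confirmed from its source.

```python
import math

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(qual_line):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual_line]


def mean_of_scores(qual_line):
    """Arithmetic mean of the Phred scores (the 'by hand' method)."""
    scores = phred_scores(qual_line)
    return sum(scores) / len(scores)


def score_of_mean_error(qual_line):
    """Phred score of the mean per-base error probability."""
    probs = [10 ** (-q / 10) for q in phred_scores(qual_line)]
    return -10 * math.log10(sum(probs) / len(probs))


# Hypothetical read: 90 bases at Q40 ('I') followed by 10 bases at Q2 ('#').
example = "I" * 90 + "#" * 10

print(mean_of_scores(example))       # arithmetic mean of the scores
print(score_of_mean_error(example))  # dominated by the ten low-quality bases
```

For this read the arithmetic mean is 36.2, but the score of the mean error probability is only about 12, because a handful of low-quality bases dominates the average error rate. That would explain a read looking fine by hand yet failing minavequality=15. When every base has the same quality, the two methods agree.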