#62 bbduk.sh with maq option and outs keeps low-quality mates

Milestone: 1.0
Status: open
Owner: nobody
Labels: None
Updated: 2023-10-31
Created: 2023-10-31
Creator: C.S.
Private: No

Version 39.03

I noticed that the maq option lets low-quality reads through when orphan reads (singletons) are written to outs. In the example below, both reads are below my maq threshold and the pair is dropped from out/out2, but read1/2 is then written to the singleton output.
I think the same thing happens with reads discarded for low entropy.

Example data:
test_R1.fq:

@read1/1
AGGGATGGAATAAATTCGGAAAAGATGGAGAAGATGTTGTAATTCCTATTCCTCCTGGAACTACAATTCGTGATGCAGAAACAAATGAATTAATTCATGATTTTACTACTGAATCTGAAAATGAAATGTTTACATTTTTAGAAGGTGGCA
+
AAAAAEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEEEEAEEEEEAAEEEAEEEAA/AEEEE<AA/EEE/</6<///<A///

test_R2.fq:

@read1/2
TGGCGCAATCTTAGGACGAGCATTAGTAAAAGCAGTTAATAAAGAAGATTTTCCAGCATTAGGAAAGCCAACTAAACCAACATCAGCCATAATGCTTAATTCAAGTTTTAATAAACGAACTTCACCAGGTTTTCAATCATGAGCATATCT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEAEEE6EE/EEEEAAAEEAAEEEAEEEEEEE/EEEEAEEEAEEEEEAEEEE<AAEEA<<AEAEEEAE<AAAEEEEAEEEEEEEAE//A<EAEEEEE/AEEEEA<//<AAA<6AAE/EEEEE

Average quality according to the formula in the bbduk source code:
read1/1 = 18.97, read1/2 = 19.925
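
For anyone who wants to recompute per-read averages, here is a minimal sketch of one common way to average Phred scores: convert each score to an error probability, average the probabilities, and convert the mean back to a Phred value. It assumes Phred+33 encoding and strict four-line FASTQ records. The function name avg_quality is made up for illustration, and whether this is exactly the formula used in the bbduk source is an assumption, so its output may not agree with the values quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of BBTools): print "<read name><TAB><average quality>"
# for every record in a four-line FASTQ file.
# Assumptions: Phred+33 encoding; the average is taken over error probabilities,
# not over the raw Phred scores.
avg_quality() {
    awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 1 { name = substr($0, 2) }    # header line, leading "@" dropped
        NR % 4 == 0 {                           # quality line
            n = length($0)
            if (n == 0) next
            sum = 0
            for (i = 1; i <= n; i++) {
                q = ord[substr($0, i, 1)] - 33  # Phred score of this base
                sum += 10 ^ (-q / 10)           # error probability of this base
            }
            # Mean error probability converted back to a Phred score.
            printf "%s\t%.2f\n", name, -10 * log(sum / n) / log(10)
        }
    ' "$1"
}
```

Usage would be `avg_quality test_R1.fq` and `avg_quality test_R2.fq`. Averaging probabilities weights low-quality bases much more heavily than a plain mean of the scores, so a few poor bases near the 3' end can pull a read below a maq threshold.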

bbduk.sh in=test_R1.fq in2=test_R2.fq out=test_clean_R1.fq out2=test_clean_R2.fq outs=test_orphans.fq maq=25

=> read1/2 is written to outs instead of being discarded.
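
To make the result easy to check, the run above can be wrapped in a small script that counts the records in each output file. This is only a sketch: count_fastq_records is a made-up helper that assumes four lines per FASTQ record, and the bbduk.sh call uses exactly the options from the command above. The run is skipped when bbduk.sh or the test files are not available.

```shell
#!/bin/sh
# Hypothetical helper: number of records in a FASTQ file,
# assuming exactly four lines per record. Prints 0 for a missing file.
count_fastq_records() {
    if [ ! -f "$1" ]; then
        echo 0
        return
    fi
    awk 'END { print int(NR / 4) }' "$1"
}

# Run the command from the report only if bbduk.sh and both inputs exist.
if command -v bbduk.sh >/dev/null 2>&1 && [ -f test_R1.fq ] && [ -f test_R2.fq ]; then
    bbduk.sh in=test_R1.fq in2=test_R2.fq \
        out=test_clean_R1.fq out2=test_clean_R2.fq \
        outs=test_orphans.fq maq=25
    echo "paired reads kept (R1): $(count_fastq_records test_clean_R1.fq)"
    echo "paired reads kept (R2): $(count_fastq_records test_clean_R2.fq)"
    echo "orphans kept:           $(count_fastq_records test_orphans.fq)"
else
    echo "bbduk.sh or test files not found, skipping run"
fi
```

If both reads were filtered as expected, all three counts would be 0. With the behaviour described here, the orphan count is 1 because read1/2 ends up in test_orphans.fq.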

When redirecting outs to stdout and sending it through another bbduk run, the maq filter of the downstream process does not catch the low-quality read.

bbduk.sh in=test_R1.fq in2=test_R2.fq out=test_clean_R1.fq out2=test_clean_R2.fq outs=stdout.fq maq=25 | bbduk.sh in=stdin.fq out=test_orphans.fq maq=25 int=f

=> read1/2 is written to out instead of being discarded.

Not sure if this behaviour is intended. Unfortunately, it makes bbduk, which otherwise is my absolute favourite tool, unusable for quality filtering.

Thanks,
Christian
