Brian, I'm having some trouble with the 'filterbyname' function not removing any reads.
The line of code I am running:
filterbyname.sh in=$inputdir/read1.fastq in2=$inputdir/read2.fastq out=$outdir/read1filtered.fastq out2=$outdir/read2filtered.fastq names=$inputdir/FilterReads.txt
The 'names' file is a text file with one read name per line, e.g.:
@D00580:D00580:AC58BGANXX:2:1211:12654:10833
@D00580:D00580:AC58BGANXX:2:2309:12469:49507
The function seems to be running fine, and I can definitely locate these names in my original read1 and read2.fastq files, but they are not being removed from the filtered file. The 'Reads Processed' = 'Reads Out' and I can still grep these readnames in the filtered fastq files. Can you please let me know where I might be going wrong?
Thank you.
"filterbyname" by default requires an exact name match. In fastq format, this line:
@D00580:D00580:AC58BGANXX:2:1211:12654:10833
...indicates a read named:
D00580:D00580:AC58BGANXX:2:1211:12654:10833
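In other words, the @ symbol is not part of the name, so the entries in your names file do not match any read. Removing the leading @ from each line should fix it. In the future I will probably allow a leading @ or > symbol, as that seems to be a common use case.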
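Until then, one way to clean up the names file is something like the sketch below. It assumes one name per line, as in your example; `strip_name_prefix` and the `FilterReads.clean.txt` file name are just made up for illustration and are not part of BBTools.

```shell
# Hypothetical helper (not part of BBTools): reads names on stdin and
# removes a single leading '@' or '>' from each line, leaving the rest as-is.
strip_name_prefix() {
    sed 's/^[@>]//'
}

# Example usage, assuming the paths from the question above:
#   strip_name_prefix < "$inputdir/FilterReads.txt" > "$inputdir/FilterReads.clean.txt"
#   filterbyname.sh in=$inputdir/read1.fastq in2=$inputdir/read2.fastq out=$outdir/read1filtered.fastq out2=$outdir/read2filtered.fastq names=$inputdir/FilterReads.clean.txt
```

Only the first character of each line is touched, so an @ appearing later in a name is left alone.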