|
From: Neel A. <na...@wh...> - 2011-03-15 00:47:58
|
Hello, This is my first post and I am a beginner to samtools. So, please bear with me. 1. I am seeing a discrepancy in the number of unaligned reads I am getting with two different methods (samtools flagstat and filtering with awk). According to flagstat function, I have a total of 8187768 reads and 4553610 are mapped. I presume the remaining 3634158 reads are unmapped. $ samtools flagstat MH_0001aligned_sorted.bam 8187768 in total 0 QC failure 0 duplicates 4553610 mapped (55.61%) 0 paired in sequencing 0 read1 0 read2 0 properly paired (nan%) 0 with itself and mate mapped 0 singletons (nan%) 0 with mate mapped to a different chr 0 with mate mapped to a different chr (mapQ>=5) 2. I want to write the unmapped reads to a new file, so I am using awk to search and write them to a separate file ( I know you can do this with samtools view function, but I am unable to do it, see below). That gives me a total of 3633980 reads as unaligned. Somewhere along, I am loosing 178 reads. I am not concerned about them but just for peace of mind, I want to know where they have gone. $awk '($6 == "*")' MH_0001aligned.sam | wc -l 3633980 With flagstat function, I used indexed and sorted bam file, whereas with awk, I used aligned sam file. Could this explain the differences? 3. Also, I am having a hard time using options with samtools view for filtering the reads. How can I filter the reads using -F option if I want to filter unmapped reads? I tried $ samtools view - F U MH_0001aligned_sorted.bam | wc -l and didn't get any result. How do I specify it so that I can filter correctly. Any comments or suggestions to explain this will be highly appreciated. Thank you, Neel Neel Aluru Postdoctoral Scholar Biology Department Woods Hole Oceanographic Institution Woods Hole, MA 02543 USA 508-289-3607 |