|
From: Shawn Y. <yos...@gm...> - 2011-03-15 01:10:58
|
Hi Neel, To extract all unmapped reads from a sam file type 'samtools view -Shf 4 IN.sam > unaligned.sam' To get all aligned reads type 'samtools view -ShF 4 IN.sam > Aligned.sam'. Note in awk you can do "awk '$2==4' IN.sam > unAligned.sam" To get aligned reads do "awk '$2!=4' IN.sam > Aligned.sam". NOTE! the awk command only works for a FRAGMENT read data set NOT for paired-end. Use the samtools view command for paired-end (or mate-paired) data. Shawn P.S. to save space you may want to work in a stream and store things in .bam format. For example you can do the same above by typing "samtools view -hbf 4 IN.bam > unAligned.bam" or the awk command "samtools view -h IN.bam | awk '$1~"@" || $2==4' | samtools view -Shb - > unAligned.bam" Note the $1~"@" is there to make sure you keep the header of the sam file. On Mon, Mar 14, 2011 at 5:47 PM, Neel Aluru <na...@wh...> wrote: > Hello, > > This is my first post and I am a beginner to samtools. So, please bear with > me. > > 1. I am seeing a discrepancy in the number of unaligned reads I am getting > with two different methods (samtools flagstat and filtering with awk). > According to flagstat function, I have a total of 8187768 reads and 4553610 > are mapped. I presume the remaining 3634158 reads are unmapped. > > $ samtools flagstat MH_0001aligned_sorted.bam > 8187768 in total > 0 QC failure > 0 duplicates > 4553610 mapped (55.61%) > 0 paired in sequencing > 0 read1 > 0 read2 > 0 properly paired (nan%) > 0 with itself and mate mapped > 0 singletons (nan%) > 0 with mate mapped to a different chr > 0 with mate mapped to a different chr (mapQ>=5) > > 2. I want to write the unmapped reads to a new file, so I am using awk to > search and write them to a separate file ( I know you can do this with > samtools view function, but I am unable to do it, see below). That gives me > a total of 3633980 reads as unaligned. Somewhere along, I am loosing 178 > reads. I am not concerned about them but just for peace of mind, I want to > know where they have gone. > > $awk '($6 == "*")' MH_0001aligned.sam | wc -l > 3633980 > > With flagstat function, I used indexed and sorted bam file, whereas with > awk, I used aligned sam file. Could this explain the differences? > > 3. Also, I am having a hard time using options with samtools view for > filtering the reads. How can I filter the reads using -F option if I want to > filter unmapped reads? > I tried $ samtools view - F U MH_0001aligned_sorted.bam | wc -l and didn't > get any result. How do I specify it so that I can filter correctly. > > Any comments or suggestions to explain this will be highly appreciated. > > Thank you, > Neel > > Neel Aluru > Postdoctoral Scholar > Biology Department > Woods Hole Oceanographic Institution > Woods Hole, MA 02543 > USA > 508-289-3607 > > > > > > ------------------------------------------------------------------------------ > Colocation vs. Managed Hosting > A question and answer guide to determining the best fit > for your organization - today and in the future. > http://p.sf.net/sfu/internap-sfd2d > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help > |