|
From: Neel A. <na...@wh...> - 2011-03-15 01:56:18
|
Thank you, Shawn. I really appreciate it. Now i understand how to samtools view options work. Neel On Mar 14, 2011, at 9:10 PM, Shawn Yost wrote: > Hi Neel, > To extract all unmapped reads from a sam file type 'samtools view -Shf 4 IN.sam > unaligned.sam' To get all aligned reads type 'samtools view -ShF 4 IN.sam > Aligned.sam'. Note in awk you can do "awk '$2==4' IN.sam > unAligned.sam" To get aligned reads do "awk '$2!=4' IN.sam > Aligned.sam". NOTE! the awk command only works for a FRAGMENT read data set NOT for paired-end. Use the samtools view command for paired-end (or mate-paired) data. > > Shawn > > P.S. to save space you may want to work in a stream and store things in .bam format. For example you can do the same above by typing "samtools view -hbf 4 IN.bam > unAligned.bam" or the awk command "samtools view -h IN.bam | awk '$1~"@" || $2==4' | samtools view -Shb - > unAligned.bam" Note the $1~"@" is there to make sure you keep the header of the sam file. > > On Mon, Mar 14, 2011 at 5:47 PM, Neel Aluru <na...@wh...> wrote: > Hello, > > This is my first post and I am a beginner to samtools. So, please bear with me. > > 1. I am seeing a discrepancy in the number of unaligned reads I am getting with two different methods (samtools flagstat and filtering with awk). According to flagstat function, I have a total of 8187768 reads and 4553610 are mapped. I presume the remaining 3634158 reads are unmapped. > > $ samtools flagstat MH_0001aligned_sorted.bam > 8187768 in total > 0 QC failure > 0 duplicates > 4553610 mapped (55.61%) > 0 paired in sequencing > 0 read1 > 0 read2 > 0 properly paired (nan%) > 0 with itself and mate mapped > 0 singletons (nan%) > 0 with mate mapped to a different chr > 0 with mate mapped to a different chr (mapQ>=5) > > 2. I want to write the unmapped reads to a new file, so I am using awk to search and write them to a separate file ( I know you can do this with samtools view function, but I am unable to do it, see below). That gives me a total of 3633980 reads as unaligned. Somewhere along, I am loosing 178 reads. I am not concerned about them but just for peace of mind, I want to know where they have gone. > > $awk '($6 == "*")' MH_0001aligned.sam | wc -l > 3633980 > > With flagstat function, I used indexed and sorted bam file, whereas with awk, I used aligned sam file. Could this explain the differences? > > 3. Also, I am having a hard time using options with samtools view for filtering the reads. How can I filter the reads using -F option if I want to filter unmapped reads? > I tried $ samtools view - F U MH_0001aligned_sorted.bam | wc -l and didn't get any result. How do I specify it so that I can filter correctly. > > Any comments or suggestions to explain this will be highly appreciated. > > Thank you, > Neel > > Neel Aluru > Postdoctoral Scholar > Biology Department > Woods Hole Oceanographic Institution > Woods Hole, MA 02543 > USA > 508-289-3607 > > > > > ------------------------------------------------------------------------------ > Colocation vs. Managed Hosting > A question and answer guide to determining the best fit > for your organization - today and in the future. > http://p.sf.net/sfu/internap-sfd2d > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help > Neel Aluru Visiting Investigator Biology Department Redfield Building 3-04 Woods Hole Oceanographic Institution 45, Water Street Woods Hole, MA na...@wh... 508-289-3607 |