|
From: Hanspeter N. <fi...@sn...> - 2013-01-29 17:42:43
|
I have a set of 86 aligned bam files from an Illumina targeted resequencing run. Because the input samples were not clonal, we are interested in determining the frequency of bases at each position along the sequenced amplicon (~3kb). Basically, at the end of the day, I want to be able to generate a table that conceptually looks like this (pos 1 had 13 reads in the bam: 6 C and 7 T): Pos A C T G 1 0 6 7 0 2 0 0 0 12 3 10 1 1 0 ... Parsing the Illumina provided VCF files doesn't give me the coverage that I need since many of the variants are of low frequency (< 5%) and many positions have more than one variant. The closest I've been able to get is to modify this script <http://www.biostars.org/p/45384/> to work with mpileup output, and that gives me output like this: 311 A 40 312 T 39 C 1 313 C 29 T 7 but when I look at my .bam file in IGViewer, the number of reads actually present is about 2.5 times higher than what the script presents (eg 86 reads for position 313). Is there some software solution that already exists that would allow me individually read and tally bases along a bam? Or can someone point out why the above script is not giving the accurate counts for each position from the mpileup output? Thanks, Hanspeter |