From: Pall I. O. <pal...@eb...> - 2012-01-26 12:46:10
Hi,

I am doing a raw pileup of reads to take a look at the underlying alleles, and I noticed that while samtools mpileup, piped through bcftools, collates the alleles on a per-sample basis, the "old style" mpileup (without -g/-u) does not:

samtools mpileup -f ref.fa -r S00039:6188031-6188031 -b bampaths
[mpileup] 20 samples in 24 input files
<mpileup> Set max per-file depth to 400
S00039 6188031 a 8 ,.,,.,,. ;<<<=<<= 3 GGG :#; 7 .g,G.,g :9;:;;9 5 ,$..gg 8;;95 0 * * 4 .,,. 789: 4 Gg., 9798 1 G 8 1 G 9 5 ...,, :;<:9 4 ,.,, :<:; 5 GGggG <9989 2 gg 98 9 gGGGgGG.G 99;<:;;<; 9 ggg.Gg,GG :88<;9;;; 7 gG.gGG. 9;;:;<< 11 ..GgG,..gg^]. :;<9:;;<:5: 5 G.,g, ;:::8 2 ,G :; 2 .g 78 5 ggGGg 99;;9 5 ...,, :;;89 5 ,,... 9:;:; 6 G,GGg, :9;;9:

So there are 24*3 output columns after the position and reference base, 3 per file (depth, bases, qualities), and NOT 3 per sample, even though mpileup obviously groks that there are only 20 samples in those 24 files.

Am I missing something basic here? If not, I would call this a bug, as the man page states that "Alignment records are grouped by sample identifiers in @RG header lines."

Any ideas for other ways of examining the underlying alleles, per sample? The quick fix here would be to merge the BAM files, but that's not as elegant a way to keep the data organized as I'd hoped.

cheers
Pall

--
----------------------------------------------------------
Pall Isolfur Olason, PhD pal...@eb...
Bioinformatics researcher / SNIC-UPPMAX application expert
Evolutionsbiologiskt Centrum
Uppsala Universitet
Norbyv. 18D tlf: 070-949 8104
75236 Uppsala fax: 018-471 6310
----------------------------------------------------------
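
One possible workaround, short of merging the BAM files, would be to regroup the per-file columns after the fact. The following is only a minimal Python sketch and not part of samtools: the function name `merge_pileup_by_sample` and the `file_samples` list are hypothetical, and it assumes you already know which sample each input file belongs to (for example from the @RG SM tags), listed in the same order as the BAM list passed with -b. It simply concatenates the depth, base, and quality fields of all files that share a sample.

```python
from collections import OrderedDict


def merge_pileup_by_sample(line, file_samples):
    """Collapse per-file (depth, bases, quals) triples into per-sample triples.

    line         -- one line of plain `samtools mpileup` text output
    file_samples -- ASSUMED input: the sample name of each input file, in the
                    same order as the BAM list (hypothetical, user-supplied)
    """
    fields = line.split()
    chrom, pos, ref = fields[:3]
    triples = fields[3:]
    if len(triples) != 3 * len(file_samples):
        raise ValueError(
            "expected %d columns after the reference base, found %d"
            % (3 * len(file_samples), len(triples))
        )

    merged = OrderedDict()  # sample -> [depth, bases, quals], first-seen order
    for i, sample in enumerate(file_samples):
        depth, bases, quals = triples[3 * i:3 * i + 3]
        entry = merged.setdefault(sample, [0, "", ""])
        if int(depth) == 0:
            continue  # files with no coverage are printed as "0 * *"
        entry[0] += int(depth)
        entry[1] += bases
        entry[2] += quals
    return chrom, pos, ref, merged


if __name__ == "__main__":
    # Toy example: three files, where the first and third share sample "s1".
    example = "S00039\t6188031\ta\t2\t,G\t:;\t0\t*\t*\t1\tG\t8"
    chrom, pos, ref, merged = merge_pileup_by_sample(example, ["s1", "s2", "s1"])
    for sample, (depth, bases, quals) in merged.items():
        print(sample, depth, bases or "*", quals or "*")
```

Read-start (`^` plus mapping quality) and read-end (`$`) markers are carried over unchanged, so the merged base string can be longer than the merged depth, just as in the per-file output.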