Screenshot instructions:
Windows
Mac
Red Hat Linux
Ubuntu
Click URL instructions:
Right-click on ad, choose "Copy Link", then paste here →
(This may not be possible with some types of ads)
From: Hanspeter Niederstrasser <fink@sn...> - 2013-01-29 17:42:43
|
I have a set of 86 aligned bam files from an Illumina targeted resequencing run. Because the input samples were not clonal, we are interested in determining the frequency of bases at each position along the sequenced amplicon (~3kb). Basically, at the end of the day, I want to be able to generate a table that conceptually looks like this (pos 1 had 13 reads in the bam: 6 C and 7 T): Pos A C T G 1 0 6 7 0 2 0 0 0 12 3 10 1 1 0 ... Parsing the Illumina provided VCF files doesn't give me the coverage that I need since many of the variants are of low frequency (< 5%) and many positions have more than one variant. The closest I've been able to get is to modify this script <http://www.biostars.org/p/45384/> to work with mpileup output, and that gives me output like this: 311 A 40 312 T 39 C 1 313 C 29 T 7 but when I look at my .bam file in IGViewer, the number of reads actually present is about 2.5 times higher than what the script presents (eg 86 reads for position 313). Is there some software solution that already exists that would allow me individually read and tally bases along a bam? Or can someone point out why the above script is not giving the accurate counts for each position from the mpileup output? Thanks, Hanspeter |
From: Ryan Golhar <ngsbioinformatics@gm...> - 2013-01-30 21:44:03
Attachments:
Message as HTML
|
I have a java program that does exactly this. I'll email you off list with the program. On Tue, Jan 29, 2013 at 12:42 PM, Hanspeter Niederstrasser < fink@...> wrote: > I have a set of 86 aligned bam files from an Illumina targeted > resequencing run. Because the input samples were not clonal, we are > interested in determining the frequency of bases at each position along > the sequenced amplicon (~3kb). > > Basically, at the end of the day, I want to be able to generate a table > that conceptually looks like this (pos 1 had 13 reads in the bam: 6 C > and 7 T): > > Pos A C T G > 1 0 6 7 0 > 2 0 0 0 12 > 3 10 1 1 0 > ... > > Parsing the Illumina provided VCF files doesn't give me the coverage > that I need since many of the variants are of low frequency (< 5%) and > many positions have more than one variant. The closest I've been able > to get is to modify this script <http://www.biostars.org/p/45384/> to > work with mpileup output, and that gives me output like this: > > 311 A 40 > 312 T 39 C 1 > 313 C 29 T 7 > > but when I look at my .bam file in IGViewer, the number of reads > actually present is about 2.5 times higher than what the script presents > (eg 86 reads for position 313). > > Is there some software solution that already exists that would allow me > individually read and tally bases along a bam? Or can someone point out > why the above script is not giving the accurate counts for each position > from the mpileup output? > > Thanks, > > Hanspeter > > > ------------------------------------------------------------------------------ > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > MVPs and experts. ON SALE this month only -- learn more at: > http://p.sf.net/sfu/learnnow-d2d > _______________________________________________ > Samtools-help mailing list > Samtools-help@... > https://lists.sourceforge.net/lists/listinfo/samtools-help > |
From: Tychele Turner <tturne18@jh...> - 2013-01-31 01:25:49
|
GATK also has a tool that will do this. It's called DepthOfCoverage. http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthOfCoverage.html If you use the --printBaseCounts option it should also give you the count of A, C, T, and G at every position and in every individual. There are other options as well so you might also want to look in those. Hope this helps. Cheers, Tychele ________________________________________ From: Hanspeter Niederstrasser [fink@...] Sent: Tuesday, January 29, 2013 12:42 PM To: samtools-help@... Subject: [Samtools-help] retrieving base frequencies at specific position I have a set of 86 aligned bam files from an Illumina targeted resequencing run. Because the input samples were not clonal, we are interested in determining the frequency of bases at each position along the sequenced amplicon (~3kb). Basically, at the end of the day, I want to be able to generate a table that conceptually looks like this (pos 1 had 13 reads in the bam: 6 C and 7 T): Pos A C T G 1 0 6 7 0 2 0 0 0 12 3 10 1 1 0 ... Parsing the Illumina provided VCF files doesn't give me the coverage that I need since many of the variants are of low frequency (< 5%) and many positions have more than one variant. The closest I've been able to get is to modify this script <http://www.biostars.org/p/45384/> to work with mpileup output, and that gives me output like this: 311 A 40 312 T 39 C 1 313 C 29 T 7 but when I look at my .bam file in IGViewer, the number of reads actually present is about 2.5 times higher than what the script presents (eg 86 reads for position 313). Is there some software solution that already exists that would allow me individually read and tally bases along a bam? Or can someone point out why the above script is not giving the accurate counts for each position from the mpileup output? Thanks, Hanspeter ------------------------------------------------------------------------------ Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft MVPs and experts. ON SALE this month only -- learn more at: http://p.sf.net/sfu/learnnow-d2d _______________________________________________ Samtools-help mailing list Samtools-help@... https://lists.sourceforge.net/lists/listinfo/samtools-help |
From: Louis Letourneau <louis.letourneau@ma...> - 2013-01-31 02:19:45
|
The problem is that DepthOfCoverage is deprecated. http://gatkforums.broadinstitute.org/discussion/40/depthofcoverage-v3-0-how-much-data-do-i-have "Please note that the DepthOfCoverage tool is going to be retired in the upcoming 2.4 release." And DiagnoseTargets doesn't do individual bases, or I missed an option. Like Ryan, we also implemented our own program to extract bases and indels at given positions (can be filtered by some criterias, mapQ and such). We also use this on a set of common snps in human to fingerprint sequencing lanes for mixups. It works very well. mpileup could be used to do the same at given positions see option '-l': -l FILE list of positions (chr pos) or regions (BED) [null] Maybe in your case the bases are excluded by BAQ (disable with -B) Or anomalous read pairs (-A count anomalous read pairs) Or Base quality (-Q INT) Or Maybe max depth (-d INT) but it doesn't seem like it Louis On 1/30/2013 8:25 PM, Tychele Turner wrote: > GATK also has a tool that will do this. It's called DepthOfCoverage. > > http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthOfCoverage.html > > If you use the --printBaseCounts option it should also give you the count of A, C, T, and G at every position and in every individual. There are other options as well so you might also want to look in those. Hope this helps. > > Cheers, > > Tychele > ________________________________________ > From: Hanspeter Niederstrasser [fink@...] > Sent: Tuesday, January 29, 2013 12:42 PM > To: samtools-help@... > Subject: [Samtools-help] retrieving base frequencies at specific position > > I have a set of 86 aligned bam files from an Illumina targeted > resequencing run. Because the input samples were not clonal, we are > interested in determining the frequency of bases at each position along > the sequenced amplicon (~3kb). > > Basically, at the end of the day, I want to be able to generate a table > that conceptually looks like this (pos 1 had 13 reads in the bam: 6 C > and 7 T): > > Pos A C T G > 1 0 6 7 0 > 2 0 0 0 12 > 3 10 1 1 0 > ... > > Parsing the Illumina provided VCF files doesn't give me the coverage > that I need since many of the variants are of low frequency (< 5%) and > many positions have more than one variant. The closest I've been able > to get is to modify this script <http://www.biostars.org/p/45384/> to > work with mpileup output, and that gives me output like this: > > 311 A 40 > 312 T 39 C 1 > 313 C 29 T 7 > > but when I look at my .bam file in IGViewer, the number of reads > actually present is about 2.5 times higher than what the script presents > (eg 86 reads for position 313). > > Is there some software solution that already exists that would allow me > individually read and tally bases along a bam? Or can someone point out > why the above script is not giving the accurate counts for each position > from the mpileup output? > > Thanks, > > Hanspeter > > ------------------------------------------------------------------------------ > Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, > MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current > with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft > MVPs and experts. ON SALE this month only -- learn more at: > http://p.sf.net/sfu/learnnow-d2d > _______________________________________________ > Samtools-help mailing list > Samtools-help@... > https://lists.sourceforge.net/lists/listinfo/samtools-help > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_jan > _______________________________________________ > Samtools-help mailing list > Samtools-help@... > https://lists.sourceforge.net/lists/listinfo/samtools-help > |
From: Hideo Imamura <hi1@sa...> - 2013-01-31 10:31:42
|
Hi, Well this might be already resolved but I have scripts that would provide allele frequency of all bases and more. It provides: weighted allele frequency 4 bases like (A C G T)=(0 0 0.7 0.3). raw base depth for (A C G T) raw base depth for (A C G T) forward raw base depth for (A C G T) reverse strand bias info pileup SNP info This requires samtools 0.1.16. and you need to specify its file location in scripts. They are written in perl and use samtools 0.1.16 So it may be outdated but it's nice to have such kind of information. It will skip indels, I think. I described them in my blog. There are two versions: one takes a list of individual position and another one takes ranges in BED format. http://antwerpbioinfo.blogspot.be/2013/01/task-this-script-will-print-out-allele.html http://antwerpbioinfo.blogspot.be/2013/01/print-allele-frequency-from-bam-with.html Hope some one will find them useful. best, Hideo PS. I did not cleanup my scripts and I will clean them up if needed. On 31 Jan 2013, at 03:19, Louis Letourneau wrote: > The problem is that DepthOfCoverage is deprecated. > http://gatkforums.broadinstitute.org/discussion/40/depthofcoverage-v3-0-how-much-data-do-i-have > "Please note that the DepthOfCoverage tool is going to be retired in the > upcoming 2.4 release." > > And DiagnoseTargets doesn't do individual bases, or I missed an option. > > Like Ryan, we also implemented our own program to extract bases and > indels at given positions (can be filtered by some criterias, mapQ and > such). > We also use this on a set of common snps in human to fingerprint > sequencing lanes for mixups. > > It works very well. > > mpileup could be used to do the same at given positions see option '-l': > -l FILE list of positions (chr pos) or regions (BED) [null] > > Maybe in your case the bases are excluded by BAQ (disable with -B) > Or anomalous read pairs (-A count anomalous read pairs) > Or Base quality (-Q INT) > Or Maybe max depth (-d INT) but it doesn't seem like it > > Louis > > On 1/30/2013 8:25 PM, Tychele Turner wrote: >> GATK also has a tool that will do this. It's called DepthOfCoverage. >> >> http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthOfCoverage.html >> >> If you use the --printBaseCounts option it should also give you the count of A, C, T, and G at every position and in every individual. There are other options as well so you might also want to look in those. Hope this helps. >> >> Cheers, >> >> Tychele >> ________________________________________ >> From: Hanspeter Niederstrasser [fink@...] >> Sent: Tuesday, January 29, 2013 12:42 PM >> To: samtools-help@... >> Subject: [Samtools-help] retrieving base frequencies at specific position >> >> I have a set of 86 aligned bam files from an Illumina targeted >> resequencing run. Because the input samples were not clonal, we are >> interested in determining the frequency of bases at each position along >> the sequenced amplicon (~3kb). >> >> Basically, at the end of the day, I want to be able to generate a table >> that conceptually looks like this (pos 1 had 13 reads in the bam: 6 C >> and 7 T): >> >> Pos A C T G >> 1 0 6 7 0 >> 2 0 0 0 12 >> 3 10 1 1 0 >> ... >> >> Parsing the Illumina provided VCF files doesn't give me the coverage >> that I need since many of the variants are of low frequency (< 5%) and >> many positions have more than one variant. The closest I've been able >> to get is to modify this script <http://www.biostars.org/p/45384/> to >> work with mpileup output, and that gives me output like this: >> >> 311 A 40 >> 312 T 39 C 1 >> 313 C 29 T 7 >> >> but when I look at my .bam file in IGViewer, the number of reads >> actually present is about 2.5 times higher than what the script presents >> (eg 86 reads for position 313). >> >> Is there some software solution that already exists that would allow me >> individually read and tally bases along a bam? Or can someone point out >> why the above script is not giving the accurate counts for each position >> from the mpileup output? >> >> Thanks, >> >> Hanspeter >> >> ------------------------------------------------------------------------------ >> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >> MVPs and experts. ON SALE this month only -- learn more at: >> http://p.sf.net/sfu/learnnow-d2d >> _______________________________________________ >> Samtools-help mailing list >> Samtools-help@... >> https://lists.sourceforge.net/lists/listinfo/samtools-help >> >> >> >> ------------------------------------------------------------------------------ >> Everyone hates slow websites. So do we. >> Make your web apps faster with AppDynamics >> Download AppDynamics Lite for free today: >> http://p.sf.net/sfu/appdyn_d2d_jan >> _______________________________________________ >> Samtools-help mailing list >> Samtools-help@... >> https://lists.sourceforge.net/lists/listinfo/samtools-help >> > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_jan > _______________________________________________ > Samtools-help mailing list > Samtools-help@... > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |