|
From: Lincoln S. <lin...@gm...> - 2012-04-19 17:46:08
|
Hi Gregor, BigWig performs a rolling mean and max across each bin. The BAM counting simply counts the number of reads in the bin and assigns it to the beginning of the bin. So there is going to be a subtle offset issue when you're looking at regions in which the bin size is greater than one pixel wide. The other issue is that BigWig computes the mean, max and standard deviation. BAM is giving the mean only. I will think about how to make the BAM counting more consistent with BigWig. Lincoln On Thu, Apr 19, 2012 at 1:33 PM, Gregor Rot <gre...@fr...>wrote: > Forgot the image (attached). > > Sorry for so many e-mails, today is my Gbrowse day:) > > Gregor > > > On 4/19/2012 7:33 PM, Gregor Rot wrote: > >> Hi all, >> >> why is the whisker plot cut at the top (yellow)? (max value is 12930). >> >> Tim: forgot to mention, in generating the bigWig i used position=span >> (my data is not paired, no gapped alignments). >> >> Thanks, >> Gregor >> >> On 4/19/2012 7:11 PM, Gregor Rot wrote: >> >>> The correct values are: >>> >>> 1024 image size, 100nt region, window size = 1nt >>> 1024 image size, 1024KB region, window size = 1000nt >>> >>> Thanks, >>> Gregor >>> >>> On 4/19/2012 7:06 PM, Gregor Rot wrote: >>> >>>> Hi Tim, >>>> >>>> thank you for the explanation, very nice. >>>> >>>> One more thing: if you look at my picture: i have the image width set to >>>> 1024 and am viewing a region of 100bp. This would set the "window" to 1, >>>> since the region< image width. >>>> >>>> The red color is bigWig adapter. Since window size = 1, this shows >>>> coverage data nicely. Mean = sum (1 value per window). Indeed, it is >>>> showing 12930 as max value and this is the maximum value present in the >>>> bam coverage, i double checked this with my script. >>>> >>>> Is this interpretation correct? >>>> >>>> Now, if i zoom out and view a 10024KB region with the image size of >>>> 1024, the window size would be 1000nt now, so each pixel represents a >>>> window of 1024nt. The bigWig adapter would look for coverage inside the >>>> window (at each nucleotide) and return the mean value (out of 1000 >>>> coverage values) for the window. Is this so? >>>> >>>> Another puzzling thing is the bam (blue color) graph. Still using glyph >>>> xyplot, but it only shows 8008 as the max value. But from your >>>> explanation, this one should be showing 12930 also? >>>> >>>> The thing is it would be really terrific to have the bigWig adapter >>>> return the sum for each window instead of mean. How difficult would that >>>> be? I can try to implement this, but i need some help for where to look. >>>> >>>> Thanks, >>>> Gregor >>>> >>>> On 4/19/2012 5:56 PM, Timothy Parnell wrote: >>>> >>>>> Hi, >>>>> Not sure if I can adequately explain all, but here goes >>>>> >>>>> The coverage method from Bio::DB::Sam returns a simple count of all >>>>> alignments covering each "window" or pixel in the current view. >>>>> Depending >>>>> on the zoom factor of the current view, each window could represent >>>>> anything from 1 bp to the current view size divided by panel size in >>>>> pixels (10 Mb / 1024 pixels = 9766 bp per pixel). >>>>> >>>>> The summary method from Bio::DB::BigWig returns some statistics for >>>>> each >>>>> window, including count, sum, etc. The wiggle_xyplot glyph by default >>>>> shows the mean value for each window, which is ok for something like >>>>> microarray probe values but not good for displaying sequence coverage >>>>> (sum >>>>> would be better). I've been meaning to dig into the code and suggest a >>>>> patch to Lincoln to change this behavior. >>>>> >>>>> The wiggle_whiskers glyph displays the mean, standard deviation, and >>>>> max >>>>> values for each window, denoting each as a different color. Hence, it >>>>> only >>>>> works with BigWigs that return window statistics. This is a little >>>>> better. >>>>> >>>>> One way to ameliorate this problem is to convert the bam into binned >>>>> coverage (100, 500, or 1000 bp) wig files. My bam2wig.pl will now do >>>>> this. >>>>> >>>>> I'm not sure about the difference between bam and bigwig when zoomed >>>>> in at >>>>> base pair level. My bam2wig.pl script can count alignments in a >>>>> number of >>>>> different ways: at the start position, mid position, or each alignment >>>>> base. It may or may not count gaps, depending on whether splices are >>>>> enabled, and can skip low-scoring alignments. The Bam coverage method >>>>> very >>>>> likely does not count gaps, and will count regardless of score. It's >>>>> hard >>>>> to say without careful accounting of each alignment at each base >>>>> pair and >>>>> comparing methods. >>>>> >>>>> I hope that answers at least some of your concerns, >>>>> Tim >>>>> >>>>> >>>>> On 4/19/12 7:54 AM, "Gregor Rot"<gre...@fr...> wrote: >>>>> >>>>> Hi all, >>>>>> >>>>>> i have 3 simple questions: >>>>>> >>>>>> a) i converted my bam to bigWig (each aligned read contributes +1 at >>>>>> each position spanning the read). I am now comparing the bigWig track >>>>>> with the coverage bam track, the database definitions are: >>>>>> >>>>>> [bigwig_db:database] >>>>>> db_adaptor = Bio::DB::BigWigSet >>>>>> db_args = -dir /big_wig_folder >>>>>> >>>>>> [bam_db:database] >>>>>> db_adaptor = Bio::DB::Sam >>>>>> db_args = -bam /path_to_bam_file >>>>>> >>>>>> and i am using: >>>>>> >>>>>> glypx = wiggle_xyplot >>>>>> >>>>>> For feature i am using "coverage" for the bam track and "summary" for >>>>>> the bigWig track. What is the difference? >>>>>> >>>>>> If you look at figure bam_vs_bigwig_coverage.png, you will see that >>>>>> coverage at centre is 27 for bigWig (bottom) and 35 for the bam track >>>>>> (top). I checked the sam file and the correct coverage is 27 >>>>>> (bigWig), i >>>>>> don't know how the bam coverage is computed? >>>>>> >>>>>> b) if i zoom out to chr1 (scaling is set to local min/max), you see >>>>>> the >>>>>> result in figure scaling_1.png. The selected region has the highest >>>>>> peak >>>>>> (8500), but you can see other higher regions in the bam coverage >>>>>> track. >>>>>> Why? Also the y-axis on this track now shows only 347, but the bigWig >>>>>> track correctly shows 8554. >>>>>> >>>>>> c) If you look at figure whiskers.png, i am using the whiskers >>>>>> glyph for >>>>>> the bigWig track. What is the difference between xyplot and >>>>>> whiskers? I >>>>>> don't understand why the tops of the values are being cut off (yellow >>>>>> color), and at value 8000. >>>>>> >>>>>> --- >>>>>> To sum up, i would like to show the bigWig for coverage (it looks very >>>>>> nice, the combined forward/reverse strand with red/blue colors). The >>>>>> problem is i would need some kind of log-value scaling or something >>>>>> like >>>>>> that (because if a user zooms out to the entire chromosome it's very >>>>>> difficult to see where the peak regions are). >>>>>> >>>>>> Any help appreciated, >>>>>> >>>>>> Thanks, >>>>>> Gregor >>>>>> >>>>>> -- >>>>>> Gregor Rot >>>>>> Bioinformatics Laboratory >>>>>> Faculty of computer and information science >>>>>> SI-1000 Ljubljana >>>>>> Slovenia >>>>>> http://www.fri.uni-lj.si/en/**gregor-rot<http://www.fri.uni-lj.si/en/gregor-rot> >>>>>> >>>>> >>>>> >>>> >>>> ------------------------------**------------------------------** >>>> ------------------ >>>> >>>> For Developers, A Lot Can Happen In A Second. >>>> Boundary is the first to Know...and Tell You. >>>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >>>> http://p.sf.net/sfu/Boundary-**d2dvs2<http://p.sf.net/sfu/Boundary-d2dvs2> >>>> >>>> >>>> >>>> ______________________________**_________________ >>>> Gmod-gbrowse mailing list >>>> Gmod-gbrowse@lists.**sourceforge.net<Gmo...@li...> >>>> https://lists.sourceforge.net/**lists/listinfo/gmod-gbrowse<https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse> >>>> >>> >>> ------------------------------**------------------------------** >>> ------------------ >>> >>> For Developers, A Lot Can Happen In A Second. >>> Boundary is the first to Know...and Tell You. >>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >>> http://p.sf.net/sfu/Boundary-**d2dvs2<http://p.sf.net/sfu/Boundary-d2dvs2> >>> ______________________________**_________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse@lists.**sourceforge.net<Gmo...@li...> >>> https://lists.sourceforge.net/**lists/listinfo/gmod-gbrowse<https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse> >>> >> > > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Gmod-gbrowse mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa <Ren...@oi...> |