|
From: Gregor R. <gre...@fr...> - 2012-04-19 17:33:17
|
Hi all, why is the whisker plot cut at the top (yellow)? (max value is 12930). Tim: forgot to mention, in generating the bigWig i used position=span (my data is not paired, no gapped alignments). Thanks, Gregor On 4/19/2012 7:11 PM, Gregor Rot wrote: > The correct values are: > > 1024 image size, 100nt region, window size = 1nt > 1024 image size, 1024KB region, window size = 1000nt > > Thanks, > Gregor > > On 4/19/2012 7:06 PM, Gregor Rot wrote: >> Hi Tim, >> >> thank you for the explanation, very nice. >> >> One more thing: if you look at my picture: i have the image width set to >> 1024 and am viewing a region of 100bp. This would set the "window" to 1, >> since the region< image width. >> >> The red color is bigWig adapter. Since window size = 1, this shows >> coverage data nicely. Mean = sum (1 value per window). Indeed, it is >> showing 12930 as max value and this is the maximum value present in the >> bam coverage, i double checked this with my script. >> >> Is this interpretation correct? >> >> Now, if i zoom out and view a 10024KB region with the image size of >> 1024, the window size would be 1000nt now, so each pixel represents a >> window of 1024nt. The bigWig adapter would look for coverage inside the >> window (at each nucleotide) and return the mean value (out of 1000 >> coverage values) for the window. Is this so? >> >> Another puzzling thing is the bam (blue color) graph. Still using glyph >> xyplot, but it only shows 8008 as the max value. But from your >> explanation, this one should be showing 12930 also? >> >> The thing is it would be really terrific to have the bigWig adapter >> return the sum for each window instead of mean. How difficult would that >> be? I can try to implement this, but i need some help for where to look. >> >> Thanks, >> Gregor >> >> On 4/19/2012 5:56 PM, Timothy Parnell wrote: >>> Hi, >>> Not sure if I can adequately explain all, but here goes >>> >>> The coverage method from Bio::DB::Sam returns a simple count of all >>> alignments covering each "window" or pixel in the current view. Depending >>> on the zoom factor of the current view, each window could represent >>> anything from 1 bp to the current view size divided by panel size in >>> pixels (10 Mb / 1024 pixels = 9766 bp per pixel). >>> >>> The summary method from Bio::DB::BigWig returns some statistics for each >>> window, including count, sum, etc. The wiggle_xyplot glyph by default >>> shows the mean value for each window, which is ok for something like >>> microarray probe values but not good for displaying sequence coverage >>> (sum >>> would be better). I've been meaning to dig into the code and suggest a >>> patch to Lincoln to change this behavior. >>> >>> The wiggle_whiskers glyph displays the mean, standard deviation, and max >>> values for each window, denoting each as a different color. Hence, it >>> only >>> works with BigWigs that return window statistics. This is a little >>> better. >>> >>> One way to ameliorate this problem is to convert the bam into binned >>> coverage (100, 500, or 1000 bp) wig files. My bam2wig.pl will now do >>> this. >>> >>> I'm not sure about the difference between bam and bigwig when zoomed >>> in at >>> base pair level. My bam2wig.pl script can count alignments in a number of >>> different ways: at the start position, mid position, or each alignment >>> base. It may or may not count gaps, depending on whether splices are >>> enabled, and can skip low-scoring alignments. The Bam coverage method >>> very >>> likely does not count gaps, and will count regardless of score. It's hard >>> to say without careful accounting of each alignment at each base pair and >>> comparing methods. >>> >>> I hope that answers at least some of your concerns, >>> Tim >>> >>> >>> On 4/19/12 7:54 AM, "Gregor Rot"<gre...@fr...> wrote: >>> >>>> Hi all, >>>> >>>> i have 3 simple questions: >>>> >>>> a) i converted my bam to bigWig (each aligned read contributes +1 at >>>> each position spanning the read). I am now comparing the bigWig track >>>> with the coverage bam track, the database definitions are: >>>> >>>> [bigwig_db:database] >>>> db_adaptor = Bio::DB::BigWigSet >>>> db_args = -dir /big_wig_folder >>>> >>>> [bam_db:database] >>>> db_adaptor = Bio::DB::Sam >>>> db_args = -bam /path_to_bam_file >>>> >>>> and i am using: >>>> >>>> glypx = wiggle_xyplot >>>> >>>> For feature i am using "coverage" for the bam track and "summary" for >>>> the bigWig track. What is the difference? >>>> >>>> If you look at figure bam_vs_bigwig_coverage.png, you will see that >>>> coverage at centre is 27 for bigWig (bottom) and 35 for the bam track >>>> (top). I checked the sam file and the correct coverage is 27 (bigWig), i >>>> don't know how the bam coverage is computed? >>>> >>>> b) if i zoom out to chr1 (scaling is set to local min/max), you see the >>>> result in figure scaling_1.png. The selected region has the highest peak >>>> (8500), but you can see other higher regions in the bam coverage track. >>>> Why? Also the y-axis on this track now shows only 347, but the bigWig >>>> track correctly shows 8554. >>>> >>>> c) If you look at figure whiskers.png, i am using the whiskers glyph for >>>> the bigWig track. What is the difference between xyplot and whiskers? I >>>> don't understand why the tops of the values are being cut off (yellow >>>> color), and at value 8000. >>>> >>>> --- >>>> To sum up, i would like to show the bigWig for coverage (it looks very >>>> nice, the combined forward/reverse strand with red/blue colors). The >>>> problem is i would need some kind of log-value scaling or something like >>>> that (because if a user zooms out to the entire chromosome it's very >>>> difficult to see where the peak regions are). >>>> >>>> Any help appreciated, >>>> >>>> Thanks, >>>> Gregor >>>> >>>> -- >>>> Gregor Rot >>>> Bioinformatics Laboratory >>>> Faculty of computer and information science >>>> SI-1000 Ljubljana >>>> Slovenia >>>> http://www.fri.uni-lj.si/en/gregor-rot >>> >> >> >> ------------------------------------------------------------------------------ >> For Developers, A Lot Can Happen In A Second. >> Boundary is the first to Know...and Tell You. >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> http://p.sf.net/sfu/Boundary-d2dvs2 >> >> >> >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmo...@li... >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Gmod-gbrowse mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse |