|
From: Gregor R. <gre...@fr...> - 2012-04-19 17:34:00
|
Forgot the image (attached). Sorry for so many e-mails, today is my Gbrowse day:) Gregor On 4/19/2012 7:33 PM, Gregor Rot wrote: > Hi all, > > why is the whisker plot cut at the top (yellow)? (max value is 12930). > > Tim: forgot to mention, in generating the bigWig i used position=span > (my data is not paired, no gapped alignments). > > Thanks, > Gregor > > On 4/19/2012 7:11 PM, Gregor Rot wrote: >> The correct values are: >> >> 1024 image size, 100nt region, window size = 1nt >> 1024 image size, 1024KB region, window size = 1000nt >> >> Thanks, >> Gregor >> >> On 4/19/2012 7:06 PM, Gregor Rot wrote: >>> Hi Tim, >>> >>> thank you for the explanation, very nice. >>> >>> One more thing: if you look at my picture: i have the image width set to >>> 1024 and am viewing a region of 100bp. This would set the "window" to 1, >>> since the region< image width. >>> >>> The red color is bigWig adapter. Since window size = 1, this shows >>> coverage data nicely. Mean = sum (1 value per window). Indeed, it is >>> showing 12930 as max value and this is the maximum value present in the >>> bam coverage, i double checked this with my script. >>> >>> Is this interpretation correct? >>> >>> Now, if i zoom out and view a 10024KB region with the image size of >>> 1024, the window size would be 1000nt now, so each pixel represents a >>> window of 1024nt. The bigWig adapter would look for coverage inside the >>> window (at each nucleotide) and return the mean value (out of 1000 >>> coverage values) for the window. Is this so? >>> >>> Another puzzling thing is the bam (blue color) graph. Still using glyph >>> xyplot, but it only shows 8008 as the max value. But from your >>> explanation, this one should be showing 12930 also? >>> >>> The thing is it would be really terrific to have the bigWig adapter >>> return the sum for each window instead of mean. How difficult would that >>> be? I can try to implement this, but i need some help for where to look. >>> >>> Thanks, >>> Gregor >>> >>> On 4/19/2012 5:56 PM, Timothy Parnell wrote: >>>> Hi, >>>> Not sure if I can adequately explain all, but here goes >>>> >>>> The coverage method from Bio::DB::Sam returns a simple count of all >>>> alignments covering each "window" or pixel in the current view. >>>> Depending >>>> on the zoom factor of the current view, each window could represent >>>> anything from 1 bp to the current view size divided by panel size in >>>> pixels (10 Mb / 1024 pixels = 9766 bp per pixel). >>>> >>>> The summary method from Bio::DB::BigWig returns some statistics for >>>> each >>>> window, including count, sum, etc. The wiggle_xyplot glyph by default >>>> shows the mean value for each window, which is ok for something like >>>> microarray probe values but not good for displaying sequence coverage >>>> (sum >>>> would be better). I've been meaning to dig into the code and suggest a >>>> patch to Lincoln to change this behavior. >>>> >>>> The wiggle_whiskers glyph displays the mean, standard deviation, and >>>> max >>>> values for each window, denoting each as a different color. Hence, it >>>> only >>>> works with BigWigs that return window statistics. This is a little >>>> better. >>>> >>>> One way to ameliorate this problem is to convert the bam into binned >>>> coverage (100, 500, or 1000 bp) wig files. My bam2wig.pl will now do >>>> this. >>>> >>>> I'm not sure about the difference between bam and bigwig when zoomed >>>> in at >>>> base pair level. My bam2wig.pl script can count alignments in a >>>> number of >>>> different ways: at the start position, mid position, or each alignment >>>> base. It may or may not count gaps, depending on whether splices are >>>> enabled, and can skip low-scoring alignments. The Bam coverage method >>>> very >>>> likely does not count gaps, and will count regardless of score. It's >>>> hard >>>> to say without careful accounting of each alignment at each base >>>> pair and >>>> comparing methods. >>>> >>>> I hope that answers at least some of your concerns, >>>> Tim >>>> >>>> >>>> On 4/19/12 7:54 AM, "Gregor Rot"<gre...@fr...> wrote: >>>> >>>>> Hi all, >>>>> >>>>> i have 3 simple questions: >>>>> >>>>> a) i converted my bam to bigWig (each aligned read contributes +1 at >>>>> each position spanning the read). I am now comparing the bigWig track >>>>> with the coverage bam track, the database definitions are: >>>>> >>>>> [bigwig_db:database] >>>>> db_adaptor = Bio::DB::BigWigSet >>>>> db_args = -dir /big_wig_folder >>>>> >>>>> [bam_db:database] >>>>> db_adaptor = Bio::DB::Sam >>>>> db_args = -bam /path_to_bam_file >>>>> >>>>> and i am using: >>>>> >>>>> glypx = wiggle_xyplot >>>>> >>>>> For feature i am using "coverage" for the bam track and "summary" for >>>>> the bigWig track. What is the difference? >>>>> >>>>> If you look at figure bam_vs_bigwig_coverage.png, you will see that >>>>> coverage at centre is 27 for bigWig (bottom) and 35 for the bam track >>>>> (top). I checked the sam file and the correct coverage is 27 >>>>> (bigWig), i >>>>> don't know how the bam coverage is computed? >>>>> >>>>> b) if i zoom out to chr1 (scaling is set to local min/max), you see >>>>> the >>>>> result in figure scaling_1.png. The selected region has the highest >>>>> peak >>>>> (8500), but you can see other higher regions in the bam coverage >>>>> track. >>>>> Why? Also the y-axis on this track now shows only 347, but the bigWig >>>>> track correctly shows 8554. >>>>> >>>>> c) If you look at figure whiskers.png, i am using the whiskers >>>>> glyph for >>>>> the bigWig track. What is the difference between xyplot and >>>>> whiskers? I >>>>> don't understand why the tops of the values are being cut off (yellow >>>>> color), and at value 8000. >>>>> >>>>> --- >>>>> To sum up, i would like to show the bigWig for coverage (it looks very >>>>> nice, the combined forward/reverse strand with red/blue colors). The >>>>> problem is i would need some kind of log-value scaling or something >>>>> like >>>>> that (because if a user zooms out to the entire chromosome it's very >>>>> difficult to see where the peak regions are). >>>>> >>>>> Any help appreciated, >>>>> >>>>> Thanks, >>>>> Gregor >>>>> >>>>> -- >>>>> Gregor Rot >>>>> Bioinformatics Laboratory >>>>> Faculty of computer and information science >>>>> SI-1000 Ljubljana >>>>> Slovenia >>>>> http://www.fri.uni-lj.si/en/gregor-rot >>>> >>> >>> >>> ------------------------------------------------------------------------------ >>> >>> For Developers, A Lot Can Happen In A Second. >>> Boundary is the first to Know...and Tell You. >>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >>> http://p.sf.net/sfu/Boundary-d2dvs2 >>> >>> >>> >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmo...@li... >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> >> ------------------------------------------------------------------------------ >> >> For Developers, A Lot Can Happen In A Second. >> Boundary is the first to Know...and Tell You. >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> http://p.sf.net/sfu/Boundary-d2dvs2 >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmo...@li... >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse |