From: Timothy P. <Tim...@hc...> - 2012-04-19 15:57:01
Hi,

I'm not sure I can adequately explain everything, but here goes.

The coverage method from Bio::DB::Sam returns a simple count of all alignments covering each "window", or pixel, in the current view. Depending on the zoom factor, each window can represent anything from 1 bp up to the current view size divided by the panel width in pixels (for example, 10 Mb / 1024 pixels = 9766 bp per pixel).

The summary method from Bio::DB::BigWig returns some statistics for each window, including count, sum, etc. The wiggle_xyplot glyph by default shows the mean value for each window, which is fine for something like microarray probe values but not good for displaying sequence coverage (sum would be better). I've been meaning to dig into the code and suggest a patch to Lincoln to change this behavior.

The wiggle_whiskers glyph displays the mean, standard deviation, and max values for each window, drawing each in a different color. Hence, it only works with BigWigs that return window statistics. This is a little better.

One way to ameliorate this problem is to convert the bam into binned coverage (100, 500, or 1000 bp) wig files. My bam2wig.pl will now do this.

I'm not sure about the difference between bam and bigwig when zoomed in to the base pair level. My bam2wig.pl script can count alignments in a number of different ways: at the start position, at the mid position, or at each aligned base. It may or may not count gaps, depending on whether splices are enabled, and it can skip low-scoring alignments. The Bam coverage method very likely does not count gaps, and will count alignments regardless of score. It's hard to say without carefully accounting for each alignment at each base pair and comparing the methods.

I hope that answers at least some of your concerns,
Tim

On 4/19/12 7:54 AM, "Gregor Rot" <gre...@fr...> wrote:

>Hi all,
>
>i have 3 simple questions:
>
>a) i converted my bam to bigWig (each aligned read contributes +1 at
>each position spanning the read). I am now comparing the bigWig track
>with the coverage bam track, the database definitions are:
>
>[bigwig_db:database]
>db_adaptor = Bio::DB::BigWigSet
>db_args = -dir /big_wig_folder
>
>[bam_db:database]
>db_adaptor = Bio::DB::Sam
>db_args = -bam /path_to_bam_file
>
>and i am using:
>
>glyph = wiggle_xyplot
>
>For feature i am using "coverage" for the bam track and "summary" for
>the bigWig track. What is the difference?
>
>If you look at figure bam_vs_bigwig_coverage.png, you will see that
>coverage at centre is 27 for bigWig (bottom) and 35 for the bam track
>(top). I checked the sam file and the correct coverage is 27 (bigWig), i
>don't know how the bam coverage is computed?
>
>b) if i zoom out to chr1 (scaling is set to local min/max), you see the
>result in figure scaling_1.png. The selected region has the highest peak
>(8500), but you can see other higher regions in the bam coverage track.
>Why? Also the y-axis on this track now shows only 347, but the bigWig
>track correctly shows 8554.
>
>c) If you look at figure whiskers.png, i am using the whiskers glyph for
>the bigWig track. What is the difference between xyplot and whiskers? I
>don't understand why the tops of the values are being cut off (yellow
>color), and at value 8000.
>
>---
>To sum up, i would like to show the bigWig for coverage (it looks very
>nice, the combined forward/reverse strand with red/blue colors). The
>problem is i would need some kind of log-value scaling or something like
>that (because if a user zooms out to the entire chromosome it's very
>difficult to see where the peak regions are).
>
>Any help appreciated,
>
>Thanks,
>Gregor
>
>--
>Gregor Rot
>Bioinformatics Laboratory
>Faculty of computer and information science
>SI-1000 Ljubljana
>Slovenia
>http://www.fri.uni-lj.si/en/gregor-rot
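
To make the window arithmetic and the per-window statistics from the reply above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names bp_per_pixel and window_stats are hypothetical, the coverage values are made up, and none of this is the actual Bio::DB::Sam, Bio::DB::BigWig, or glyph code.

```python
def bp_per_pixel(view_bp, panel_px):
    """Approximate number of base pairs one pixel-wide window represents.

    Hypothetical helper: never returns less than 1 bp per window.
    """
    return max(1, round(view_bp / panel_px))


def window_stats(values, window):
    """Summarize per-base values in consecutive windows of `window` bases.

    Returns one dict per window with count, sum, mean, and max. These are
    the kinds of statistics the reply mentions, computed here by hand.
    """
    stats = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        total = sum(chunk)
        stats.append({
            "count": len(chunk),
            "sum": total,
            "mean": total / len(chunk),
            "max": max(chunk),
        })
    return stats


# Made-up per-base coverage with a narrow two-base peak.
coverage = [2, 2, 2, 2, 2, 2, 40, 40, 2, 2]

if __name__ == "__main__":
    print(bp_per_pixel(10_000_000, 1024))  # the 10 Mb / 1024 pixel case
    for row in window_stats(coverage, 5):
        print(row)
```

In the second window of this toy data the narrow peak is averaged together with the low-coverage bases around it, so the mean, the sum, and the max are three quite different numbers. Which statistic a track plots therefore changes what a zoomed-out view looks like.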