From: Shankar A. S. <sha...@gm...> - 2009-09-28 20:24:41
|
Hi,

I'm a newbie to JBrowse and I was wondering if someone could help me out with my question. I currently have a de novo genome assembly on which I'd like to overlay wiggle tracks from RNA-seq data. The wiggle file (variableStep) has values that are in the range 1-275000. This (huge) range of values is probably why most of the regions appear to be without any reads/expression - the low values are probably not rendered in the png image.

I'm wondering if there's a solution to see the data at a better resolution/scale. Are there options while converting to json that I can use to make it appear better? Is dynamic scaling possible as a user zooms in/out? Or should I do my own scaling/transposition before I do the json conversion?

Thanks,
Shankar |
From: Mitch S. <mit...@be...> - 2009-09-29 10:52:20
|
The short answer is: right now you can set the scale with the --min and --max command line parameters.

Sometime soon, we'll be implementing a log scale option. Enabling the user to zoom vertically is something I'd like to do; there's a basic version of that that would be easy to implement (toggling between a "full" and "compressed" vertical axis is what I have in mind). Would any of those things help in your case?

Implementing a vertical scale indicator is definitely on the to-do list.

Brenton Graveley has been advocating making the vertical scale adjust to the values in the visible region; I think that would be hard to implement, so it's probably not going to happen near-term unless someone wants to tackle it.

I only have a basic understanding of exactly what people are using RNA-seq for. So I'd like to know: in your case, how does the data answer a biological question? If (say) one region has variation between 100 and 200, and another region has variation between 100,000 and 200,000, are the 100-200 differences and the 100k-200k differences both interesting? In other words, what constitutes meaningful variation in your RNA-seq data?

I keep asking these questions, and so far I haven't gotten an answer that has made it clear for me. Maybe we're at an exploratory stage where the answers aren't clear in general? I don't know. It's tough to make good UI decisions if the use case isn't clear.

Mitch

_______________________________________________
Gmod-ajax mailing list
Gmo...@li...
https://lists.sourceforge.net/lists/listinfo/gmod-ajax |
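To make the effect of the scale range concrete, here is a minimal Python sketch of a linear value-to-pixel mapping. It is not JBrowse's rendering code: the function name `bar_height` and the 100-pixel track height are assumptions chosen only for illustration, and the clamping is what restricting the scale to a min/max range is assumed to do.

```python
def bar_height(value, vmin, vmax, track_height_px=100):
    """Map a data value to a bar height in whole pixels on a linear scale.

    Hypothetical helper: values outside [vmin, vmax] are clamped to the range.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)
    fraction = (clamped - vmin) / (vmax - vmin)
    return int(fraction * track_height_px)  # truncate to whole pixels


if __name__ == "__main__":
    # Compare the full data range with a scale capped at 1000.
    for value in (100, 500, 5000, 275000):
        full = bar_height(value, 0, 275000)
        capped = bar_height(value, 0, 1000)
        print(f"value={value:>6}  full-range={full:>3}px  capped={capped:>3}px")
```

Under these assumptions, any value below 2750 truncates to zero pixels on the full 0-275000 scale, which would explain why most regions look empty. Capping the scale at 1000 makes the low values visible, at the cost of saturating everything above the cap.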
From: Shankar A. S. <sha...@gm...> - 2009-09-29 14:06:56
|
Hi Mitch,

I circumvented the problem by converting my data to a log scale (base 2), so the track looks like the set of jagged peaks that I expected to see. Looking through the code very briefly, it appears that JBrowse precomputes png images for wiggle tracks at different zoom levels. If that's the case, you're absolutely right: it's a tough task to adjust the scale to the values in the visible region (akin to the UCSC genome browser).

In the use case that you described (100 vs 200 and 100k vs 200k), both constitute interesting differences, since there is a two-fold difference between the min and max for the two regions. In my particular case, if I were to compare two conditions/experiments (say, pre- and post-treatment) where a particular region varies between 1-500 (which is no doubt "interesting"), I'm unable to see this difference on the browser currently, since the threshold to have a data point displayed with my data range (1-200k) is ~1000. I suppose vertical scaling will help fix this. Correct me if I am wrong.

I don't know if I answered your question, but I'd be happy to answer any further questions that you might have in making improvements.

Regards,
Shankar |
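The log2 workaround described in this message can be sketched as a small preprocessing step run before the json conversion. This is an illustrative Python sketch, not the script Shankar used: the function name `log2_transform_wiggle` and the `pseudocount` parameter (added so that a value of 0 does not fail) are assumptions.

```python
import math


def log2_transform_wiggle(lines, pseudocount=1.0):
    """Yield wiggle lines with each data value replaced by log2(value + pseudocount).

    Declaration lines (track, browser, variableStep, fixedStep), comments and
    blank lines are passed through unchanged.
    """
    passthrough = ("#", "track", "browser", "variableStep", "fixedStep")
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(passthrough):
            yield line.rstrip("\n")
            continue
        fields = stripped.split()
        # variableStep data lines are "position value"; the value is the last field.
        value = float(fields[-1])
        fields[-1] = f"{math.log2(value + pseudocount):.4f}"
        yield " ".join(fields)


if __name__ == "__main__":
    example = [
        "variableStep chrom=contig1 span=1",
        "101 1",
        "102 500",
        "103 275000",
    ]
    for out in log2_transform_wiggle(example):
        print(out)
```

With a pseudocount of 1, the 1-275000 range compresses to roughly 1-18, so low and high values fit on the same axis. Negative input values would raise a `ValueError` in this sketch and would need separate handling.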
From: Mitch S. <mit...@be...> - 2009-09-29 14:02:22
|
On 09/29/2009 06:44 AM, Shankar Ajay Subramanian wrote:
> I circumvented the problem by converting my data to a log scale (base 2), so the track looks like a set of jagged peaks that I expected to see.

Did the log scale solve the problem for you? In other words, does the log display convey all of the relevant variation, or is there some variation that you'd like to see but can't?

> The use case that you described (100 vs 200 and 100k vs 200k) both constitute interesting differences since there is a two-fold difference between the min and max for the two regions.

Can there be 100-200 variation and also 100k-200k variation within the same region? If so, are both of those still interesting?

Or do large fluctuations tend to happen only between widely separated areas?

> In my particular case, if I were to compare two conditions/experiments (say, pre- and post-treatment) where a particular region varies between 1-500 (which is no doubt "interesting"), I'm unable to see this difference on the browser currently, since the threshold to have a data point displayed with my data range (1-200k) is ~1000. I suppose vertical scaling will help fix this. Correct me if I am wrong.

I think it should help, yes.

> I don't know if I answered your question, but I'd be happy to answer any further questions that you might have in making improvements.

You did answer my questions, thanks. Once we get to implementing the log scaling and vertical scale bar and vertical zooming in JBrowse, it would be helpful to get feedback on how well those things work for you.

Thanks,
Mitch |
From: Shankar A. S. <sha...@gm...> - 2009-09-29 14:14:31
|
On Tue, Sep 29, 2009 at 9:02 AM, Mitch Skinner <mit...@be...> wrote:
> On 09/29/2009 06:44 AM, Shankar Ajay Subramanian wrote:
>> I circumvented the problem by converting my data to a log scale (base 2), so the track looks like a set of jagged peaks that I expected to see.
>
> Did the log scale solve the problem for you? In other words, does the log display convey all of the relevant variation, or is there some variation that you'd like to see but can't?

It seems to be working well so far. I have yet to compare it with another track. Will let you know if I run into any trouble.

>> The use case that you described (100 vs 200 and 100k vs 200k) both constitute interesting differences since there is a two-fold difference between the min and max for the two regions.
>
> Can there be 100-200 variation and also 100k-200k variation within the same region? If so, are both of those still interesting?
>
> Or do large fluctuations tend to happen only between widely separated areas?

I think I'll have to answer this rather circuitously. If that were to occur, it would definitely be interesting. It's tough to say whether both of those variations in one region are something one might see. Sorry if the answer sounds silly.

>> In my particular case, if I were to compare two conditions/experiments (say, pre- and post-treatment) where a particular region varies between 1-500 (which is no doubt "interesting"), I'm unable to see this difference on the browser currently, since the threshold to have a data point displayed with my data range (1-200k) is ~1000. I suppose vertical scaling will help fix this. Correct me if I am wrong.
>
> I think it should help, yes.
>
>> I don't know if I answered your question, but I'd be happy to answer any further questions that you might have in making improvements.
>
> You did answer my questions, thanks. Once we get to implementing the log scaling and vertical scale bar and vertical zooming in JBrowse, it would be helpful to get feedback on how well those things work for you.

Definitely. I presume you'll post a message to the list informing us of upgrades.

> Thanks,
> Mitch |
From: Brenton G. <brg...@gm...> - 2009-09-29 14:33:17
|
Hi Mitch,

Sorry for the delay in responding about the vertical scaling issues. As you mentioned in a previous email, there are issues with dynamic and static scaling whether on a linear or log scale. For instance, for some of our RNA-Seq datasets there are genes with hundreds of thousands of reads right next to a gene that is expressed at low levels. What linear scaling allows you to do is see the full range. In a case like this, log scaling would help, but with such a dynamic range, even the lowly expressed genes would still appear very low. The other thing about log scaling is that, at least to me, it is non-intuitive.

The best thing would be something along the lines of what you mention - some type of "y-zooming". This way, if you are interested in low expression of a certain segment in a region that also contains highly expressed genes, you could see it by zooming in. Similarly, to see the real dynamic range, you could zoom out.

Nonetheless, I'll make a few tracks with linear scaling with a limit of 100 or so, linear scaling using the real max, and a log scale track, and send some screenshots/links.

Cheers,
Brent |
From: Brenton G. <brg...@gm...> - 2009-09-29 14:37:08
|
Dear Mitch,

I have another feature request that I thought would be best to start on a separate thread.

One thing that would be very useful for users like me who have a lot of tracks is the ability to select entire groups of tracks to load at once instead of having to do it individually. For instance, our modENCODE browser has 30 different RNA-Seq tracks from different timepoints throughout Drosophila development. If there was a way to load all 12 embryonic time points at once it would be great.

Perhaps one way to do this would be to have a menu type thing on the left where you could have nested tracks. So, perhaps you could have one track listed as "Development" and dragging this to the right would load all 30 time points. Alternatively, there could be an arrow that you could click on and it would display the next set of groupings (e.g., Embryos, Larvae, Pupae, Adults). Again, you could either drag one of these to the browser window to load the whole set or click on an arrow to show the next set of groupings (e.g., the individual tracks in each group). I have absolutely no clue how easy or difficult something like this would be to implement, but it could be quite useful, especially since we will have many more tracks (~200 in the next year or so) to add to the browser.

Cheers,
Brent |
From: Mitch S. <mit...@be...> - 2009-09-29 15:00:39
|
On 09/29/2009 07:36 AM, Brenton Graveley wrote:
> One thing that would be very useful for users like me who have a lot of tracks is the ability to select entire groups of tracks to load at once instead of having to do it individually. For instance, our modENCODE browser has 30 different RNA-Seq tracks from different timepoints throughout Drosophila development. If there was a way to load all 12 embryonic time points at once it would be great.

I've been thinking about this and I have some questions :)

One alternative that was high on my list was to do something like what GBrowse does on the modENCODE server:
http://modencode.oicr.on.ca/cgi-bin/gb2/gbrowse/fly/?start=123000;stop=180000;ref=2L;width=800;version=100;flip=0;grid=1;id=405c399cbe0d255c1a2d44e9a4f1e306;label=EMBRYO_SG_Total

There are two things going on there:
1. conveying quantitative data with color (lighthouse #7), and
2. subtracks that you manipulate (turn on and off, re-order) together.

One possibility that would be easy to implement client-side would be just to have one image track with several subtracks in each image. A nice feature of that approach is that it would reduce disk usage on the server (which I've been concerned about) to some extent by reducing the number of images (there's some filesystem overhead associated with each image file, so fewer larger files are better than more numerous smaller files). Has disk space been an issue for you?

A drawback of putting all the subtracks into the same image track would be that you couldn't turn the subtracks on and off individually. Do you always (or almost always) want to view all the time points (or, say, all 12 embryonic time points) together, or do you sometimes want to reorder them or turn them on and off individually?

> Perhaps one way to do this would be to have a menu type thing on the left where you could have nested tracks. So, perhaps you could have one track listed as "Development" and dragging this to the right would load all 30 time points. Alternatively, there could be an arrow that you could click on and it would display the next set of groupings (e.g., Embryos, Larvae, Pupae, Adults). Again, you could either drag one of these to the browser window to load the whole set or click on an arrow to show the next set of grouping (e.g., the individual tracks in each group). I have absolutely no clue how easy or difficult something like this would be to implement, but it could be quite useful, especially since we will have many more tracks (~200 in the next year or so) to add to the browser.

Some kind of hierarchy in the available track list is definitely will-do functionality. The modENCODE people have been telling me the same thing, that they'll have lots and lots of tracks soon. A sense of when it'll happen is helpful for prioritizing that functionality relative to everything else people are asking for, so thanks for the time/track-count estimate.

Mitch |
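The nested grouping discussed in this exchange can be modeled as a simple tree, where loading a group means collecting every track beneath it. The Python sketch below is purely illustrative and says nothing about how JBrowse stores its track list: the class `TrackNode`, the function `build_example`, and all individual track names are hypothetical. Only the group names (Development, Embryos, Larvae, Pupae, Adults) come from the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackNode:
    """A node in a hypothetical track hierarchy: a group if it has children, else a track."""
    name: str
    children: List["TrackNode"] = field(default_factory=list)

    def tracks(self) -> List[str]:
        """Return the names of all leaf tracks under this node, in display order."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.tracks())
        return names

    def find(self, name: str) -> Optional["TrackNode"]:
        """Depth-first search for the first node with the given name."""
        if self.name == name:
            return self
        for child in self.children:
            match = child.find(name)
            if match is not None:
                return match
        return None


def build_example() -> TrackNode:
    # Track names below are made up for illustration.
    return TrackNode("Development", [
        TrackNode("Embryos", [TrackNode("embryo_0-2h"), TrackNode("embryo_2-4h")]),
        TrackNode("Larvae", [TrackNode("larva_L1")]),
        TrackNode("Pupae", [TrackNode("pupa_day1")]),
        TrackNode("Adults", [TrackNode("adult_male"), TrackNode("adult_female")]),
    ])


if __name__ == "__main__":
    root = build_example()
    print(root.find("Embryos").tracks())  # dragging "Embryos" loads only its tracks
    print(root.tracks())                  # dragging "Development" loads everything
```

Because each leaf is still an individual track, a structure like this keeps the option of turning single tracks on and off, which a single combined image track would not. A simplification to note: an empty group is indistinguishable from a track here.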
From: Brenton G. <brg...@gm...> - 2009-09-29 15:08:32
|
I think this would be one option, though it would be best to retain the ability to turn individual tracks on or off. For simplicity's sake, there are times when viewing 12 embryonic tracks gets overwhelming and having just 1 or 2 is sufficient to compare embryos to adults, for instance. So, one thing the modENCODE GBrowse site has is one button to turn a whole set of tracks on or off, and other buttons to turn the individual subtracks on or off. Something like this would be ideal.

Disk space is starting to become an issue - I recently started making the wiggle track images at half the default height in order to save on some disk space (I haven't calculated or compared the size difference, but assumed there would be one). For the fly genome, at 150 MB, it isn't so bad, but tracks for the human genome are much larger and it is much more of an issue. |
From: Mitch S. <mit...@be...> - 2009-09-29 20:39:05
|
On 09/29/2009 08:08 AM, Brenton Graveley wrote:
> I think this would be one option, though it would be best to retain the ability to turn individual tracks on or off. For simplicity sake, there are times when viewing 12 embryonic tracks gets overwhelming and having just 1 or 2 is sufficient to compare embryos to adults, for instance.

Random thought: for this particular use case, would it be better to have a view that aggregates all the embryonic timepoints (e.g., showing the mean, or the min/mean/max)? We could implement a one-button toggle between the all-timepoints summary and per-timepoint view. Thanks for providing a specific use case here.

> Disk space is starting to become an issue - I recently started making the wiggle track images at half the default height in order to save on some disk space (I haven't calculated or compared the size difference, but assumed there would be). For the fly genome, at 150 MB, it isn't so bad, but tracks for the human genome are much larger and it is much more of an issue.

In my testing in Linux (with the ext3 filesystem), filesystem overhead was about half of the total disk usage (in Linux, you can measure this by comparing the output of "du" and "du --apparent-size"). In my tests, if a file was smaller than 4k, then it would take up the whole 4k (which is the default block size in ext3). And I'd guess that many of your images are close to or below that threshold. So I think aggregating the images could really help (especially given the PNG compression).

In my testing so far, the images (total for all zoom levels) have been taking about 5 and a half bytes per base (~2.5 bytes for the image data, the rest filesystem overhead). That compares with GBrowse's 1 byte per data point storage method; it should be possible (maybe with a bit of tinkering) to plug GBrowse's wiggle image generation into JBrowse. The tradeoff would be that GBrowse installation would add complexity, and it would take (I think) significantly more CPU when people browse the data. JBrowse usually chooses speed over space usage whenever there's a computation/storage tradeoff, and I think that's the right choice in general, especially if you want to serve lots of users. But in your case maybe that choice isn't appropriate. Do you usually have one data point per base across the entire genome, or is it less dense?

I thought you were making shorter images to save on screen space, which is something that I was also concerned about when I looked at your JBrowse installation. I did think that your JBrowse installation was pretty awesome, though.

Mitch |
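The "du" versus "du --apparent-size" comparison mentioned above can be approximated in Python. This is a rough sketch, not a replacement for du: the function names `block_rounded` and `disk_usage` are assumptions, it counts only files (not the directories themselves), and it relies on `st_blocks`, which is available on Unix-like systems and counts 512-byte units on Linux.

```python
import os


def block_rounded(size_bytes, block_size=4096):
    """Round a file size up to a whole number of blocks (4k assumed, as for ext3)."""
    blocks = -(-size_bytes // block_size)  # ceiling division
    return blocks * block_size


def disk_usage(root):
    """Return (apparent_bytes, allocated_bytes) summed over files under root.

    apparent_bytes is the sum of file lengths, similar to "du --apparent-size";
    allocated_bytes is what the filesystem reports as actually allocated.
    """
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            info = os.lstat(os.path.join(dirpath, filename))
            apparent += info.st_size
            allocated += info.st_blocks * 512  # 512-byte units on Linux
    return apparent, allocated


# Example use on a directory of track images (the path is a placeholder):
# apparent, allocated = disk_usage("path/to/track/images")
# print(f"overhead: {allocated - apparent} bytes")
```

As a worked case under the 4k-block assumption, a 1500-byte PNG tile still occupies 4096 bytes, so more than half of its footprint is overhead. That is consistent with the roughly 2.5 of 5.5 bytes per base being image data in the figures above, and it is why merging many small images into fewer larger ones should reduce the total.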