Thread: [Gmod-ajax] next generation sequencing visualization

[Gmod-ajax] next generation sequencing visualization

From: Steve T. <ste...@im...> - 2009-03-11 15:40:40

Hi,

Is anyone looking at a way of visualizing alignments from next generation sequencing in JBrowse?

Kind regards and thanks,

Steve
------------------------------------------------------------------
Medical Sciences Division
Weatherall Institute of Molecular Medicine/Sir William Dunn School
Oxford University

Re: [Gmod-ajax] next generation sequencing visualization

From: Ian H. <ih...@be...> - 2009-03-11 15:52:39

It has come up.... obviously this is very important.

One approach would be to use this as the first test case for a plugin 
image track (c.f. GBrowse glyphs).

Another approach is to use this to motivate piecewise loading of 
feature-rich tracks, and do layout (& rendering) on the client.

Both are in tune with what we eventually want to do...

As always, do please let us know if you're able to contribute to either 
of these (the first, server-side approach is probably the most easily 
addressed by non-core developers)

I.



Re: [Gmod-ajax] next generation sequencing visualization

From: Ian H. <ih...@be...> - 2009-03-11 15:58:23

PS as Jason Stajich just remarked to me, you can already generate a WIG 
track representing read depth at any given site.
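For anyone who has not built such a track before, here is a minimal sketch of the idea. It assumes the reads are already available as (chrom, start, end) tuples in 1-based, inclusive coordinates; that input shape and the function names are illustrative assumptions, not part of JBrowse or any aligner's output format.

```python
from collections import defaultdict


def read_depth(reads):
    """Count reads covering each base.

    `reads` is an iterable of (chrom, start, end) with 1-based, inclusive
    coordinates (an assumed input shape, not a JBrowse format).
    """
    depth = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in reads:
        for pos in range(start, end + 1):
            depth[chrom][pos] += 1
    return depth


def to_wig(depth, name="read_depth"):
    """Render the depth table as variableStep WIG text (1-based positions)."""
    lines = [f'track type=wiggle_0 name="{name}"']
    for chrom in sorted(depth):
        lines.append(f"variableStep chrom={chrom}")
        for pos in sorted(depth[chrom]):
            lines.append(f"{pos} {depth[chrom][pos]}")
    return "\n".join(lines)


reads = [("chr1", 1, 4), ("chr1", 3, 6), ("chr1", 10, 11)]
wig_text = to_wig(read_depth(reads))
print(wig_text)
```

The per-base loop is fine for a toy example but would be slow on a real run; a sweep over sorted start and end points would likely be the better choice at scale.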


Re: [Gmod-ajax] next generation sequencing visualization

From: Steve T. <ste...@im...> - 2009-03-11 16:06:38

Hi Ian,

> PS as Jason Stajich just remarked to me, you can already generate a WIG 
> track representing read depth at any given site.

Yes...but we really need a decent alignment viewer at the bp level to see SNPs etc. Can GBrowse display alignments in the panel?

The closest I have seen on the web is LookSeq.

http://www.sanger.ac.uk/Software/analysis/lookseq/

> 
> Ian Holmes wrote:
>> It has come up.... obviously this is very important.
>>
>> One approach would be to use this as the first test case for a plugin 
>> image track (c.f. GBrowse glyphs).

Are you thinking of using existing GBrowse routines for this to generate tiles (pre-rendered or on the fly) like in the old Ajax GBrowse?

Steve


Re: [Gmod-ajax] next generation sequencing visualization

From: Ian H. <ih...@be...> - 2009-03-11 16:09:46

Mitch just added Sequence Tracks (which could be hacked to show 
alignments) and also plans to allow arbitrary text in features at 
base-resolution.

However, in order to load this many features you would need some work, 
hence my comments about incremental loading of NCLists.
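For readers who have not met nested containment lists (NCLists), the sketch below shows the general idea: intervals that are fully contained in another interval are stored as its children, so every sublist is sorted by both start and end and can be binary-searched. This is an illustration of the data structure only, not JBrowse's implementation; the function names are hypothetical.

```python
from bisect import bisect_left


def build_nclist(intervals):
    """Nest (start, end) intervals so that contained intervals become children.

    Nodes are [start, end, children]. Coordinates are inclusive.
    """
    # Sort by start ascending, end descending, so containers come first.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top, stack = [], []
    for start, end in ordered:
        node = [start, end, []]
        # Pop containers that do not enclose the new interval.
        while stack and end > stack[-1][1]:
            stack.pop()
        (stack[-1][2] if stack else top).append(node)
        stack.append(node)
    return top


def query(nclist, q_start, q_end):
    """Return all intervals overlapping [q_start, q_end]."""
    ends = [node[1] for node in nclist]  # rebuilt per call for brevity
    i = bisect_left(ends, q_start)       # first node ending at or after q_start
    hits = []
    while i < len(nclist) and nclist[i][0] <= q_end:
        start, end, children = nclist[i]
        hits.append((start, end))
        hits.extend(query(children, q_start, q_end))
        i += 1
    return hits


intervals = [(1, 10), (2, 5), (3, 4), (6, 9), (12, 20), (15, 18)]
nclist = build_nclist(intervals)
print(query(nclist, 8, 13))
```

Because each sublist is self-contained, it could in principle be stored and fetched separately, which is what would make incremental loading possible.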


Re: [Gmod-ajax] next generation sequencing visualization

From: Ian H. <ih...@be...> - 2009-03-11 17:44:08

hmmm, I think you can easily construct situations where people might 
want to eyeball reads at the basepair level. Including insertions 
(which, fwiw, I think can be displayed a little more easily than per 
your email, Mitch -- e.g. as popups.)

Technically I think this comes down to a volume-of-data issue. Point 
being that you can already visualize short reads in aggregate, by 
generating a WIG plot of read density (easy) or by generating your own 
image track (almost as easy).

The only thing you currently cannot do is load a genome's worth of short 
reads into your web browser (nor would you want to do this). So, at the 
level of core tech, this comes down to how you deal with annotation 
tracks containing millions of features. The obvious answer being that 
you load them incrementally (e.g. in chunks [as we currently handle 
sequence] or by CGI range queries).
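A rough sketch of what chunked loading could look like, under stated assumptions: features are (start, end, name) tuples in 0-based, inclusive coordinates, the chunk size of 1000 bp is arbitrary, and the in-memory dictionary stands in for per-chunk files or a range-query endpoint on the server. None of these names come from JBrowse.

```python
from collections import defaultdict

CHUNK_SIZE = 1000  # hypothetical chunk width in bp


def chunk_features(features, chunk_size=CHUNK_SIZE):
    """File each feature under every fixed-width chunk it touches."""
    chunks = defaultdict(list)
    for start, end, name in features:
        for idx in range(start // chunk_size, end // chunk_size + 1):
            chunks[idx].append((start, end, name))
    return chunks


def load_range(chunks, view_start, view_end, chunk_size=CHUNK_SIZE):
    """Fetch only the chunks under the current view and filter by overlap."""
    seen, hits = set(), []
    for idx in range(view_start // chunk_size, view_end // chunk_size + 1):
        for feat in chunks.get(idx, []):
            overlaps = feat[0] <= view_end and feat[1] >= view_start
            if overlaps and feat not in seen:
                seen.add(feat)  # a feature spanning two chunks is reported once
                hits.append(feat)
    return hits


features = [(10, 50, "r1"), (990, 1010, "r2"), (2500, 2540, "r3")]
chunks = chunk_features(features)
print(load_range(chunks, 900, 1100))
```

The client then pays only for the chunks under the visible region, at the cost of storing features that straddle a boundary more than once.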

As an open source, developer-friendly project, we should be encouraging 
people (as a first resort) to make maximal use of the APIs and parts 
that we've already provided. That API should be extended only when it 
simply fails to meet a significant (empirical) demand.

So I think that I'd essentially agree with what Mitch said. Consider 
first what you can do using an image track (it might go further than you 
think -- e.g. you could display SNPs using a sequence logo) and whether 
it is at all possible that you could implement this yourself (obviously, 
with help from us).

At some point we will implement partial loading extensions that will 
allow you to eyeball high-volume feature tracks. But this will happen 
faster if you can demonstrate that you have already pushed back to your 
users with simpler (image-based) alternatives and they are, 
nevertheless, in need of a high-volume solution!

BTW, Sean Eddy has a discussion thread on next-gen sequencing challenges:
http://selab.janelia.org/people/eddys/blog/?p=86

Ian


Re: [Gmod-ajax] next generation sequencing visualization

From: Ian H. <ih...@be...> - 2009-03-11 17:57:13

Could all of this be handled by a combination of:

WIG track (for showing density of reads, and/or read start/endpoints)
Sequence track (for showing SNPs)
GFF/BED track (for showing deletions and insertions, and maybe SNPs)

If so, then aren't we really talking about a small raft of server 
scripts, as opposed to a fundamental change?
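As an illustration of what one such server script might do, the sketch below walks a gapped pairwise alignment and emits BED-style records for SNPs, insertions and deletions. The input shape (two equal-length rows with "-" for gaps, plus a 0-based reference start) and the record names are assumptions made for this example; real aligner output would need its own parser.

```python
def alignment_to_bed(chrom, ref_start, ref_row, read_row):
    """Return BED-style (chrom, start, end, name) tuples, 0-based half-open.

    Assumes no column has a gap in both rows.
    """
    assert len(ref_row) == len(read_row)
    records = []
    pos = ref_start  # reference coordinate of the next reference base
    i, n = 0, len(ref_row)
    while i < n:
        if ref_row[i] == "-":
            # Insertion: consume the whole gap run, emit a zero-length feature.
            j = i
            while j < n and ref_row[j] == "-":
                j += 1
            records.append((chrom, pos, pos, "ins:" + read_row[i:j]))
            i = j
        elif read_row[i] == "-":
            # Deletion: consume the run of reference bases missing in the read.
            j = i
            while j < n and read_row[j] == "-" and ref_row[j] != "-":
                j += 1
            records.append((chrom, pos, pos + (j - i), "del"))
            pos += j - i
            i = j
        else:
            if ref_row[i] != read_row[i]:
                records.append((chrom, pos, pos + 1, f"snp:{ref_row[i]}>{read_row[i]}"))
            pos += 1
            i += 1
    return records


bed = alignment_to_bed("chr1", 100, "ACGT-ACGT", "ACCTGA--T")
for record in bed:
    print("\t".join(str(field) for field in record))
```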




Re: [Gmod-ajax] next generation sequencing visualization

From: Mitch S. <mit...@be...> - 2009-03-11 17:12:18

Steve Taylor wrote:
> Yes...but we really need a decent alignment viewer at the bp level to see SNPs etc. Can GBrowse display alignments in the panel?
>   

The volume of data is large, right?  So why would someone want to 
eyeball it?  Won't people be running programs to identify SNPs, rather 
than trying to do it manually?

I worked with biologists for several years, so I know how much they like 
to eyeball things.  But if the data volume is large, IMHO it's important 
to push back and advocate automated analysis instead.  I'd hate to do a 
lot of work only to find that after the initial burst of enthusiasm no 
one used it.

Currently, there's an assumption built fairly widely into JBrowse (and 
all other genome browsers as far as I know), which is that the 
coordinate system defined by the reference sequence doesn't change on 
the fly.  So it'll take a fair chunk of work to be able to show 
insertions from resequencing.

On the other hand, if you're talking about viewing just a small region, 
and you want to view it in alignment coordinates, and all of your data 
is in alignment coordinates, then the JBrowse part of the work should be 
easy to do.  We've talked about displaying per-base data (like sequence, 
or a predicted RNA fold) in features; it's not implemented but it should 
be straightforward to do.

Mitch

Re: [Gmod-ajax] next generation sequencing visualization

From: Ian H. <ih...@be...> - 2009-03-11 17:49:43

Incidentally, in case it's not clear, I think that dealing with next-gen 
sequencing data is a **crucial** issue for JBrowse. Any pushback from us 
about high-volume feature tracks is simply about the best short-term way 
to achieve this (innovative visualization strategies, vs simply scaling 
up the idea of a clickable feature track). It is NOT meant to minimize 
the importance of next-gen sequencing data in genome browsers!



Re: [Gmod-ajax] next generation sequencing visualization

From: Mitch S. <mit...@be...> - 2009-03-12 05:43:15

Ian Holmes wrote:
> Incidentally, in case it's not clear, I think that dealing with next-gen 
> sequencing data is a **crucial** issue for JBrowse. Any pushback from us 
> about high-volume feature tracks is simply about the best short-term way 
> to achieve this (innovative visualization strategies, vs simply scaling 
> up the idea of a clickable feature track).

Well, for my part, the pushback is mainly about clarifying the use 
cases.  I'm not saying that short reads aren't important, but so far I 
haven't seen anyone really articulate the detailed use cases that one 
would need to make good implementation decisions.

Use cases so far -

1.  Andrew is a computational biologist.  He's writing software to 
process short-read data, and he'd like to eyeball the input and output 
of his program.  Does it matter to him if he's looking at alignment 
coordinates or genomic coordinates?  How much genomic context does he 
need/want to see?  Does he care about a zoomed-out view (e.g., to see 
what fraction of the genome has been covered) or a zoomed-in view (e.g., 
to check for off-by-one errors), or both?

2. Elmer the Eyeballer is a biologist.  He wants to get a good gut feel 
for his short-read data, because the gut is the source of the hypotheses 
that one then proceeds to pull from one's rear.  Does he also want to 
use the tool to monitor his resequencing progress?  When he's looking at 
SNPs, is he identifying them manually, or looking at the output of a 
SNP-identifying tool?  If the latter, does he just need to see the SNPs 
or is the original read context important?  If a large number of reads 
are identical, does he need to see each individual one?  Also, the same 
questions as for Andrew: zoomed out/zoomed in, genomic context, 
coordinate system, etc.

Sorry for the snark.  I really do care about Elmer.  It's just not 
immediately clear to me that Elmer wouldn't be better served by an 
alignment viewer.  Does he want a web-based alignment viewer, or (again) 
is it important to include other genomic information?

Or more generally: what kinds of questions are people trying to answer 
when they're eyeballing short read data?

I keep asking questions not because I doubt the value of the enterprise, 
but just because I'd like someone to explain it to me in more detail (or 
point me toward a nice review, or help me find a good person to talk to 
about it).  Well, to be honest, I do wonder if it'll be useful in a 
longer term sense.  Does anyone still look at Sanger sequencing traces?  
Once the base-calling algorithms were debugged, how much did people care 
about the underlying trace data?

Mitch

Re: [Gmod-ajax] next generation sequencing visualization

From: Andrew U. <and...@gm...> - 2009-03-11 17:50:32

Mitch Skinner wrote:
> Steve Taylor wrote:
>> Yes...but we really need a decent alignment viewer at the bp level to see SNPs etc. Can GBrowse display alignments in the panel?
>>   
> 
> The volume of data is large, right?  So why would someone want to 
> eyeball it?  Won't people be running programs to identify SNPs, rather 
> than trying to do it manually?

I have actually found the genome browser to be very useful in debugging such 
automated approaches.  It is frequently much easier to look through the output 
of your program visually, in the genome browser, to spot off-by-1 errors and 
such, than it is to write debug code to get the same answer.  Software 
developers would greatly benefit from visualizing the output of their code in 
the browser, even if the data set is gigantic.

The whole power of genome browsers is that a picture is worth 1000 words, and 
visual correlation is easier than looking at tab-delimited debugging logs.

> I worked with biologists for several years, so I know how much they like 
> to eyeball things.  But if the data volume is large, IMHO it's important 
> to push back and advocate automated analysis instead.  I'd hate to do a 
> lot of work only to find that after the initial burst of enthusiasm no 
> one used it.

In defense of the biologists, eyeballing the data is crucial to forming new 
hypotheses.  And they're not just doing it because it's all they know how to do, 
but it is important in building personal priors.  You want to have an 
expectation of some sort before you design and run a large-scale automated 
analysis.  This is especially true if you are moving into genomics territory so 
poorly understood that you might not even know what to expect, so you have to click 
around a bit to get an idea.

This is something I try to teach my students in a graduate computational 
genomics class.  So many of them run into writing algorithms based on bad 
assumptions because they haven't even LOOKED at the initial data.  And then they 
wonder why their accuracy is poor.  Genomics has a lot to learn from 
low-throughput biology.

> Currently, there's an assumption built fairly widely into JBrowse (and 
> all other genome browsers as far as I know), which is that the 
> coordinate system defined by the reference sequence doesn't change on 
> the fly.  So it'll take a fair chunk of work to be able to show 
> insertions from resequencing.

Yeah, that's a tough one.  UCSC has a "solution" in their conservation (multiple 
genome alignment) track where they put little tick marks where the insertion in 
the "other" (non-reference) genome occurs, so you don't have to space out the 
reference genome.  You can't see the inserted sequence unless you click out to a 
separate page, but it is a simple solution that is decent.
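To make the tick-mark idea concrete, here is a small sketch that projects a read onto reference coordinates: inserted bases are dropped from the displayed row and recorded separately as tick positions, so the reference never has to be spaced out. The gapped-row input and the function name are assumptions for this example, and this is not how UCSC implements its track.

```python
def project_to_reference(ref_row, read_row):
    """Return (read shown in reference coordinates, {ref_offset: inserted bases}).

    Rows are equal-length gapped strings with '-' for gaps. A tick at offset k
    marks an insertion immediately before reference base k (0-based).
    """
    shown = []
    ticks = {}
    offset = 0  # number of reference bases consumed so far
    for ref_base, read_base in zip(ref_row, read_row):
        if ref_base == "-":
            # Inserted base: keep it out of the row, remember where it sits.
            ticks[offset] = ticks.get(offset, "") + read_base
        else:
            shown.append(read_base)
            offset += 1
    return "".join(shown), ticks


shown, ticks = project_to_reference("ACGT-ACGT", "ACCTGA--T")
print(shown)
print(ticks)
```

The displayed row always has the same length as the ungapped reference, and the tick dictionary holds what a popup or click-through page would show.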

Cheers,
Andrew


Re: [Gmod-ajax] next generation sequencing visualization

From: Ann L. <alo...@gm...> - 2009-03-11 23:42:29

FYI ..

Here is a link to a short read alignment tool you might like to try,
if you haven't already:

http://bowtie-bio.sourceforge.net/index.shtml

My lab is using it in combination with another program from the
Salzberg lab titled Top Hat, which aims to solve the "how do we map
short reads across introns" problem.

All the best,

Ann Loraine

Re: [Gmod-ajax] next generation sequencing visualization

From: Mitch S. <mit...@be...> - 2009-03-12 05:24:46

Ann Loraine wrote:
> http://bowtie-bio.sourceforge.net/index.shtml
>
> My lab is using it in combination with another program from the
> Salzberg lab titled Top Hat, which aims to solve the "how do we map
> short reads across introns" problem.
>   

I skimmed the web pages for bowtie (Burrows-Wheeler!  very cool) and top 
hat.  Does top hat generate its bed and wig output in alignment coords 
or refseq coords?  What do you currently do with the top hat output?

Mitch

Re: [Gmod-ajax] next generation sequencing visualization

From: Dave C. <cle...@ne...> - 2009-03-12 03:26:27

Hello all,

Here's my 2¢ worth.

At the PAG meeting in January almost everyone I saw that was talking about
next-gen sequencing data showed detailed alignments.  My reaction was
similar to Mitch's: Why?  However, I am sympathetic to Andrew's arguments.
There's no substitute for spot checking through eye-balling.

The folks I'm working with at Oregon on some high-throughput data are using
MAQ for alignments and they seem happy with it.  See
http://maq.sourceforge.net/.  They do visualization of the alignments in
Maqviewer.  By the time it makes it into GBrowse (where I come in) it's all
been summarized.  GBrowse supports semantic zooming, which would let you go
from WIG to individual alignments (although I haven't tried showing the
alignments yet).  I'll see what I can figure out in the next week or so.

Mitch/Ian: does JBrowse support semantic browsing a la GBrowse, where you
can specify thresholds for switching from one view to the other?

Also, I'm doing a workshop on using GMOD tools for next gen data in a few
weeks.  If you have something in JBrowse you would like me to show, please
let me know.  I suspect it would make a wizzy demo.

Dave C.

On Wed, Mar 11, 2009 at 7:42 PM, Ann Loraine <alo...@gm...> wrote:

> FYI ..
>
> Here is a link to a short read alignment tool you might like to try,
> if you haven't already:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> My lab is using it in combination with another program from the
> Salzberg lab titled Top Hat, which aims to solve the "how do we map
> short reads across introns" problem.
>
> All the best,
>
> Ann Loraine
>
>

Re: [Gmod-ajax] next generation sequencing visualization

From: Mitch S. <mit...@be...> - 2009-03-12 05:24:18

Dave Clements wrote:
> Mitch/Ian: does JBrowse support semantic browsing a la GBrowse, where 
> you can specify thresholds for switching from one view to the other?

In a sense, yes.  As you zoom in, the track for displaying features 
switches from showing a feature density histogram, to showing individual 
features, to showing features with labels and subfeatures.  There's 
currently not a general mechanism for switching from one kind of view to 
another (e.g., from a wiggle (image) track to a feature track); the 
feature track uses its knowledge about the number of features to choose 
the zoom thresholds where it makes those transitions, so those 
transitions are currently all implemented in FeatureTrack.

But it should be reasonably straightforward to add a new kind of track 
that switches between ImageTrack and FeatureTrack at a set threshold.
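
As a rough illustration of such a threshold-switching track, here is a minimal Python sketch. It is not JBrowse code: `ZoomThresholds`, `choose_view`, and the numeric thresholds are all hypothetical, and it uses fixed user-set thresholds (as in Dave's question) rather than thresholds derived from feature counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoomThresholds:
    # Hypothetical values, in bases per pixel.
    max_bp_per_px_for_features: float = 100.0
    max_bp_per_px_for_labels: float = 10.0


def choose_view(bp_per_px, thresholds=ZoomThresholds()):
    """Pick which kind of view to render at the current zoom level."""
    if bp_per_px > thresholds.max_bp_per_px_for_features:
        return "image"            # zoomed out: pre-rendered wiggle image
    if bp_per_px > thresholds.max_bp_per_px_for_labels:
        return "features"         # individual features, no labels
    return "features+labels"      # zoomed in: labels and subfeatures


for zoom in (1000, 50, 5):
    print(zoom, choose_view(zoom))
```

A composite track would call something like `choose_view` on every zoom change and delegate drawing to whichever underlying track is selected.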

Mitch

Re: [Gmod-ajax] next generation sequencing visualization

From: Steve T. <ste...@im...> - 2009-03-12 10:34:35

Hi Mitch,

>> Yes...but we really need a decent alignment viewer at the bp level to 
>> see SNPs etc. Can GBrowse display alignments in the panel?
>>   
> 
> The volume of data is large, right?  So why would someone want to 
> eyeball it?  Won't people be running programs to identify SNPs, rather 
> than trying to do it manually?

Yes, but there will often be a need to eyeball low level data.

> 
> I worked with biologists for several years, so I know how much they like 
> to eyeball things.  But if the data volume is large, IMHO it's important 
> to push back and advocate automated analysis instead.  I'd hate to do a 
> lot of work only to find that after the initial burst of enthusiasm no 
> one used it.

> 
> Currently, there's an assumption built fairly widely into JBrowse (and 
> all other genome browsers as far as I know), which is that the 
> coordinate system defined by the reference sequence doesn't change on 
> the fly.  So it'll take a fair chunk of work to be able to show 
> insertions from resequencing.

I agree this is maybe too much to do now, though it is going to have to be thought about at some stage since there will be many 'reference' genomes. I'd be happy just to see 'simple' alignments for 
now :-).

> 
> On the other hand, if you're talking about viewing just a small region, 
> and you want to view it in alignment coordinates, and all of your data 
> is in alignment coordinates, then the JBrowse part of the work should be 
> easy to do.  We've talked about displaying per-base data (like sequence, 
> or a predicted RNA fold) in features; it's not implemented but it should 
> be straightforward to do.


As an example, maq is a popular program for aligning Illumina reads to a reference sequence. A useful visualization is its 'pileup' output, which shows the reference sequence vertically, along with the
orientation of the reads and any base changes. In my experience Maqviewer doesn't work on some of the systems I tried (due to some GTK issues), and reports from people who have got it working say it's 
pretty basic. I run a core group, and getting things running on everyone's desktops (which usually means in a browser) saves everyone's time...
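
For anyone who hasn't seen a pileup, here is a minimal Python sketch of the idea. It is not maq's code or its exact output format: the `pileup` function and the toy reads are invented, reads are assumed to be ungapped, and the dot/comma symbols for forward/reverse matches are borrowed from samtools-style pileups.

```python
def pileup(reference, reads):
    """Build one row per reference base from ungapped aligned reads.

    `reads` is a list of (start, strand, sequence) with a 0-based start.
    Each row is (1-based position, reference base, depth, read symbols):
    '.' / ',' mark a match on the forward / reverse strand, and an
    upper / lower case letter marks a mismatch on forward / reverse.
    """
    columns = [[] for _ in reference]
    for start, strand, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if not 0 <= pos < len(reference):
                continue  # ignore bases hanging off either end
            if base.upper() == reference[pos].upper():
                symbol = "." if strand == "+" else ","
            else:
                symbol = base.upper() if strand == "+" else base.lower()
            columns[pos].append(symbol)
    return [
        (i + 1, ref_base, len(col), "".join(col))
        for i, (ref_base, col) in enumerate(zip(reference, columns))
    ]


toy_reads = [(0, "+", "ACGA"), (2, "-", "GTAC"), (1, "+", "CGT")]
for row in pileup("ACGTAC", toy_reads):
    print(*row, sep="\t")
```

In the toy data, the fourth row shows depth 3 with one forward-strand read calling A where the reference has T, which is exactly the kind of thing one wants to eyeball when checking a SNP call.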

In addition an alignment display would be useful for viewing conservation across species at the base pair level and is already part of the UCSC browser.

Maybe the back end could use the SAM/BAM format (http://samtools.sourceforge.net/) to store the alignments; it appears to be a useful emerging standard. Note there is an ASCII viewer as part of the package, 
which I'll check out.
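
As a sketch of what a SAM-based back end would have to read, here is some minimal Python that pulls the placement of one read out of a SAM text line. The `parse_sam_line` helper and the example record are made up; the field order (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...), the 1-based POS, and the 0x10 reverse-strand flag bit follow the SAM format as I understand it, so please check against the spec before relying on it.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam_line(line):
    """Extract name, reference span, strand and CIGAR from one SAM record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    flag = int(fields[1])
    start = int(fields[3])  # 1-based leftmost reference position
    cigar = [(int(n), op) for n, op in CIGAR_RE.findall(fields[5])]
    ref_len = sum(n for n, op in cigar if op in REF_CONSUMING)
    return {
        "name": fields[0],
        "ref": fields[2],
        "start": start,
        "end": start + ref_len - 1,  # inclusive; meaningless if CIGAR is '*'
        "strand": "-" if flag & 0x10 else "+",
        "cigar": cigar,
        "seq": fields[9],
    }


example = "read1\t16\tchr1\t100\t30\t5M2I3M\t*\t0\t0\tACGTAGGTAC\t*"
record = parse_sam_line(example)
print(record["ref"], record["start"], record["end"], record["strand"])
```

Note that the 2-base insertion in the example CIGAR does not extend the reference span, which is the same fixed-coordinate issue Mitch raised for displaying insertions.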

Steve
------------------------------------------------------------------
Medical Sciences Division
Weatherall Institute of Molecular Medicine/Sir William Dunn School
Oxford University

Re: [Gmod-ajax] next generation sequencing visualization

From: Jason S. <ja...@bi...> - 2009-03-12 13:28:02

> As an example a popular program is called maq to align Illumina  
> reads to a reference sequence. A useful visualization is its  
> 'pileup' output which shows a vertical output with the reference  
> sequence, orientation of reads and base changes. In my experience  
> Maqviewer doesn't seem to work on some of the systems I tried (due  
> to some GTK issues) and reports from people that have got it working  
> say it's pretty basic. I run a core group and getting things running  
> on everyone's desktops (which usually means in a browser) saves  
> everyone's time...
>
Tools like EagleView from the MarthLab http://bioinformatics.bc.edu/marthlab/EagleView 
  are really appropriate for this too.

> In addition an alignment display would be useful for viewing  
> conservation across species at the base pair level and is already  
> part of the UCSC browser.

The per-base conservation can come from wiggle-style plotting, which I would  
think is already a good way to solve this.
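
A minimal sketch of what I mean, in Python: score each reference column by the fraction of other species that match it, and write the scores out as a fixedStep wiggle track. The `conservation_wig` function, the scoring rule, and the toy alignment are only illustrative assumptions (real conservation tracks use proper phylogenetic scores), and the rows are assumed to be already projected onto reference coordinates.

```python
def conservation_wig(chrom, start, ref, others):
    """Return fixedStep wiggle text with one conservation score per base.

    `start` is the 1-based coordinate of the first base of `ref`.
    `others` are aligned rows of the same length as `ref`, with '-'
    for gaps. The score is the fraction of rows matching the reference.
    """
    if not others or any(len(row) != len(ref) for row in others):
        raise ValueError("need at least one row, all the same length as ref")
    lines = [f"fixedStep chrom={chrom} start={start} step=1"]
    for i, ref_base in enumerate(ref):
        matches = sum(1 for row in others if row[i].upper() == ref_base.upper())
        lines.append(f"{matches / len(others):.2f}")
    return "\n".join(lines)


print(conservation_wig("chr1", 1, "ACGT", ["ACGA", "ACCT", "A-GT"]))
```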

>
> Maybe the back end could use SAM/BAM format to store the alignments http://samtools.sourceforge.net/ 
>  which appears to be a useful emerging standard. Note there is a  
> ASCII viewer as part of the package
> which I'll check out.
>
> Steve
> ------------------------------------------------------------------
> Medical Sciences Division
> Weatherall Institute of Molecular Medicine/Sir William Dunn School
> Oxford University
>
>

Jason Stajich
ja...@bi...

Re: [Gmod-ajax] next generation sequencing visualization

From: Mitch S. <mit...@be...> - 2009-03-12 19:08:01

Steve Taylor wrote:
> In addition an alignment display would be useful for viewing 
> conservation across species at the base pair level and is already part 
> of the UCSC browser.

This part we can almost do now.  The client-side work for showing 
sequence is done, although there's currently no mechanism for 
insertions.  The client side sequence stuff was written with the ref seq 
in mind; for alignments, gaps are easy to represent, and display, but 
insertions would need more work to do something like UCSC does.

It ought to be straightforward, though.  It's the short-read stuff 
that's more of a challenge, although with the incremental feature 
loading that Ian was talking about it should be do-able, as long as the 
density of reads isn't too ungodly high.  We have to do incremental 
loading anyway for other kinds of dense tracks.
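
Roughly, incremental loading could look like the following Python sketch. It is not the JBrowse implementation: `ChunkedFeatureStore`, the chunk size, and the toy features are all made up. Features are binned into fixed-size chunks of the reference, and only the chunks overlapping the current view are fetched.

```python
from collections import defaultdict


class ChunkedFeatureStore:
    """Bin features by fixed-size reference chunks; fetch only what is in view."""

    def __init__(self, chunk_size=10_000):
        self.chunk_size = chunk_size
        self._chunks = defaultdict(list)
        self.loaded = set()  # chunk indices the client has requested so far

    def add(self, start, end, name):
        # Coordinates are half-open [start, end). A feature is filed
        # under every chunk it overlaps.
        first = start // self.chunk_size
        last = (end - 1) // self.chunk_size
        for idx in range(first, last + 1):
            self._chunks[idx].append((start, end, name))

    def fetch(self, view_start, view_end):
        """Return the features overlapping the view, loading only needed chunks."""
        first = view_start // self.chunk_size
        last = (view_end - 1) // self.chunk_size
        seen, result = set(), []
        for idx in range(first, last + 1):
            self.loaded.add(idx)
            for feature in self._chunks.get(idx, ()):
                overlaps = feature[0] < view_end and feature[1] > view_start
                if overlaps and feature not in seen:
                    seen.add(feature)
                    result.append(feature)
        return result


store = ChunkedFeatureStore(chunk_size=100)
store.add(10, 50, "a")
store.add(90, 210, "b")
store.add(500, 540, "c")
print(store.fetch(0, 150), sorted(store.loaded))
```

The cost of a view then scales with the number of reads in the visible chunks rather than in the whole track, which is why the read density per chunk is the number that matters.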

Actually, what is a reasonable (rough estimate) upper bound on the read 
density?  Is this something we can expect to change over time?  Also, 
does anyone have some data I could test with?

Mitch