Re: [Gmod-ajax] next generation sequencing visualization

Ian Holmes wrote:
> Incidentally, in case it's not clear, I think that dealing with next-gen 
> sequencing data is a **crucial** issue for JBrowse. Any pushback from us 
> about high-volume feature tracks is simply about the best short-term way 
> to achieve this (innovative visualization strategies, vs simply scaling 
> up the idea of a clickable feature track).

Well, for my part, the pushback is mainly about clarifying the use 
cases.  I'm not saying that short reads aren't important, but so far I 
haven't seen anyone really articulate the detailed use cases that one 
would need to make good implementation decisions.

Use cases so far:

1.  Andrew is a computational biologist.  He's writing software to 
process short-read data, and he'd like to eyeball the input and output 
of his program.  Does it matter to him if he's looking at alignment 
coordinates or genomic coordinates?  How much genomic context does he 
need/want to see?  Does he care about a zoomed-out view (e.g., to see 
what fraction of the genome has been covered) or a zoomed-in view (e.g., 
to check for off-by-one errors), or both?

2. Elmer the Eyeballer is a biologist.  He wants to get a good gut feel 
for his short-read data, because the gut is the source of the hypotheses 
that one then proceeds to pull from one's rear.  Does he also want to 
use the tool to monitor his resequencing progress?  When he's looking at 
SNPs, is he identifying them manually, or looking at the output of a 
SNP-identifying tool?  If the latter, does he just need to see the SNPs 
or is the original read context important?  If a large number of reads 
are identical, does he need to see each individual one?  Also, the same 
questions as for Andrew: zoomed out/zoomed in, genomic context, 
coordinate system, etc.

Sorry for the snark.  I really do care about Elmer.  It's just not 
immediately clear to me that Elmer wouldn't be better served by an 
alignment viewer.  Does he want a web-based alignment viewer, or (again) 
is it important to include other genomic information?

Or more generally: what kinds of questions are people trying to answer 
when they're eyeballing short read data?

I keep asking questions not because I doubt the value of the enterprise, 
but just because I'd like someone to explain it to me in more detail (or 
point me toward a nice review, or help me find a good person to talk to 
about it).  Well, to be honest, I do wonder whether it'll be useful in 
the longer term.  Does anyone still look at Sanger sequencing traces?  
Once the base-calling algorithms were debugged, how much did people care 
about the underlying trace data?

Mitch