From: Mitch S. <mit...@be...> - 2009-03-12 05:43:15
Ian Holmes wrote:
> Incidentally, in case it's not clear, I think that dealing with next-gen
> sequencing data is a **crucial** issue for JBrowse. Any pushback from us
> about high-volume feature tracks is simply about the best short-term way
> to achieve this (innovative visualization strategies, vs simply scaling
> up the idea of a clickable feature track).

Well, for my part, the pushback is mainly about clarifying the use cases. I'm not saying that short reads aren't important, but so far I haven't seen anyone really articulate the detailed use cases that one would need to make good implementation decisions.

Use cases so far:

1. Andrew is a computational biologist. He's writing software to process short-read data, and he'd like to eyeball the input and output of his program. Does it matter to him if he's looking at alignment coordinates or genomic coordinates? How much genomic context does he need/want to see? Does he care about a zoomed-out view (e.g., to see what fraction of the genome has been covered) or a zoomed-in view (e.g., to check for off-by-one errors), or both?

2. Elmer the Eyeballer is a biologist. He wants to get a good gut feel for his short-read data, because the gut is the source of the hypotheses that one then proceeds to pull from one's rear. Does he also want to use the tool to monitor his resequencing progress? When he's looking at SNPs, is he identifying them manually, or looking at the output of a SNP-identifying tool? If the latter, does he just need to see the SNPs, or is the original read context important? If a large number of reads are identical, does he need to see each individual one? Also, the same questions as for Andrew: zoomed out/zoomed in, genomic context, coordinate system, etc.

Sorry for the snark. I really do care about Elmer. It's just not immediately clear to me that Elmer wouldn't be better served by an alignment viewer. Does he want a web-based alignment viewer, or (again) is it important to include other genomic information?

Or more generally: what kinds of questions are people trying to answer when they're eyeballing short-read data?

I keep asking questions not because I doubt the value of the enterprise, but just because I'd like someone to explain it to me in more detail (or point me toward a nice review, or help me find a good person to talk to about it).

Well, to be honest, I do wonder if it'll be useful in the longer term. Does anyone still look at Sanger sequencing traces? Once the base-calling algorithms were debugged, how much did people care about the underlying trace data?

Mitch