From: Mitch S. <mit...@be...> - 2009-07-08 19:34:51
On 07/08/2009 09:24 AM, Caroline wrote:
> I've just had a quick look at JBrowse - am I right in thinking that it
> expects JSON-formatted feature data? Is there a definition of the data
> format anywhere?

The use case of including external data in JBrowse is an important one, and I'm very keen on making it work, but there are a number of caveats.

You're right that the javascript client expects JSON-formatted feature data, but I've been treating the JSON format as an internal implementation detail rather than a public interface. As JBrowse development has progressed, I've been changing the specifics of what's in the JSON, and I'd like to keep the freedom to do that for at least a little while longer. The public interface for getting data into JBrowse is an implementation of the bioperl Bio::DasI interface (like Bio::DB::GFF, Bio::DB::SeqFeature::Store, or Bio::DB::Das::Chado) or flatfiles (GFF or BED, currently). JBrowse includes perl scripts for generating the JSON from those sources.

If I'm understanding you correctly, you're suggesting that the JBrowse javascript client could get data directly from outside servers. It can't work exactly that way, because a javascript client can't get data from more than one server (a security restriction, the same-origin policy, imposed by web browsers). We could theoretically get around that restriction if each data source served some javascript glue code (the "mashup" approach), but that would mean completely trusting each data source, which makes me uncomfortable. ("Mashups are self-inflicted cross-site-scripting attacks" is the way Doug Crockford puts it.) The mashup approach might work well enough for plain JBrowse as it stands, but if someone embedded JBrowse in an application that involved logging in, and allowed arbitrary external URLs, that approach would be a huge security hole. Instead, the web browser should (in my opinion) be getting everything from a single server.
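To make the single-server idea concrete, here is a rough sketch (in Python, purely illustrative; the names proxy_fetch and convert_gff_line are hypothetical and not part of JBrowse) of a server-side converter that turns upstream flatfile data into JSON for the client, and honors conditional GETs so the conversion only reruns when the upstream data changes:

```python
import json
from email.utils import parsedate_to_datetime

def convert_gff_line(line):
    """Turn one GFF-style line into a minimal JSON-able feature dict.

    Only the first five columns (seqid, source, type, start, end) are
    used here; a real converter would handle the full format.
    """
    seqid, source, ftype, start, end = line.split("\t")[:5]
    return {"seq": seqid, "type": ftype, "start": int(start), "end": int(end)}

def proxy_fetch(upstream_body, upstream_last_modified, if_modified_since=None):
    """Return (status, headers, body) as the proxy would answer the browser.

    If the client's If-Modified-Since is at least as new as the upstream
    Last-Modified, skip the conversion entirely and answer 304.
    """
    if if_modified_since is not None:
        client_dt = parsedate_to_datetime(if_modified_since)
        upstream_dt = parsedate_to_datetime(upstream_last_modified)
        if client_dt >= upstream_dt:
            return 304, {"Last-Modified": upstream_last_modified}, b""
    features = [convert_gff_line(l)
                for l in upstream_body.splitlines() if l.strip()]
    body = json.dumps(features).encode()
    return 200, {"Last-Modified": upstream_last_modified,
                 "Content-Type": "application/json"}, body
```

A real front end would of course sit behind an HTTP server and fetch the upstream body itself; the point is just that the browser only ever talks to one origin, and a 304 lets the server skip the (expensive) conversion step entirely.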
That server might get data from other servers, though (some people distinguish "mashup in the browser" from "mashup in the server"; this would be the latter). So I've been thinking about writing a proxy to convert external URL sources (say, DAS or flatfiles) into JBrowse's JSON. The proxy would run on a JBrowse server, get data from an external URL, convert it, and then serve it to the client. It should be fairly straightforward to glue an external-URL front end onto JBrowse's JSON-generating code.

The JSON-generating code expects a chromosome's worth of data at a time, though, so that approach might not scale well to very large feature sets. How many features (per chromosome) would a given track have for you? On the other hand, large feature sets might be fine as long as the server serving them (and our hypothetical external-URL front end) supports HTTP caching (conditional GETs) properly; then the conversion work would only have to be redone when the data changes. Still, the JSON-generating code is implemented in bioperl, so there would be some scalability issues regardless. In that case, an implementation in another language is certainly a possibility. I'm probably not going to be able to get to that for a month or two (maybe more). If someone else wants to tackle it, I'd be happy to help and answer questions.

If you want something today, you could run your own JBrowse server and aggregate other people's data and the standard annotation tracks yourself. If you do that, you might also be able to use Ian's TWiki plugin to let other people upload their own data; I'm not sure whether he considers it a proof-of-concept or production-ready.

Regards,
Mitch