From: Mitch S. <mit...@be...> - 2009-07-08 19:34:51
On 07/08/2009 09:24 AM, Caroline wrote:
> I've just had a quick look at JBrowse - am I right in thinking that it
> expects JSON-formatted feature data? Is there a definition of the data
> format anywhere?

The use case of including external data in JBrowse is an important one, and I'm very keen on making it work, but there are a number of caveats.

You're right that the javascript client expects JSON-formatted feature data, but I've been treating the JSON format as an internal implementation detail rather than a public interface. As JBrowse development has progressed, I've been changing the specifics of what's in the JSON, and I'd like to keep the freedom to do that for at least a little while longer. The public interface for getting data into JBrowse is an implementation of the bioperl Bio::DasI interface (like Bio::DB::GFF, Bio::DB::SeqFeature::Store, or Bio::DB::Das::Chado) or flatfiles (GFF or BED, currently). JBrowse includes perl scripts for generating the JSON from those sources.

If I'm understanding you correctly, you're suggesting that the JBrowse javascript client could get data directly from outside servers. It can't work exactly that way, because a javascript client can't get data from more than one server (a security restriction, the same-origin policy, imposed by web browsers). We could theoretically get around that restriction if each data source served some javascript glue code (the "mashup" approach), but that would mean completely trusting each data source, which makes me uncomfortable. ("Mashups are self-inflicted cross-site-scripting attacks" is the way Doug Crockford puts it.) The mashup approach might work well enough for plain JBrowse as it stands, but if someone embedded JBrowse in an application that involved logging in, and allowed arbitrary external URLs, that approach would be a huge security hole. Instead, the web browser should (in my opinion) be getting everything from a single server.
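To make the single-server idea concrete, here is a rough sketch (in Python, purely illustrative; the names proxy_fetch and convert_gff_line are hypothetical and not part of JBrowse) of a server-side converter that turns upstream flatfile data into JSON for the client, and honors conditional GETs so the conversion only reruns when the upstream data changes:

```python
import json
from email.utils import parsedate_to_datetime

def convert_gff_line(line):
    """Turn one GFF-style line into a minimal JSON-able feature dict.

    Only the first five columns (seqid, source, type, start, end) are
    used here; a real converter would handle the full format.
    """
    seqid, source, ftype, start, end = line.split("\t")[:5]
    return {"seq": seqid, "type": ftype, "start": int(start), "end": int(end)}

def proxy_fetch(upstream_body, upstream_last_modified, if_modified_since=None):
    """Return (status, headers, body) as the proxy would answer the browser.

    If the client's If-Modified-Since is at least as new as the upstream
    Last-Modified, skip the conversion entirely and answer 304.
    """
    if if_modified_since is not None:
        client_dt = parsedate_to_datetime(if_modified_since)
        upstream_dt = parsedate_to_datetime(upstream_last_modified)
        if client_dt >= upstream_dt:
            return 304, {"Last-Modified": upstream_last_modified}, b""
    features = [convert_gff_line(l)
                for l in upstream_body.splitlines() if l.strip()]
    body = json.dumps(features).encode()
    return 200, {"Last-Modified": upstream_last_modified,
                 "Content-Type": "application/json"}, body
```

A real front end would of course sit behind an HTTP server and fetch the upstream body itself; the point is just that the browser only ever talks to one origin, and a 304 lets the server skip the (expensive) conversion step entirely.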
That server might get data from other servers, though (some people distinguish "mashup in the browser" from "mashup in the server"; this would be the latter). So I've been thinking about writing a proxy to convert external URL sources (say, DAS or flatfiles) into JBrowse's JSON. The proxy would run on a JBrowse server, get data from an external URL, convert it, and then serve it to the client. It should be fairly straightforward to glue an external-URL front end onto JBrowse's JSON-generating code.

The JSON-generating code expects a chromosome's worth of data at a time, though, so that approach might not scale well to very large feature sets. How many features (per chromosome) would a given track have for you? On the other hand, large feature sets might be fine as long as the server serving them (and our hypothetical external-URL front end) supports HTTP caching (conditional GETs) properly; then the conversion work would only have to be redone when the data changes. Still, the JSON-generating code is implemented in bioperl, so there would be some scalability issues regardless. In that case, an implementation in another language is certainly a possibility. I'm probably not going to be able to get to that for a month or two (maybe more). If someone else wants to tackle it, I'd be happy to help and answer questions.

If you want something today, you could run your own JBrowse server and aggregate other people's data and the standard annotation tracks yourself. If you do that, you might also be able to use Ian's TWiki plugin to let other people upload their own data; I'm not sure whether he considers it a proof-of-concept or production-ready.

Regards,
Mitch