From: Mitch S. <mit...@be...> - 2009-11-20 05:53:18
Sorry it took so long to get back to you. I've replied inline below.

Caroline wrote:
>> It should also be possible to use SAM/BAM with JBrowse using
>> Bio::SamTools, but I haven't tested it yet.
>
> I had a go at this and it is. Patch attached (first time I've done this
> with git, give me a shout if it's not the right format).

Awesome. The patch looks great; I've applied a slightly different
version that makes Bio::DB::Sam optional:
http://github.com/jbrowse/jbrowse/commit/82f29deedb57051bff81f9dc311bfd80b554e8e9

It's currently just on the "lazyfeatures" branch because I don't think
it's that useful without the other stuff on that branch.

Longer term, I'd like to try doing pulls from people's git repos, which
will (e.g.) preserve authorship information in the git metadata. But
patches are also fine and I'm happy to get them. This one worked fine
for me.

It still uses much more memory than it should; that can be addressed but
I think it'll take more work re-arranging the interface between
JBrowse's JsonGenerator and its clients.

> Awesome! What's the plan for this? I'm trying to knock up a sequence
> server (perl, Catalyst) that will hand over our ChIPseq data from BAM
> files bit by bit and eventually do the same for remote BAM files, DAS
> servers and so on. You mentioned this on the mailing list ages ago, but
> this is the first chance I've had to get around to it.
>
> Can you point me in the right direction for getting it to play nicely
> with JBrowse? What will the lazyfeatures track expect from the server?
> How does it deal with zooming - can the server just decide to only
> return a hist summary at some point? What about caching? Does the
> browser grab data in defined chunks? What else should I be worrying
> about?

I wrote this to try and answer these questions:
http://biowiki.org/view/JBrowse/LazyFeatureLoading

The short version is: yes, there's a hist summary; currently the hist
counts are generated at a zoom level that's hard-coded. That's a
terrible hack, and doing something smarter is definitely on the list.
The client does grab data in defined chunks, and caches those.

After I had implemented lazy loading in JBrowse, I found out about the
lazy/partial loading work that Heng Li has done for BAM and that Jim
Kent has done for his BigBed/BigWig format. There was a big thread on
samtools-devel about it:
http://sourceforge.net/mailarchive/forum.php?thread_name=6dce9a0b0911150626o701e07baq2c97c4135e5ffda9%40mail.gmail.com&forum_name=samtools-devel

There are a few messages from me in there that try to compare the
JBrowse approach to the BAM and BigBed approaches. In the end, each of
us came up with something different; Heng Li is using binning, Jim Kent
is using r-trees, and I'm using NCLists. I don't think we can directly
adopt either of the other two solutions for JBrowse, because they're
doing a lot of bit-twiddling that I think would be hard to do in a web
browser (I'm happy to have someone prove me wrong though, and I'd be
happy to talk about it in more detail if people are interested).

So my next thought was to wonder if I (or someone) could write a proxy
that could act as a BAM/BigBed client and then serve JSON to JBrowse. I
think it could be done but it's not 100% clear in my head how to do it.
I'd be happy to talk about what I've been thinking so far if you're
interested in tackling this.

Earlier this year, I said that I didn't want to make the JBrowse JSON
format a public thing because I wanted to be able to change it at will.
Thinking about it some more, there are some aspects of the format that I
think are pretty solid, and some other parts that are pretty likely to
change. It might be possible to split out the likely-to-change bits from
the unlikely-to-change bits; earlier I was worried about splitting
things up too much and ending up with too many server round-trips, but
maybe not. I'll write up a description of what's in there now and then
we could talk about where to go from there.
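
For readers who haven't met NCLists (nested containment lists), here is
a minimal Python sketch of the general idea. It is not JBrowse's actual
code or JSON layout; the names Node, build_nclist and query_nclist are
made up for illustration, and intervals are assumed to be half-open
(start, end) pairs. Features that lie entirely inside another feature
are pushed into that feature's sublist, so every list is sorted by both
start and end and can be binary-searched.

```python
class Node:
    """One interval plus the intervals it fully contains (hypothetical name)."""
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.children = []


def build_nclist(intervals):
    """Build a nested containment list from half-open (start, end) pairs."""
    # Sort by start ascending, end descending, so that a containing
    # interval always comes before the intervals it contains.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top, stack = [], []  # stack = chain of currently "open" containers
    for start, end in ordered:
        node = Node(start, end)
        # Drop containers that end before this interval does.
        while stack and stack[-1].end < end:
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def query_nclist(nodes, qstart, qend, out=None):
    """Collect every interval overlapping the half-open range [qstart, qend)."""
    if out is None:
        out = []
    # Within one list no interval contains another, so ends are sorted:
    # binary-search for the first interval that ends after qstart.
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end > qstart:
            hi = mid
        else:
            lo = mid + 1
    for node in nodes[lo:]:
        if node.start >= qend:
            break  # starts are sorted too, so nothing later can overlap
        out.append((node.start, node.end))
        query_nclist(node.children, qstart, qend, out)
    return out


features = [(0, 100), (10, 20), (15, 40), (30, 35), (50, 60), (120, 130)]
nclist = build_nclist(features)
print(query_nclist(nclist, 32, 55))
```

Because a sublist is only visited when its parent overlaps the query,
the same nesting is what would let a client fetch sublists lazily, one
chunk at a time.
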
Thanks for the patch,
Mitch