From: Martin A. H. <ma...@ma...> - 2009-11-20 07:14:04

Yup, that is exactly where I downloaded from - and the tarball was funny. Now I use a clone from GIT and there is no trouble.

How do you determine current version? Is there a version/revision number?

Martin

On Fri, Nov 20, 2009 at 7:09 AM, Mitch Skinner <mit...@be...> wrote:

> Martin A. Hansen wrote:
>
>> The problem is you need to use the source fetched with git. The tarball is
>> outdated.
>
> Again, where did you get the tarball from?
>
> The right way to get a tarball is from github; go to
> http://github.com/jbrowse/jbrowse
>
> and click on the "download" button. I just tried it out, and it was the
> current version.
>
> Mitch

From: Mitch S. <mit...@be...> - 2009-11-20 06:51:17

Martin A. Hansen wrote:
> I am wondering why Jbrowse files like refSeqs.js and trackInfo.js are
> not in JSON format, but rather have a variable name, an equal sign,
> and then, on the following lines, a block of JSON code:
>
> refSeqs =
> [
>    {
>       "length" : 50001,
>       "name" : "ctgA",
>       "seqDir" : "data/seq/ctgA",
>       "seqChunkSize" : 20000,
>       "end" : 50001,
>       "start" : 0
>    }
> ]
>
> Why not encode the entire thing in JSON?

It's because of the caching behavior of XMLHTTPRequest in (at least some versions of) IE. For files that the browser knows about directly (files that show up in <script> elements in the HTML), the browser will check if they've changed when the user refreshes the page. But in IE, files that were fetched using XMLHTTPRequest always come from the cache (unless you add a query parameter that changes with each request so that the file never comes from the cache). I wanted the client to fetch the file if (and only if) it had changed, so I figured I'd add an assignment line like the one above and put the file in a <script> element rather than getting it via XHR.

Also, getting the refseq and track list information using a <script> element means that the data will already have been fetched when the embedder calls the Browser constructor. On the other hand, you could argue that that constructor should just be getting URLs and doing its own fetching, if (say) we wanted to refresh the track list without refreshing the page.

As you've noticed, the track data and name tree files are just straight JSON, and those are fetched with XHR, so I guess there's a chance of IE getting stale data if one of those files has changed. Googling just now, it looks like there may be some nicer workarounds for this IE bug:

http://en.wikibooks.org/wiki/JavaScript/XMLHttpRequest#Caching

so changing this in JBrowse is probably something we should do at some point.

Mitch

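The query-parameter workaround mentioned in the message above could look roughly like the following sketch. The helper name `withCacheBuster` and the parameter name `_` are assumptions made for illustration; neither is taken from JBrowse.

```javascript
// Hypothetical helper (not part of JBrowse): append a query parameter
// whose value changes between requests, so that a cached copy of the
// URL is not reused by the browser.
function withCacheBuster(url, token) {
  // Keep any "#fragment" at the very end of the URL.
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : url.slice(hashIndex);
  // Use "&" if the URL already has a query string, "?" otherwise.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_=" + encodeURIComponent(String(token)) + fragment;
}

// Usage: a token that changes on every call defeats the cache entirely.
const alwaysFresh = withCacheBuster("data/names/root.json", Date.now());
```

Passing a token that changes only when the data is regenerated (a build timestamp, say) instead of the current time would come closer to the "fetch if and only if it changed" behaviour described above, since unchanged files could still be served from the cache.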
From: Mitch S. <mit...@be...> - 2009-11-20 06:20:51

James Casbon wrote:
> No, I just queried the GFF3 spec and apparently the canonical gene is correct.
>
> jbrowse's rendering could be considered correct as the glyph does span
> the entire feature. I'm not sure it couldn't be improved though by
> actually showing the component blocks in the same way as subfeatures.

I talked about this on the GMOD teleconference on Monday, and learned about BioPerl's SeqFeatureI::location mechanism that allows a single feature to have a location with multiple discontinuous spans. JBrowse should probably support this, but unfortunately it currently doesn't.

The work-around at the moment would be to model your CDS segments as individual subfeatures, with different (or no) IDs and a Parent attribute pointing to the transcript.

So far in my testing, I've mainly encountered situations where people were doing that (modeling CDS segments as multiple subfeatures rather than a single feature with multiple locations). I'd be interested to know if people reading this typically encounter one approach or the other, and whether y'all have strong opinions about this one way or the other.

Sorry about that,
Mitch

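The work-around described above (one CDS subfeature per segment, each with its own ID and a shared Parent) can be applied mechanically to a GFF3 file. The following is a minimal sketch, assuming tab-separated GFF3 with nine columns; the function name `splitSharedCdsIds` and the `.1`, `.2` suffix scheme are hypothetical and not part of JBrowse or BioPerl.

```javascript
// Hypothetical pre-processing step: give every CDS line its own ID by
// appending a running counter, leaving Parent and all other lines as-is.
function splitSharedCdsIds(gff3Text) {
  const seen = new Map(); // original CDS ID -> number of lines seen so far
  return gff3Text
    .split("\n")
    .map((line) => {
      // Pass through blank lines, comments and directives untouched.
      if (line === "" || line.startsWith("#")) return line;
      const cols = line.split("\t");
      // Only rewrite well-formed CDS rows (9 columns, type in column 3).
      if (cols.length !== 9 || cols[2] !== "CDS") return line;
      cols[8] = cols[8]
        .split(";")
        .map((attr) => {
          if (!attr.startsWith("ID=")) return attr;
          const id = attr.slice(3);
          const n = (seen.get(id) || 0) + 1;
          seen.set(id, n);
          return "ID=" + id + "." + n; // e.g. cds00001 -> cds00001.1
        })
        .join(";");
      return cols.join("\t");
    })
    .join("\n");
}
```

Note that this renames every CDS ID, including ones that were already unique, and would break any feature that refers to a CDS by its original ID; for the usual gene/mRNA/CDS hierarchy that should not matter, since CDS rows rarely have children.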
From: Mitch S. <mit...@be...> - 2009-11-20 06:09:40

Martin A. Hansen wrote:
> The problem is you need to use the source fetched with git. The
> tarball is outdated.

Again, where did you get the tarball from?

The right way to get a tarball is from github; go to
http://github.com/jbrowse/jbrowse

and click on the "download" button. I just tried it out, and it was the current version.

Mitch

From: Mitch S. <mit...@be...> - 2009-11-20 05:53:18

Sorry it took so long to get back to you. I've replied inline below -

Caroline wrote:
>> It should also be possible to use SAM/BAM with JBrowse using
>> Bio::SamTools, but I haven't tested it yet.
>
> I had a go at this and it is. Patch attached (first time I've done this
> with git, give me a shout if it's not the right format).

Awesome. The patch looks great; I've applied a slightly different version that makes Bio::DB::Sam optional:

http://github.com/jbrowse/jbrowse/commit/82f29deedb57051bff81f9dc311bfd80b554e8e9

It's currently just on the "lazyfeatures" branch because I don't think it's that useful without the other stuff on that branch.

Longer term, I'd like to try doing pulls from people's git repos, which will (e.g.) preserve authorship information in the git metadata. But patches are also fine and I'm happy to get them. This one worked fine for me.

It still uses much more memory than it should; that can be addressed but I think it'll take more work re-arranging the interface between JBrowse's JsonGenerator and its clients.

> Awesome! What's the plan for this? I'm trying to knock up a sequence
> server (perl, Catalyst) that will hand over our ChIPseq data from BAM
> files bit by bit and eventually do the same for remote BAM files, DAS
> servers and so on. You mentioned this on the mailing list ages ago, but
> this is the first chance I've had to get around to it.
>
> Can you point me in the right direction for getting it to play nicely
> with JBrowse? What will the lazyfeatures track expect from the server?
> How does it deal with zooming - can the server just decide to only
> return a hist summary at some point? What about caching? Does the
> browser grab data in defined chunks? What else should I be worrying
> about?

I wrote this to try and answer these questions:

http://biowiki.org/view/JBrowse/LazyFeatureLoading

The short version is: yes, there's a hist summary; currently the hist counts are generated at a zoom level that's hard-coded. That's a terrible hack, and doing something smarter is definitely on the list. The client does grab data in defined chunks, and caches those.

After I had implemented lazy loading in JBrowse, I found out about the lazy/partial loading work that Heng Li has done for BAM and that Jim Kent has done for his BigBed/BigWig format. There was a big thread on samtools-devel about it:

http://sourceforge.net/mailarchive/forum.php?thread_name=6dce9a0b0911150626o701e07baq2c97c4135e5ffda9%40mail.gmail.com&forum_name=samtools-devel

There are a few messages from me in there that try to compare the JBrowse approach to the BAM and BigBed approaches. In the end, each of us came up with something different; Heng Li is using binning, Jim Kent is using r-trees, and I'm using NCLists.

I don't think we can directly adopt either of the other two solutions for JBrowse, because they're doing a lot of bit-twiddling that I think would be hard to do in a web browser (I'm happy to have someone prove me wrong though, and I'd be happy to talk about it in more detail if people are interested).

So my next thought was to wonder if I (or someone) could write a proxy that could act as a BAM/BigBed client and then serve JSON to JBrowse. I think it could be done but it's not 100% clear in my head how to do it. I'd be happy to talk about what I've been thinking so far if you're interested in tackling this.

Earlier this year, I said that I didn't want to make the JBrowse JSON format a public thing because I wanted to be able to change it at will. Thinking about it some more, there are some aspects of the format that I think are pretty solid, and some other parts that are pretty likely to change. It might be possible to split out the likely-to-change bits from the unlikely-to-change bits; earlier I was worried about splitting things up too much and ending up with too many server round-trips, but maybe not. I'll write up a description of what's in there now and then we could talk about where to go from there.

Thanks for the patch,
Mitch

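For readers unfamiliar with the nested containment list (NCList) mentioned in the message above, here is a small in-memory sketch of the general idea: intervals that are fully contained in another interval are pushed into that interval's sublist, so every list is sorted by both start and end and can be searched with a binary search. This is a generic illustration, not JBrowse's actual code or its JSON layout; the function names and the half-open [start, end) convention are assumptions.

```javascript
// Build a nested containment list from objects with numeric start/end.
function buildNCList(intervals) {
  // Sort by start ascending, then end descending, so that a containing
  // interval always comes before the intervals it contains.
  const sorted = intervals.slice().sort((a, b) => a.start - b.start || b.end - a.end);
  const top = [];
  const stack = []; // chain of nodes that may contain the next interval
  for (const iv of sorted) {
    const node = { start: iv.start, end: iv.end, data: iv, sublist: [] };
    // Drop candidates that end before this interval ends: they cannot contain it.
    while (stack.length > 0 && stack[stack.length - 1].end < node.end) stack.pop();
    if (stack.length > 0) stack[stack.length - 1].sublist.push(node);
    else top.push(node);
    stack.push(node);
  }
  return top;
}

// Collect every interval overlapping the half-open range [qStart, qEnd).
function queryNCList(list, qStart, qEnd, out = []) {
  // Within one list, ends are strictly increasing, so binary-search for
  // the first node that ends after the query starts.
  let lo = 0;
  let hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (list[mid].end > qStart) hi = mid;
    else lo = mid + 1;
  }
  // Walk forward until nodes start at or after the end of the query.
  for (let i = lo; i < list.length && list[i].start < qEnd; i++) {
    out.push(list[i].data);
    queryNCList(list[i].sublist, qStart, qEnd, out); // contained intervals
  }
  return out;
}
```

A node that does not overlap the query is skipped together with its whole sublist, because everything in the sublist lies inside it. That property is what makes the structure a natural fit for loading sublists lazily.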
From: Martin A. H. <ma...@ma...> - 2009-11-16 14:05:30

Hello,

I am wondering why Jbrowse files like refSeqs.js and trackInfo.js are not in JSON format, but rather have a variable name, an equal sign, and then, on the following lines, a block of JSON code:

refSeqs =
[
   {
      "length" : 50001,
      "name" : "ctgA",
      "seqDir" : "data/seq/ctgA",
      "seqChunkSize" : 20000,
      "end" : 50001,
      "start" : 0
   }
]

Why not encode the entire thing in JSON?

{
   "refSeqs" :
   [
      {
         "length" : 50001,
         "name" : "ctgA",
         "seqDir" : "data/seq/ctgA",
         "start" : 0,
         "end" : 50001,
         "seqChunkSize" : 20000
      }
   ]
}

Below is a little test.

Cheers,

Martin

#!/usr/bin/env perl

use warnings;
use strict;
use JSON;
use Data::Dumper;

my ( $json, $data );

local $/ = undef;

$json = <DATA>;

$data = JSON::from_json( $json );

print Dumper( $data );

__DATA__
{
   "refSeqs" :
   [
      {
         "length" : 50001,
         "name" : "ctgA",
         "seqDir" : "data/seq/ctgA",
         "start" : 0,
         "end" : 50001,
         "seqChunkSize" : 20000
      }
   ]
}

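Files in the `name = <JSON>` form shown above can still be read by a strict JSON parser once the assignment prefix is stripped. A minimal sketch, using a hypothetical helper name `parseAssignedJson` (nothing here is part of JBrowse):

```javascript
// Hypothetical helper: split "identifier = <JSON>" into the variable
// name and the parsed JSON value. An optional leading "var" and an
// optional trailing ";" are tolerated.
function parseAssignedJson(text) {
  const pattern = /^\s*(?:var\s+)?([A-Za-z_$][\w$]*)\s*=\s*([\s\S]*?)\s*;?\s*$/;
  const match = pattern.exec(text);
  if (!match) {
    throw new SyntaxError("expected '<name> = <JSON>'");
  }
  // JSON.parse throws a SyntaxError of its own if the payload is not valid JSON.
  return { name: match[1], value: JSON.parse(match[2]) };
}

// Usage with a shortened version of the refSeqs.js content quoted above.
const refSeqsFile = 'refSeqs =\n[\n   { "length" : 50001, "name" : "ctgA", "start" : 0, "end" : 50001 }\n]\n';
const parsedRefSeqs = parseAssignedJson(refSeqsFile);
```

Here `parsedRefSeqs.name` holds the variable name from the file and `parsedRefSeqs.value` the array of reference sequence records.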
From: James C. <ca...@gm...> - 2009-11-15 14:46:26

2009/11/15 James Casbon <ca...@gm...>:
> Now unfortunately, jbrowse doesn't render the CDSs correctly. This is
> because there are multiple lines with the same ID which correspond to
> a spliced product that are shown as one giant CDS in jbrowse.
> Changing the IDs to be different allows jbrowse to render it
> correctly.

No, I just queried the GFF3 spec and apparently the canonical gene is correct.

jbrowse's rendering could be considered correct as the glyph does span the entire feature. I'm not sure it couldn't be improved though by actually showing the component blocks in the same way as subfeatures.

James

From: James C. <ca...@gm...> - 2009-11-15 14:04:33

Hi everyone,

Just before I get to my point, I want to say congratulations on JBrowse. I think it is the best genome browser I have seen by a long way.

So I thought I would try out jbrowse with some GFF3 and to warm me up I went straight to 'the canonical gene' from here:
http://www.sequenceontology.org/gff3.shtml

Now unfortunately, jbrowse doesn't render the CDSs correctly. This is because there are multiple lines with the same ID which correspond to a spliced product that are shown as one giant CDS in jbrowse. Changing the IDs to be different allows jbrowse to render it correctly.

So looking further up the GFF3 definition:

"ID Indicates the name of the feature. IDs must be unique within the scope of the GFF file."

Hmmm... so the canonical gene is wrong!

I suppose this should really get flagged as a warning when the GFF is parsed rather than silently merging features.

James

From: Dave C. G. H. D. <he...@gm...> - 2009-11-14 19:09:28

Hi Martin,

If grabbing the source with GIT doesn't do the trick for you, there is also a tutorial at http://gmod.org/wiki/JBrowse_Tutorial, which was written this August.

Dave C.

On Sat, Nov 14, 2009 at 4:27 PM, Martin A. Hansen <ma...@ma...> wrote:
> The problem is you need to use the source fetched with git. The tarball is
> outdated.
>
> Martin
>
> On Sat, Nov 14, 2009 at 3:10 PM, Martin A. Hansen <ma...@ma...> wrote:
>>
>> I grabbed a fresh copy of Jbrowse and set about to install the test data
>> following the tutorial:
>>
>> http://jbrowse.org/code/jbrowse-master/docs/tutorial/
>>
>> However, at step one:
>>
>> bin/prepare-refseqs.pl --fasta docs/tutorial/data_files/volvox.fa
>>
>> Unknown option: fasta
>> USAGE:
>>     bin/prepare-refseqs.pl [--out <output directory>] --gff <gff file
>>     describing refseqs>
>> OR:
>>     bin/prepare-refseqs.pl [--out <output directory>] [--noseq]
>>     [--seqdir <sequence data directory>] --conf <JBrowse config file> --refs
>>     <list of refseq names> --refids <list of refseq IDs>
>>
>>     <output directory>: defaults to "data"
>>     <sequence data directory>: chunks of sequence go here; defaults to
>>     "<output directory>/seq"
>>
>>     --noseq: do not prepare sequence data for the client.
>>
>>     You can use a GFF file to describe the reference sequences, or
>>     you can use a JBrowse config file and a list of refseq names
>>     or a list of refseq IDs. If you use a GFF file, it should
>>     contain ##sequence-region lines as described in the GFF specs.
>>     If you use a JBrowse config file, you can either provide a
>>     (comma-separated) list of refseq names, or (if the names
>>     aren't globally unique) a list of refseq IDs.
>>
>> What is going on?
>>
>> Martin

--
Please keep responses on the list!
http://gmod.org/wiki/January_2010_GMOD_Meeting
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback

From: Martin A. H. <ma...@ma...> - 2009-11-14 15:27:52

The problem is you need to use the source fetched with git. The tarball is outdated.

Martin

On Sat, Nov 14, 2009 at 3:10 PM, Martin A. Hansen <ma...@ma...> wrote:
> I grabbed a fresh copy of Jbrowse and set about to install the test data
> following the tutorial:
>
> http://jbrowse.org/code/jbrowse-master/docs/tutorial/
>
> However, at step one:
>
> bin/prepare-refseqs.pl --fasta docs/tutorial/data_files/volvox.fa
>
> Unknown option: fasta
> USAGE:
>     bin/prepare-refseqs.pl [--out <output directory>] --gff <gff file
>     describing refseqs>
> OR:
>     bin/prepare-refseqs.pl [--out <output directory>] [--noseq]
>     [--seqdir <sequence data directory>] --conf <JBrowse config file> --refs
>     <list of refseq names> --refids <list of refseq IDs>
>
>     <output directory>: defaults to "data"
>     <sequence data directory>: chunks of sequence go here; defaults to
>     "<output directory>/seq"
>
>     --noseq: do not prepare sequence data for the client.
>
>     You can use a GFF file to describe the reference sequences, or
>     you can use a JBrowse config file and a list of refseq names
>     or a list of refseq IDs. If you use a GFF file, it should
>     contain ##sequence-region lines as described in the GFF specs.
>     If you use a JBrowse config file, you can either provide a
>     (comma-separated) list of refseq names, or (if the names
>     aren't globally unique) a list of refseq IDs.
>
> What is going on?
>
> Martin

From: Martin A. H. <ma...@ma...> - 2009-11-14 14:10:47

I grabbed a fresh copy of Jbrowse and set about to install the test data following the tutorial:

http://jbrowse.org/code/jbrowse-master/docs/tutorial/

However, at step one:

bin/prepare-refseqs.pl --fasta docs/tutorial/data_files/volvox.fa

Unknown option: fasta
USAGE:
    bin/prepare-refseqs.pl [--out <output directory>] --gff <gff file describing refseqs>
OR:
    bin/prepare-refseqs.pl [--out <output directory>] [--noseq] [--seqdir <sequence data directory>] --conf <JBrowse config file> --refs <list of refseq names> --refids <list of refseq IDs>

    <output directory>: defaults to "data"
    <sequence data directory>: chunks of sequence go here; defaults to "<output directory>/seq"

    --noseq: do not prepare sequence data for the client.

    You can use a GFF file to describe the reference sequences, or
    you can use a JBrowse config file and a list of refseq names
    or a list of refseq IDs. If you use a GFF file, it should
    contain ##sequence-region lines as described in the GFF specs.
    If you use a JBrowse config file, you can either provide a
    (comma-separated) list of refseq names, or (if the names
    aren't globally unique) a list of refseq IDs.

What is going on?

Martin

From: Scott C. <sc...@sc...> - 2009-11-13 14:49:04

Hello,

I am pleased to announce that the January GMOD meeting will be taking place on January 14 and 15 in San Diego at the Best Western Seven Seas (the same location as last year). Please see this page for registration information:

http://gmod.org/wiki/January_2010_GMOD_Meeting

When you go to that page, please take a moment to add suggestions for the agenda. There is no registration fee for this meeting, however there is limited space, so please register early.

The proprietors of the Best Western have given us an excellent room rate, and extended it to the previous week, so that people attending the GMOD meeting and the Plant and Animal Genome meeting before it may stay at the Best Western the entire time.

Please direct follow up questions to the gmod-devel mailing list:

https://lists.sourceforge.net/lists/listinfo/gmod-devel

Thanks and I look forward to seeing you in San Diego!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                        scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)       216-392-3087
Ontario Institute for Cancer Research

From: Scott C. <sc...@sc...> - 2009-10-21 20:23:55

Hi Giles,

Don't feel bad about the untimely death of that thread; I seem to do that more than my fair share. I would be interested in trying to get the GBrowse Chado adaptor working with your publicly facing readonly database, though not until November. I'm booked solid until then.

I'm glad you have something working, and having a GFF3 dumper is a good thing anyway.

Scott

On Sun, Oct 18, 2009 at 11:47 AM, Giles Velarde <gv...@sa...> wrote:
>
> On 16 Oct 2009, at 04:45, Mitch Skinner wrote:
>
>> I was talking about this thread:
>>
>> http://www.nabble.com/getting-JBrowse-to-run-off-a-Chado-database-where-feature.name-field-is-not-used-td25491225.html
>>
>> That thread ended a little inconclusively, I thought; Scott had suggested
>> a database change ("set feature.name = feature.uniquename") and then Giles
>> started working on genedb->gff->jbrowse rather than straight
>> genedb->jbrowse.
>
> Mitch, Scott,
>
> Many thanks for looking into this.
>
> Yes, sorry about not ending that thread. Mitch summed it up quite well. I
> did try the modifications as you suggested, Scott, and this did get
> the script to go a bit further, but then it got stuck somewhere a little
> further down the line. At the same time we are developing considerable
> infrastructure around GFF, and so decided to look at that too, and this
> yielded tangible results more easily.
>
> Scott, thanks for your offer to look into it. It doesn't look like we'll need
> to be using that adapter for now, mainly because the GFF middle-ware
> approach is working fine for us and producing all those auto-exported GFF
> files can have other unforeseen uses. But if you're keen on expanding the
> scope of the adapter, I could point you to our public read-only snapshot for
> further testing and would be happy to help you with it.
>
>> And then performance became an issue; it sounds like Giles solved it in
>> his case, but in general I hope to avoid making people do as much work as he
>> has done.
>
> Speaking of which, I ran into another performance issue, this time inside
> prepare-refseqs.pl. It has to do with preparing refseqs for several 1000s of
> GFF files. When I looked into it I saw that the refSeqs.js was being opened,
> parsed, appended to, and closed for every GFF. As this file got larger this
> opening and appending step got more expensive.
>
> I submit a patch that fixes it for me.
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered in
> England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
>
> In order not to change any original functionality, I have added a couple of
> new parameters:
>
> -gffs, which allows you to supply a comma-separated list of files
> -gfffolder, which allows you to supply a folder with gff files in there (it
> does not check to see if they really are GFFs)
>
> This allows prepare-refseqs.pl to process more than one GFF file at once,
> which was a serious bottleneck for me. As before, I am not expecting you to
> necessarily apply the patch. It's just for you to see how I got around the
> problem. I am sure you can come up with a more elegant way! :-)
>
> Oh, and another optimization I have had to do is deploy an instance of
> JBrowse per organism. biodb-to-json.pl seems to fall over when the refseqs
> file gets too large. This happens to help keep the chromosome/contig-picking
> drop-down box populated with a sensible amount of choices, and it seems to
> make sense to me to have a separate page per organism.
>
>> Part of the reason I was thinking about having an sql-to-json.pl (that
>> would take an sql query as a parameter) is this site-specific variation in
>> chado usage. I'm not sure how much of it there is, though.
>>
>> Other reasons were:
>> * allowing other kinds of queries than the ones supported by Bio::DasI
>> * not having to write Bio::DB::Das::UCSC (unless someone has already
>> written this?)
>> * not having the bio object intermediate (when I profiled a BED->json
>> conversion, the biggest CPU user was Bio::Root::RootI::_rearrange)
>
> I must say I have had to continually fight the temptation to cut through the
> middleman and just generate the JSons directly! So, I do indeed see
> advantages in your suggestion. Of course, you would need to defensively
> validate other people's SQL results, which is never fun. Alternatively, a
> simple abstracted/documented programmatic API for generating the JSON might
> be a good way, e.g. :
>
> my $jbrowse = new JBrowseConfig ( $jsonConfigurations );
> my $track = $jbrowse->addTrack({ name => "contig01", etc. } );
>
> foreach my $feature ($myCustomFeatureImplementationList)
> {
>     $jbrowse->addFeature( $feature->toJSon() );
> }
>
> $jbrowse->generate_refseqs();
> $jbrowse->generate_jsons();
> $jbrowse->generate_names();
>
> Looking at the bin scripts, you're nearly there right now.
>
>> On the other hand, you could argue (as Chris Mungall did to me today) that
>> it would be better to work on improving the middleware rather than trying to
>> avoid it. And then there's the reasons that people have wanted middleware
>> in the past (not writing m*n mappings between m data sources and n
>> consumers, but m+n mappings to/from the middleware). I think that would be
>> an interesting discussion to have; each of those m*n mappings can be simpler
>> than the mappings to and from the middleware, and also take advantage of
>> unique features of the source and destination.
>
> If by middleware you include parsing formats like GFF, then I would
> tend to agree with him. It's easier for me to justify my time developing
> around standards, because of the potential for reuse. The minimal GFF really
> does seem like a good solution for us: they're still valid files, and they
> take an order of magnitude less time to produce (because we have so much
> annotation in our db).
>
> Thanks again, guys. I really appreciate your willingness to help. In our
> particular case we have some rather phenomenal scaling issues, so it was
> never going to be easy, so rest assured I'll be in touch again if/when I run
> into more interesting eventualities.
>
> Kind Regards,
> Giles
>
>> Mitch
>>
>> Scott Cain wrote:
>>>
>>> Hi Mitch and Giles,
>>>
>>> I must have missed the conversation about Bio::DB::Das::Chado; what
>>> were the nature of the problems? I can't say I'm terribly surprised
>>> that there were problems, since GeneDB has been using Chado for a long
>>> time and so Chado and the gbrowse adaptor have no doubt evolved away
>>> from the way GeneDB is using it.
>>>
>>> Anyway, is there anything I can do?
>>>
>>> Scott
>>>
>>> On Thu, Oct 15, 2009 at 7:21 PM, Mitch Skinner
>>> <mit...@be...> wrote:
>>>
>>>> Giles Velarde wrote:
>>>>
>>>>> Yes I am using this to pull data out of GeneDB here at Sanger, which
>>>>> is a Chado database. I have put it up here :
>>>>>
>>>>> http://github.com/gv1/chado2miniGFF
>>>>>
>>>>> It's mostly SQLs, inside light Python wrapper.
>>>>>
>>>>> We usually use Artemis to bulk-export GFF, but in this case we don't
>>>>> need all the annotations for the time being, and even if we do use
>>>>> some in the future ( for use in extraData) the approach would be to
>>>>> selectively put them in.
>>>>
>>>> Given the difficulty you had with Bio::DB::Das::Chado, and given that
>>>> I'd like to be able to generate jbrowse json from the UCSC database
>>>> (and, potentially, other databases) I've been kicking around the idea of
>>>> having something like an sql-to-json.pl. You could give it a database
>>>> connection and an sql query (and probably some perl callbacks for
>>>> munging the results) and it would generate jbrowse json for you.
>>>>
>>>> Simple cases would be really easy; the jbrowse json is already sort of
>>>> tabular. I'm not sure yet about how to deal with subfeatures though,
>>>> given that different databases deal with those quite differently.
>>>>
>>>> FWIW,
>>>> Mitch

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                        scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)       216-392-3087
Ontario Institute for Cancer Research

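The slowdown Giles describes in the thread above (refSeqs.js being opened, parsed, appended to and closed once per GFF file) goes away if the entries are accumulated in memory and the file is serialized once at the end. The sketch below illustrates that idea only; it is not Giles's patch or the prepare-refseqs.pl code, and the function names and the "later entry wins" rule for duplicate names are assumptions.

```javascript
// Hypothetical: merge the reference-sequence records gathered from many
// input files. A Map keyed by name gives constant-time de-duplication,
// so the total work grows linearly with the number of records.
function collectRefSeqs(perFileEntries) {
  const byName = new Map();
  for (const entries of perFileEntries) {
    for (const entry of entries) {
      byName.set(entry.name, entry); // assumption: a later record replaces an earlier one
    }
  }
  return Array.from(byName.values());
}

// Hypothetical: render the merged list once, in the "refSeqs = <JSON>"
// form quoted elsewhere in this archive.
function renderRefSeqsFile(refSeqs) {
  return "refSeqs = \n" + JSON.stringify(refSeqs, null, 3) + "\n";
}

// Usage: three inputs, one of which repeats a sequence name.
const mergedRefSeqs = collectRefSeqs([
  [{ name: "ctgA", start: 0, end: 50001, length: 50001 }],
  [{ name: "ctgB", start: 0, end: 6079, length: 6079 }],
  [{ name: "ctgA", start: 0, end: 60000, length: 60000 }],
]);
const refSeqsFileText = renderRefSeqsFile(mergedRefSeqs);
```

The resulting text would then be written to disk a single time, instead of rewriting a growing file for every input.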
From: EMBL C. & C. <ev...@em...> - 2009-10-21 17:44:15

Dear colleagues,

We invite you to participate in the first EMBO Workshop on Visualizing Biological Data (VizBi, http://www.vizbi.org), 3 - 5 March 2010 at the EMBL's new Advanced Training Centre in Heidelberg, Germany.

The goal of the workshop is to bring together, for the first time, researchers developing and using visualization systems across all areas of biology, including genomics, sequence analysis, macromolecular structures, systems biology, and imaging (including microscopy and magnetic resonance imaging). We have assembled an authoritative list of 29 invited speakers who will present an exciting program, reviewing the state-of-the-art and perspectives in each of these areas. The primary focus will be on visualizing processed and annotated data in their biological context, rather than on processing of raw data.

The workshop is limited in the total number of participants, and each participant is normally required to present a poster and to give a 'fastforward' presentation about their work (limited to 30 seconds and 1 slide).

To apply to join the workshop, please go to http://vizbi.org and submit an abstract and image related to your work. Submissions close on 16 November 2009. Since places are limited, participants will be selected based on the relevance of their work to the goals of the workshop. Notifications of acceptance will be sent within three weeks after the close of submissions.

We plan to award a prize for the submitted image that best conveys a strong scientific message in a visually compelling manner.

Please forward this announcement to anyone who may be interested. We hope to see you in Heidelberg next spring!

Seán O'Donoghue, EMBL
Jim Procter, University of Dundee
Nils Gehlenborg, European Bioinformatics Institute
Reinhard Schneider, EMBL

If you have any questions about the registration process please contact:

Adela Valceanu
Conference Officer
European Molecular Biology Laboratory
Meyerhofstr. 1
D-69117 Heidelberg
Tel: +49-6221-387 8625
Fax: +49-6221-387 8158
Email: val...@em...

For full event listings please visit our website: www.embl.org/events
or sign up for our newsletter www.embl.de/events/newsletter

From: Mitch S. <mit...@be...> - 2009-10-21 03:57:52
|
Thanks for following up on this; it doesn't seem to me that this is a JBrowse bug, though. It's definitely happening deep within BioPerl code; the JBrowse code line is just the one that creates the database object. I'm not really sure how to address this; I can't even think of a more useful error message for JBrowse to give.

Mitch

Giles Velarde wrote:
>>> Did you write that patch before you started making per-organism
>>> JBrowse instances?
>>
>> No. Once I wrote that patch I got past prepare-refseqs.pl, but then
>> found it fell over later. I can't remember where exactly but I think
>> it was in biodb-json.
>
> It just occurred to me that I have just isolated an error in our
> database where there was an orphan mRNA being exported into the GFF.
> It happens to occur in the organism that was failing, and it causes:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Smp_196060 doesn't have a primary id
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /software/pathogen/external/lib/perl/Bio/Root/Root.pm:328
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:668
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:647
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
> STACK: Bio::DB::SeqFeature::Store::Loader::load_fh /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:322
> STACK: Bio::DB::SeqFeature::Store::Loader::load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:219
> STACK: Bio::DB::SeqFeature::Store::memory::post_init /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/memory.pm:166
> STACK: Bio::DB::SeqFeature::Store::new /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store.pm:366
> STACK: /nfs/pathdata2/GFF/jbrowse_deployments_test/bin/biodb-to-json.pl:44
> -----------------------------------------------------------
> Could not open database:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Smp_196060 doesn't have a primary id
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /software/pathogen/external/lib/perl/Bio/Root/Root.pm:328
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:668
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:647
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
> STACK: Bio::DB::SeqFeature::Store::Loader::load_fh /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:322
> STACK: Bio::DB::SeqFeature::Store::Loader::load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:219
> STACK: Bio::DB::SeqFeature::Store::memory::post_init /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/memory.pm:166
> STACK: Bio::DB::SeqFeature::Store::new /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store.pm:366
> STACK: /nfs/pathdata2/GFF/jbrowse_deployments_test/bin/biodb-to-json.pl:44
> -----------------------------------------------------------
>
> For some reason I missed that in the logs originally.
>
> So I think what's happened is that an error in the exported GFF was
> causing biodb to die. By separating things out into different
> organisms, I was able to get deployments for all the rest of our
> datasets, and then isolate what's going wrong with this one.
>
> Regards,
> Giles
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE. |
From: Mitch S. <mit...@be...> - 2009-10-21 03:51:02
|
I thought about this some more and your use case seems like a reasonable one to me. I tried out your patch in my working copy, and I noticed two things:

1. There's no documentation on those options in the help message
2. The "if" statement that prints out the usage message doesn't check for $gfffolder

If you prepare another patch that addresses those things, I'll apply it to jbrowse master.

Thanks for the patch,
Mitch

Giles Velarde wrote:
> In our case we export one GFF per top-level source feature in the
> database. This can be a chromosome, but is more often a contig (and
> there are lots of those). They could be concatenated into one big GFF
> of course (I think GFF allows for that?).
>
>> If you're using GFF files with prepare-refseqs.pl then one option is
>> to create one GFF file per organism with all of the sequence-region
>> lines in it (and nothing else). Running prepare-refseqs.pl on that
>> file should be pretty fast.
>
> Good idea, makes sense as that's the only thing that gets parsed at
> that point (though I didn't know that at the time when I first started
> looking). |
From: Mitch S. <mit...@be...> - 2009-10-21 03:36:44
|
Martin A. Hansen wrote:
> I have studied the GFF3 format, and it is quite clear to me that it
> should be possible to parse GFF3 entries in a step-wise manner using
> very little memory - including parent/child relationships

Not in the general case, no. Imagine a file with 2.3 million features, all of which are children of one parent feature. Or that those 2.3 million features are children of a bunch of parent features, with all the parents at the end (or at some unpredictable place within the file).

If you impose the right kind of (topological) ordering constraint on the GFF, you could improve your average-case memory usage, at the cost of making your GFF parser much less general. Or you could build the ordering into your parser, but then you'd have to do a fairly memory-hungry topological sort at the beginning.

> Does Bio::DB support MySQL?

The BioPerl documentation talks all about this.

Mitch |
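To make the memory argument above concrete, here is a minimal Python sketch (not BioPerl's or JBrowse's actual loader) of a step-wise GFF3 reader that relies on the optional `###` directive Martin mentions. The function names and the tiny example file are made up for illustration. The point it shows: everything read since the last `###` has to stay buffered, so a file without `###` lines, or with parents at the end, still ends up entirely in memory.

```python
import io

def parse_attributes(column9):
    """Turn 'ID=a;Parent=b' into {'ID': 'a', 'Parent': 'b'}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs

def stream_gff3_groups(handle):
    """Yield lists of feature rows, flushing whenever '###' is seen.

    Everything since the last '###' must be held in memory, because a
    later row may still name any buffered feature as its Parent.
    """
    buffered = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "###":
            if buffered:
                yield buffered
            buffered = []
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 9:
                buffered.append((cols, parse_attributes(cols[8])))
    if buffered:
        yield buffered

# Hypothetical two-gene file with a '###' between the genes.
example = "\n".join([
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1",
    "###",
    "chr1\tsrc\tgene\t200\t300\t.\t+\t.\tID=g2",
    "chr1\tsrc\tmRNA\t200\t300\t.\t+\t.\tID=m2;Parent=g2",
]) + "\n"

group_sizes = [len(group) for group in stream_gff3_groups(io.StringIO(example))]
print(group_sizes)
```

With the `###` line removed from the example, the same reader would yield a single group of four rows, which is the "whole file in memory" case scaled down.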
From: Mitch S. <mit...@be...> - 2009-10-21 03:23:39
|
Martin A. Hansen wrote:
> But clicking on a feature shows ID undefined? and strand is shown as
> either 1 or -1.

The ID thing was kind of a mis-labeling; JBrowse can optionally keep track of a separate "name" and "ID", but sometimes identifiers become names and vice versa. I clarified the click-popup to show name and ID separately, and modified flatfile-to-json.pl to include the ID from the GFF as the ID in JBrowse, rather than the autogenerated one that Bio::DB::SeqFeature::Store creates.

> Also, I have a feature that is full chromosome length (2.8M) and 7
> that are 0.5M. Strange?

The full length feature is from the sequence-region header; now that you don't need that anymore, you can leave it out. Alternatively, you can use the --type command line option to filter out the full length feature.

The long features are probably due to the fact that you have several features with the same ID. I believe the BioPerl code is merging them together.

Mitch

> ##gff-version 3
> S.aur_COL BOWTIE . 2474533 2474567 1 + . ID=5_BWCOjxwXsN1/1
> S.aur_COL BOWTIE . 1980530 1980564 3 + . ID=7_NZCOjxwXsN1/1
> S.aur_COL BOWTIE . 2115669 2115703 3 + . ID=7_NZCOjxwXsN1/1
> S.aur_COL BOWTIE . 2232159 2232193 3 + . ID=7_NZCOjxwXsN1/1
> S.aur_COL BOWTIE . 530658 530692 3 - . ID=7_NZCOjxwXsN1/1
> S.aur_COL BOWTIE . 574288 574322 3 - . ID=7_NZCOjxwXsN1/1
> S.aur_COL BOWTIE . 579500 579534 3 - . ID=7_NZCOjxwXsN1/1
> S.aur_COL BOWTIE . 2721172 2721206 1 + . ID=3_2VCOjxwXsN1/1
> S.aur_COL BOWTIE . 575753 575787 3 + . ID=1_GbCOjxwXsN1/1
>
> refseq.js looks like this:
>
> refSeqs =
> [
>   {
>     "length" : 2809422,
>     "name" : "S.aur_COL",
>     "seqDir" : "data/seq/S.aur_COL",
>     "start" : 0,
>     "end" : 2809422,
>     "seqChunkSize" : 20000
>   }
> ]
>
> I cannot tell what version or revision my Jbrowse installation is
> (how do I find out?).
>
> So what is this segment error? And how do I resolve it?
>
> Cheers,
>
> Martin
>
> _______________________________________________
> Gmod-ajax mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-ajax |
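Since repeated IDs are the suspected cause of the unexpectedly long features, a quick check before loading may help. The following is a rough Python sketch, independent of BioPerl and JBrowse; `duplicate_ids` is a made-up helper name, and the rows are three of the lines quoted above, assumed to be tab-separated in the real file.

```python
from collections import Counter

def duplicate_ids(gff_lines):
    """Return {ID: count} for every ID attribute used on more than one row."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature row
        for pair in cols[8].split(";"):
            if pair.startswith("ID="):
                counts[pair[3:]] += 1
    return {fid: n for fid, n in counts.items() if n > 1}

# Rows taken from the quoted GFF; tab separation is an assumption.
rows = [
    "##gff-version 3",
    "\t".join(["S.aur_COL", "BOWTIE", ".", "2474533", "2474567", "1", "+", ".", "ID=5_BWCOjxwXsN1/1"]),
    "\t".join(["S.aur_COL", "BOWTIE", ".", "1980530", "1980564", "3", "+", ".", "ID=7_NZCOjxwXsN1/1"]),
    "\t".join(["S.aur_COL", "BOWTIE", ".", "2115669", "2115703", "3", "+", ".", "ID=7_NZCOjxwXsN1/1"]),
]

repeated = duplicate_ids(rows)
print(repeated)
```

Any ID reported here is a candidate for being merged into one long feature by the loader; giving each alignment its own ID should avoid that.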
From: Mitch S. <mit...@be...> - 2009-10-21 03:17:14
|
Colin Davenport wrote:
> I think this should help: you need to modify the GFF header
> and add a line
> ##sequence-region scf1117875581239 1 719819

Thanks for answering this question, Colin. It seems like this bites a lot of people; I thought it had to be that way because of the chado adapter. But I found out a little while ago from Scott that he implemented some functionality in the chado adapter so that I could re-write the JBrowse code to drop this requirement.

So, as of now, you no longer need a sequence-region header in your GFF.

Mitch |
From: Caroline <Car...@kc...> - 2009-10-20 12:44:15
|
Hello,

On Tue, 2009-10-20 at 00:15 +0100, Mitch Skinner wrote:
> It should also be possible to use SAM/BAM with JBrowse using
> Bio::SamTools, but I haven't tested it yet.

I had a go at this and it is. Patch attached (first time I've done this with git, give me a shout if it's not the right format).

> Assuming that entire 2.3M is in one track, then even once you get all
> that data into JBrowse's JSON format, the client-side code in the
> "master" branch will barf on that much data. I'm working on that in the
> "lazyfeatures" branch; there are still some rough edges there but you
> could try it out and see how well it works for you.

Awesome! What's the plan for this?

I'm trying to knock up a sequence server (Perl, Catalyst) that will hand over our ChIPseq data from BAM files bit by bit and eventually do the same for remote BAM files, DAS servers and so on. You mentioned this on the mailing list ages ago, but this is the first chance I've had to get around to it.

Can you point me in the right direction for getting it to play nicely with JBrowse? What will the lazyfeatures track expect from the server? How does it deal with zooming - can the server just decide to only return a hist summary at some point? What about caching? Does the browser grab data in defined chunks? What else should I be worrying about?

Cheers,
Cass |
From: Giles V. <gv...@sa...> - 2009-10-20 11:07:49
|
>> Did you write that patch before you started making per-organism
>> JBrowse instances?
>
> No. Once I wrote that patch I got past prepare-refseqs.pl, but then
> found it fell over later. I can't remember where exactly but I think
> it was in biodb-json.

It just occurred to me that I have just isolated an error in our database where there was an orphan mRNA being exported into the GFF. It happens to occur in the organism that was failing, and it causes:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Smp_196060 doesn't have a primary id
STACK: Error::throw
STACK: Bio::Root::Root::throw /software/pathogen/external/lib/perl/Bio/Root/Root.pm:328
STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:668
STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:647
STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
STACK: Bio::DB::SeqFeature::Store::Loader::load_fh /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:322
STACK: Bio::DB::SeqFeature::Store::Loader::load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:219
STACK: Bio::DB::SeqFeature::Store::memory::post_init /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/memory.pm:166
STACK: Bio::DB::SeqFeature::Store::new /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store.pm:366
STACK: /nfs/pathdata2/GFF/jbrowse_deployments_test/bin/biodb-to-json.pl:44
-----------------------------------------------------------
Could not open database:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Smp_196060 doesn't have a primary id
STACK: Error::throw
STACK: Bio::Root::Root::throw /software/pathogen/external/lib/perl/Bio/Root/Root.pm:328
STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:668
STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:647
STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
STACK: Bio::DB::SeqFeature::Store::Loader::load_fh /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:322
STACK: Bio::DB::SeqFeature::Store::Loader::load /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/Loader.pm:219
STACK: Bio::DB::SeqFeature::Store::memory::post_init /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store/memory.pm:166
STACK: Bio::DB::SeqFeature::Store::new /software/pathogen/external/lib/perl/lib/site_perl/5.8.8/Bio/DB/SeqFeature/Store.pm:366
STACK: /nfs/pathdata2/GFF/jbrowse_deployments_test/bin/biodb-to-json.pl:44
-----------------------------------------------------------

For some reason I missed that in the logs originally.

So I think what's happened is that an error in the exported GFF was causing biodb to die. By separating things out into different organisms, I was able to get deployments for all the rest of our datasets, and then isolate what's going wrong with this one.

Regards,
Giles |
From: Giles V. <gv...@sa...> - 2009-10-20 10:58:09
|
On 19 Oct 2009, at 23:17, Mitch Skinner wrote:
> Giles Velarde wrote:
>> Speaking of which, I ran into another performance issue, this time
>> inside prepare-refseqs.pl. It has to do with preparing refseqs for
>> several 1000s of GFF files. When I looked into it I saw that the
>> refSeqs.js was being opened, parsed, appended to, and closed for
>> every GFF. As this file got larger this opening and appending step
>> got more expensive.
>>
>> I submit a patch that fixes it for me.
>
> The patch looks good to me; I was just about to apply it, but I
> wanted to ask some questions first.
>
> How are your GFF files organized? Do you have a GFF file per
> organism, per refseq, or something else? How many organisms do you
> have, and how many refseqs do they have? Do your JBrowse instances
> have sequence data?

In our case we export one GFF per top-level source feature in the database. This can be a chromosome, but is more often a contig (and there are lots of those). They could be concatenated into one big GFF of course (I think GFF allows for that?).

> If you're using GFF files with prepare-refseqs.pl then one option is
> to create one GFF file per organism with all of the sequence-region
> lines in it (and nothing else). Running prepare-refseqs.pl on that
> file should be pretty fast.

Good idea, makes sense as that's the only thing that gets parsed at that point (though I didn't know that at the time when I first started looking).

> Did you write that patch before you started making per-organism
> JBrowse instances?

No. Once I wrote that patch I got past prepare-refseqs.pl, but then found it fell over later. I can't remember where exactly but I think it was in biodb-json.

Regards,
Giles |
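Mitch's suggestion of one small GFF per organism containing only the sequence-region lines could be prepared along these lines. This is a Python sketch rather than anything shipped with JBrowse; `collect_sequence_regions`, the contig names and the lengths are invented for the example, and it assumes each per-contig file lists its `##sequence-region` directives before the first feature row.

```python
import os
import tempfile

def collect_sequence_regions(gff_paths):
    """Return the distinct '##sequence-region' lines from many GFF files."""
    regions = []
    seen = set()
    for path in gff_paths:
        with open(path) as handle:
            for line in handle:
                if line.startswith("##sequence-region"):
                    line = line.rstrip("\n")
                    if line not in seen:
                        seen.add(line)
                        regions.append(line)
                elif line.strip() and not line.startswith("#"):
                    # Assumption: directives come before the features,
                    # so the rest of the file need not be read.
                    break
    return regions

# Build two tiny per-contig GFF files in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    paths = []
    for name, length in [("contig1", 5000), ("contig2", 1200)]:
        path = os.path.join(workdir, name + ".gff")
        with open(path, "w") as out:
            out.write("##gff-version 3\n")
            out.write("##sequence-region %s 1 %d\n" % (name, length))
            out.write("%s\tsrc\tgene\t1\t100\t.\t+\t.\tID=%s_g1\n" % (name, name))
        paths.append(path)
    regions = collect_sequence_regions(paths)
    # One combined header-only GFF, built in memory and written once.
    combined = "\n".join(["##gff-version 3"] + regions) + "\n"

print(combined)
```

The combined text is assembled once at the end, which is the same idea as the patch under discussion: avoid reopening and re-parsing a growing output file for every input GFF.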
From: Martin A. H. <ma...@ma...> - 2009-10-20 06:35:46
|
Hello Mitch,

Thanks for the quick fix.

JBrowse seems to be missing a version/revision number to quote when reporting questions, features, or bugs. How can I tell you what version I have? I got the code from http://github.com/jbrowse/jbrowse, and I am currently not using the very latest because an evil dependency on Gbrowse.pm has been introduced that I have so far failed to resolve.

Cheers,

Martin

On Mon, Oct 19, 2009 at 11:10 PM, Mitch Skinner <mit...@be...> wrote:
> Martin A. Hansen wrote:
>
>> generate-json.pl
>> generate-names.pl
>> gff-to-json.pl
>> prepare-refseqs.pl
>>
>> all have a hardcoded path to #!/usr/bin/perl which means that the scripts
>> will run on the system's perl and not on my locally installed perl - which
>> has the required Bioperl installed.
>>
>> Please consider changing the shebang to #!/usr/bin/env perl
>
> Done.
>
> http://github.com/jbrowse/jbrowse/commit/79ab79e07be80e480d1d20609813226cc1655131
>
> Also, you appear to have an old version of JBrowse; the names of the
> scripts in the bin/ directory changed a while back. Did you download it
> recently? If so, from where?
>
> Regards,
> Mitch |
From: Martin A. H. <ma...@ma...> - 2009-10-20 06:30:08
|
Hello Mitch,

I have studied the GFF3 format, and it is quite clear to me that it should be possible to parse GFF3 entries in a step-wise manner using very little memory - including parent/child relationships (there is an optional ### record separator that, if in place, should make this much easier). However, it is also quite clear to me why GFF3 doesn't have a big fan club - it's way too complex.

BED files are a decent tabular alternative. However, BED comes out of the UCSC genome browser and depends on chrom, chromStart and chromEnd, which does not make it a good generic format for contigs and scaffolds. There will always be conflicts between the original BED format and the different BED-like formats sprouting up (e.g. between Jim Kent's BED tools and BEDTools). Also, I really dislike chromEnd not being the exact position as in an array. Finally, the itemRgb field is never used. My resolve is to stay clear of BED files unless you work with the UCSC genome browser.

SAM/BAM appears to be a developing alternative, especially because a consensus seems to be forming around this format. However, I find it unnecessarily difficult to parse (the bit field), and I dislike that it is without field restrictions.

Oh, I got ranting about formats :P - I should test JBrowse with a database, which I haven't tried yet. Does Bio::DB support MySQL?

The data I have got is Solexa reads mapped to a bacterial genome using Bowtie. I hacked the Bowtie output into a GFF3-like format.

Regards,

Martin

On Tue, Oct 20, 2009 at 1:15 AM, Mitch Skinner <mit...@be...> wrote:
> Martin A. Hansen wrote:
>
>> I was trying to upload 2.3M features from a GFF file and Perl crashed out:
>>
>> perl(1734) malloc: *** mmap(size=16777216) failed (error code=12)
>> *** error: can't allocate region
>> *** set a breakpoint in malloc_error_break to debug
>> Out of memory!
>>
>> So Jbrowse doesn't handle tracks with a bit of data?
>
> JBrowse uses BioPerl to parse GFF files. Because of the possibility of
> parent/child relationships between features, the BioPerl GFF loader (using
> the Bio::DB::SeqFeature::Store adapter) has to load the entire file into
> memory, and it stores those features in memory as relatively heavyweight
> objects.
>
> The possibilities I see for dealing with this are:
> * Use a database. Then the database takes care of the parent/child
> relationships, and it breaks the data down by refseq (you could also break
> your GFF down by refseq if you haven't already, which should help)
> * Use a different file format. I've tested with BED, which is streamable
> (there are no parent-child relationships between rows in BED, which means
> that we don't have to load the entire file into memory). It should also be
> possible to use SAM/BAM with JBrowse using Bio::SamTools, but I haven't
> tested it yet.
>
> Assuming that entire 2.3M is in one track, then even once you get all that
> data into JBrowse's JSON format, the client-side code in the "master" branch
> will barf on that much data. I'm working on that in the "lazyfeatures"
> branch; there are still some rough edges there but you could try it out and
> see how well it works for you.
>
> What kind of data is this? Is this short-read sequencing data, or SNP
> data, or something else?
>
> Mitch |
From: Mitch S. <mit...@be...> - 2009-10-19 23:15:51
|
Martin A. Hansen wrote:
> I was trying to upload 2.3M features from a GFF file and Perl crashed out:
>
> perl(1734) malloc: *** mmap(size=16777216) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> Out of memory!
>
> So Jbrowse doesn't handle tracks with a bit of data?

JBrowse uses BioPerl to parse GFF files. Because of the possibility of parent/child relationships between features, the BioPerl GFF loader (using the Bio::DB::SeqFeature::Store adapter) has to load the entire file into memory, and it stores those features in memory as relatively heavyweight objects.

The possibilities I see for dealing with this are:

* Use a database. Then the database takes care of the parent/child relationships, and it breaks the data down by refseq (you could also break your GFF down by refseq if you haven't already, which should help)
* Use a different file format. I've tested with BED, which is streamable (there are no parent-child relationships between rows in BED, which means that we don't have to load the entire file into memory). It should also be possible to use SAM/BAM with JBrowse using Bio::SamTools, but I haven't tested it yet.

Assuming that entire 2.3M is in one track, then even once you get all that data into JBrowse's JSON format, the client-side code in the "master" branch will barf on that much data. I'm working on that in the "lazyfeatures" branch; there are still some rough edges there but you could try it out and see how well it works for you.

What kind of data is this? Is this short-read sequencing data, or SNP data, or something else?

Mitch |
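To illustrate what "streamable" means for BED here, the following Python sketch (again not JBrowse's own code; `stream_bed` and the sample rows are hypothetical) reads one row at a time and keeps nothing from earlier rows, which is possible because no BED row refers to another.

```python
import io

def stream_bed(handle):
    """Yield one dict per BED row; no earlier rows are kept in memory."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, comments and UCSC 'track'/'browser' header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split("\t")
        yield {
            "chrom": cols[0],
            "start": int(cols[1]),  # BED starts are 0-based
            "end": int(cols[2]),    # BED ends are exclusive
            "name": cols[3] if len(cols) > 3 else None,
        }

# Hypothetical BED input: two 35 bp reads on one reference sequence.
bed_text = (
    "track name=reads\n"
    "S.aur_COL\t100\t135\tread1\n"
    "S.aur_COL\t200\t235\tread2\n"
)

records = list(stream_bed(io.StringIO(bed_text)))
covered = sum(r["end"] - r["start"] for r in records)
print(len(records), covered)
```

In a real conversion the generator would be consumed row by row instead of being collected into a list, so memory use stays flat regardless of how many features the file holds.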