From: Scott C. <sc...@sc...> - 2024-01-11 23:16:49
Hi Hans,

I'd say the problem is primarily that the snippet you've shown isn't GFF3; it looks much more like GTF. GFF3 would have 'tag=value;tag2=value2' in the ninth column, as opposed to GTF's 'tag "value"; tag2 "value2"'. While JBrowse 2 does support GTF, it has some drawbacks. The biggest is that there isn't an indexed form of it, so the tracks will load very slowly: JBrowse has to load and parse the entire file in order to draw any portion of it. If you had GFF3, each of the exon features would have a "Parent" tag pointing to the ID of its parent transcript.

The other thing that jumps out at me in the snippet you provided is that each transcript has exactly one exon child, with the same start and end coordinates as the transcript itself, so they would only show up as single-exon features anyway. I would guess that elsewhere in your GTF file you have more complicated examples, with transcripts that have multiple exon children.

So, it's hard to say what I would expect to see without a better example of your GTF, but you probably want to look at generating GFF3 anyway, so that you can take advantage of tabix indexing of the GFF3 files. Of course, feel free to follow up with more questions or example data and we can figure out where to go from there.

Scott

On Thu, Jan 11, 2024 at 2:29 PM Hans Vasquez-Gross <hva...@un...> wrote:

> Hello All,
>
> I have the output from isoseq collapse then pigeon index to create a
> sorted .gff3 file for a new assembly. Currently, this gff3 file has
> transcript and exon definitions. However, when I load this track data on
> JBrowse2, it shows the transcripts as one large unit and the exons as a
> separate unit. It doesn't seem to correctly render the intron/exon
> boundaries. The annotation track is on top in yellow and the isoseq_reads
> bam file is below.
>
> Example data:
> ##pacbio-collapse-version 1.0
> ##date Thu Jan 11 00:10:30 2024 UTC
> ctg_p_c_003493_0_75000_89999 PacBio transcript 11587 12122 . - . gene_id "PB.32086"; transcript_id "PB.32086.1";
> ctg_p_c_003493_0_75000_89999 PacBio exon 11587 12122 . - . gene_id "PB.32086"; transcript_id "PB.32086.1";
> ctg_p_c_033075_0 PacBio transcript 20564 22031 . + . gene_id "PB.31043"; transcript_id "PB.31043.1";
> ctg_p_c_033075_0 PacBio exon 20564 22031 . + . gene_id "PB.31043"; transcript_id "PB.31043.1";
> ctg_p_c_033075_0 PacBio transcript 20564 21887 . + . gene_id "PB.31043"; transcript_id "PB.31043.2";
> ctg_p_c_033075_0 PacBio exon 20564 21887 . + . gene_id "PB.31043"; transcript_id "PB.31043.2";
> ctg_p_c_033075_0 PacBio transcript 20564 21758 . + . gene_id "PB.31043"; transcript_id "PB.31043.3";
> ctg_p_c_033075_0 PacBio exon 20564 21758 . + . gene_id "PB.31043"; transcript_id "PB.31043.3";
>
> Any suggestions?
>
> Thank you,
> -Hans
>
> --
>
> *Hans Vasquez-Gross, Ph.D*
>
> Bioinformatics Scientist,
> Nevada Bioinformatics Center
>
> https://www.unr.edu/bioinformatics
>
> hva...@un...
>
> _______________________________________________
> Gmod-ajax mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-ajax

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Project Manager (http://gmod.org/)              216-392-3087
WormBase Developer (http://wormbase.org/)
Alliance of Genome Resources Group Leader (http://alliancegenome.org/)
VirusSeq Project Manager (https://virusseq-dataportal.ca/)
Human Cancer Models Initiative Project Manager (https://hcmi-searchable-catalog.nci.nih.gov/)