Re: [Gmod-ajax] [apollo] Issues with data visualisation on Apollo


Hi there,

This is an interesting thread. Basically, it appears you want to set up an
hg38 instance with JBrowse data. It would be nice if this were already
available so that it didn't require a lot of legwork, but I'm certainly
happy to help here (we do have one at http://hg38.jbrowse.org/, but I think
it could be better organized).

Here is a short summary of some options based on the files you listed

>Capture_seq gencode_pasa_Captureseq_2.pasa_assemblies.gtf
/nfs/production/panda/ensembl/havana/warehouse_th_group/jmg/long_read_pipeline/Capture_seq_2/gencode_pasa_Captureseq_2.pasa_assemblies.gtf.gz
GTF

Convert the gtf to gff, and load with flatfile-to-json (or use gff3tabix,
but I'd suggest flatfile-to-json probably)

Here is one option for how to convert gtf to gff
https://jbrowse.org/docs/faq.html#how-do-i-convert-gtf-to-gff

>GENCODE gencode.v35.annotation.gff3
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz
GFF3

Load with flatfile-to-json --gff

>GENCODE_Ensembl Homo_sapiens.GRCh38.101.gff3
ftp://ftp.ensembl.org/pub/release-101/gff3/homo_sapiens/Homo_sapiens.GRCh38.101.gff3.gz
GFF3

Load with flatfile-to-json --gff

>RefSeq hg38.ncbiRefSeq.gtf
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
GTF

Probably convert gtf to gff, and then load with flatfile-to-json

Note that another option is to convert to bigBed: first run gtfToGenePred, and
then follow https://gist.github.com/gireeshkbogu/f478ad8495dca56545746cd391615b93

If you want searchable gene names, though, I suggest using flatfile-to-json

>RefSeq_GRCh38 GRCh38_latest_genomic.gff
ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz
GFF

This one is tricky because it uses NCBI refnames so you'll have to convert
them to chr1, chr2, etc. from NC_000001.11 etc.
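
A rough sketch of the refname conversion, in case it helps. This is not part
of JBrowse. The accession table is an assumption based on the usual RefSeq
numbering (NC_000001 to NC_000022 for chr1 to chr22, NC_000023 for chrX,
NC_000024 for chrY, NC_012920 for chrM), and the function names are made up
for illustration. Unplaced scaffolds and alt loci (NT_, NW_) are left
unchanged here, so they would still not match the chr-style reference names.

```python
# Assumed mapping: RefSeq accession (version stripped) -> UCSC-style name.
NCBI_TO_UCSC = {f"NC_{i:06d}": f"chr{i}" for i in range(1, 23)}
NCBI_TO_UCSC.update({"NC_000023": "chrX", "NC_000024": "chrY", "NC_012920": "chrM"})


def rename_refname(name):
    """Map e.g. NC_000001.11 to chr1; return unknown names unchanged."""
    accession = name.split(".", 1)[0]  # drop the version suffix, e.g. ".11"
    return NCBI_TO_UCSC.get(accession, name)


def rename_gff_line(line):
    """Rewrite the sequence name of one GFF3 line (no trailing newline)."""
    if line.startswith("##sequence-region"):
        parts = line.split()
        if len(parts) > 1:
            parts[1] = rename_refname(parts[1])
        return " ".join(parts)
    if line.startswith("#") or not line.strip():
        return line  # other directives, comments and blank lines pass through
    cols = line.split("\t")
    cols[0] = rename_refname(cols[0])  # column 1 holds the sequence name
    return "\t".join(cols)


def rename_gff_file(src, dst):
    """Copy an uncompressed GFF3 file from src to dst with renamed refnames."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(rename_gff_line(line.rstrip("\n")) + "\n")
```

For example, rename_gff_file("GRCh38_latest_genomic.gff", "renamed.gff")
(the output name is arbitrary), after which the renamed file can be loaded
with flatfile-to-json --gff as for the other GFF3 files.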

>SLRseq_2 SLRseq.GRCh38.gtf
/nfs/production/panda/ensembl/havana/warehouse_th_group/jmg/long_read_pipeline/SLRseq_2/SLRseq.GRCh38.gtf.gz
GTF

Probably convert to gff3, load with flatfile-to-json

>SLRseq_merged SLRseq_merged.bam
ftp://ftp.ebi.ac.uk/pub/databases/havana/ngs_havana/havana/SLRseq/human/SLRseq_merged.bam
//
ftp://ftp.ebi.ac.uk/pub/databases/havana/ngs_havana/havana/SLRseq/human/SLRseq_merged.bam.bai
BAM

Manually download the BAM and BAI into your data folder, and add the track to
tracks.conf with a text editor. We don't have a great add-track workflow for
BAM files

[tracks.SLRseq_merged_bam]
urlTemplate=SLRseq_merged.bam

That is all that is needed for your config

>UCSC_MANE mane.0.9.bed
https://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.0.9.ix //
https://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.0.9.ixx //
https://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.0.9.bb BigBed -> BED

You can do two things

1) load this as is in bigBed format. Manually edit this into your
tracks.conf file with a text editor, the bigBed is natively supported

[tracks.MANE]
key=MANE 0.9 BigBed
urlTemplate=mane.0.9.bb

2) convert to gff, load with flatfile-to-json

Note that JBrowse cannot currently use the trix index files (ix and ixx),
which would allow searching for genes in the bigBed files, so if you want
searchable gene names, convert to gff and use flatfile-to-json

>UCSC_ncbiRefSeqOther ncbiRefSeqOther.bed
https://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeq/ncbiRefSeqOther.bb
BigBed -> BED

Manually download this into your data folder and edit it into your tracks.conf
file with a text editor; the bigBed is natively supported

[tracks.ncbiRefSeqOther]
key=NCBI RefSeq (other) BigBed
urlTemplate=ncbiRefSeqOther.bb

>UNIPROT UP000005640_9606_proteome.bed
ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/genome_annotation_tracks/UP000005640_9606_beds/UP000005640_9606_proteome.bed
BED

Convert to bigBed or GFF. The extra columns are not loaded if using
flatfile-to-json.pl --bed
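
If you go the GFF route, a minimal sketch of a BED-to-GFF3 conversion that
keeps the extra columns as attributes might look like the following. It is
an illustration under assumptions, not an existing JBrowse script: the
"region" feature type, the "bed" source, and the extra1, extra2, ...
attribute names are placeholders, and the BED12 block (exon) columns are
ignored, so each row becomes a single flat feature. Note the coordinate
shift: BED starts are 0-based, GFF3 starts are 1-based.

```python
from urllib.parse import quote


def escape_gff3(value):
    """Percent-encode characters reserved in GFF3 attributes (; = & , %)."""
    return quote(value, safe=" :/|+()")


def bed_line_to_gff3(line, source="bed", feature_type="region", extra_names=()):
    """Convert one BED line to one GFF3 line; return None for non-data lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(("#", "track", "browser")):
        return None
    cols = line.split("\t")
    if len(cols) < 3:
        raise ValueError("BED lines need at least chrom, start and end")
    chrom = cols[0]
    start = int(cols[1]) + 1  # BED is 0-based half-open, GFF3 is 1-based inclusive
    end = int(cols[2])
    name = cols[3] if len(cols) > 3 else ""
    score = cols[4] if len(cols) > 4 and cols[4] else "."
    strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "."
    attrs = []
    if name:
        attrs.append("Name=" + escape_gff3(name))
    # Anything past the 12 standard BED columns is treated as an extra field.
    for i, value in enumerate(cols[12:]):
        if not value:
            continue
        tag = extra_names[i] if i < len(extra_names) else f"extra{i + 1}"
        attrs.append(f"{tag}={escape_gff3(value)}")
    fields = [chrom, source, feature_type, str(start), str(end),
              score, strand, ".", ";".join(attrs) or "."]
    return "\t".join(fields)
```

Write a "##gff-version 3" line first, then the converted lines, and the
result can be loaded with flatfile-to-json.pl --gff. Pass extra_names to
give the extra columns meaningful attribute names.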

>gtexGene --
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/gtexGene.sql //
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/gtexGene.txt.gz SQL

Use ucsc-to-json.pl with these "database" files

>unipAliSwissprot unipAliSwissprot.bed

Convert to bigBed or GFF, extra columns will not be loaded

-Colin

On Tue, Nov 17, 2020 at 1:35 PM Nathan Dunn <nat...@lb...> wrote:

>
> Shamika,
>
> I don’t know the exact answer to your question, but usually you have to
> supply a --type argument to process the top-level information you want to
> display, so you could potentially put in multiple tracks.
>
> e.g., for just coding genes:
>
> ~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --gff
> GRCh38_latest_genomic.gff --type mRNA --trackLabel RefSeq_GRCh38_mRNA --out
> dest_loc
>
>
> For multiple types, from the script itself you get:
>
> =item --type <feature types to process>
>
> Only process features of the given type.  Can take either single type
> names, e.g. "mRNA", or type names qualified by "source" name, for
> whatever definition of "source" your data file might have.  For
> example, "mRNA:exonerate" will filter for only mRNA features that have
> a source of "exonerate".
>
> Multiple type names can be specified by separating the type names with
> commas, e.g. C<--type mRNA:exonerate,ncRNA>.
>
> Might be easier to play with a small scaffold at first until you get what
> you want.
>
> Someone on the gmod-ajax panel will likely know more than me about the
> UCSC piece.
>
> Nathan
>
>
> On Nov 17, 2020, at 2:33 AM, smm <sm...@eb...> wrote:
>
> Hello,
>
> These are three example commands that I have run. None of them show any
> error. But the visualization is not ideal on Apollo.
>
> 1. GFF
> ~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --gff
> GRCh38_latest_genomic.gff --trackLabel RefSeq_GRCh38 --out dest_loc
>
> 2. BigBED to BED
> ./bigBedToBed mane.0.9.bb mane.0.9.bed
> ~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --bed
> mane.0.9.bed --trackLabel UCSC_MANE --out dest_loc
>
> 3. SQL
> ~/web-apollo-test-server/Apollo-2.6.1/bin/ucsc-to-json.pl --in source_loc
> --track gtexGene --out dest_loc
>
> Please let me know if you require any other information. The source for
> each file is available here-
> https://docs.google.com/spreadsheets/d/1l_O-GYGqyU6Sk9hBSWFwp-ehjr0RAQndpylnYOrzmzA/edit?usp=sharing
>
> Regards,
> Shamika
>
> On 16/11/2020 16:41, Nathan Dunn wrote:
>
>
> I’d have to see the individual data and commands you were using for each.
>
> More info can be found here https://jbrowse.org/docs/html_features.html and
> here https://jbrowse.org/docs/flatfile-to-json.pl.html
>
> That being said, it LOOKS like the filter you are running is for genes,
> where typically the top-level should be mRNA (for your working attachment),
> but that is just a guess.
>
> If you provide a snippet, command, and output I can provide some more
> direct feedback.
>
> Nathan
>
>
>
> On Nov 16, 2020, at 7:26 AM, Shamika Mohanan <sm...@eb...> wrote:
>
> Hello,
>
> I am trying to load GFF/GTF/BED/BigBED/BAM/UCSC SQL files using the
> command line scripts available in Apollo-2.6.1/bin.
>
> Most files load properly and are available on Apollo as seen in
> attachment_1. There are few files that do load but do not display the data
> properly as seen in attachment_2.
>
> I have listed the files that show this problem here-
> https://docs.google.com/spreadsheets/d/1l_O-GYGqyU6Sk9hBSWFwp-ehjr0RAQndpylnYOrzmzA/edit?usp=sharing
>
> Should I set some option when running the scripts?
>
> Regards,
> Shamika
> <attachment_2.png><attachment_1.png>