From: Colin <col...@gm...> - 2020-11-17 20:25:43
Hi there,

This is an interesting thread. It appears you want to build an hg38 instance of JBrowse data. It would be nice if this were already available so that it didn't require a lot of legwork, but I'm certainly happy to help here (we do have one at http://hg38.jbrowse.org/ but I think it could be better organized).

Here is a short summary of some options based on the files you listed:

>Capture_seq gencode_pasa_Captureseq_2.pasa_assemblies.gtf /nfs/production/panda/ensembl/havana/warehouse_th_group/jmg/long_read_pipeline/Capture_seq_2/gencode_pasa_Captureseq_2.pasa_assemblies.gtf.gz GTF

Convert the GTF to GFF, and load with flatfile-to-json (or use gff3tabix, but I'd suggest flatfile-to-json). Here is one option for converting GTF to GFF: https://jbrowse.org/docs/faq.html#how-do-i-convert-gtf-to-gff

>GENCODE gencode.v35.annotation.gff3 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz GFF3

Load with flatfile-to-json --gff

>GENCODE_Ensembl Homo_sapiens.GRCh38.101.gff3 ftp://ftp.ensembl.org/pub/release-101/gff3/homo_sapiens/Homo_sapiens.GRCh38.101.gff3.gz GFF3

Load with flatfile-to-json --gff

>RefSeq hg38.ncbiRefSeq.gtf http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz GTF

Probably convert GTF to GFF, and then load with flatfile-to-json. Note that another option is to convert to bigBed: first gtfToGenePred and then https://gist.github.com/gireeshkbogu/f478ad8495dca56545746cd391615b93
If you want searchable gene names though, I'd suggest using flatfile-to-json.

>RefSeq_GRCh38 GRCh38_latest_genomic.gff ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz GFF

This one is tricky because it uses NCBI refnames, so you'll have to convert them from NC_000001.11 etc. to chr1, chr2, etc.
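As a rough illustration of that refname conversion, here is a minimal Python sketch. It is not part of JBrowse or Apollo. The function names are hypothetical, and it assumes the primary chromosomes use accessions NC_000001 through NC_000024 (1 to 22, X, Y) plus NC_012920 for the mitochondrion, with the version suffix ignored. Anything else (NT_/NW_ scaffolds, comment and pragma lines) is passed through unchanged, so you may still want to filter those out or rename them separately.

```python
import re

# Assumption: primary human chromosomes are NC_000001..NC_000024 (1-22, X, Y)
# and the mitochondrion is NC_012920; the ".11"-style version is ignored.
_ACCESSION = re.compile(r"^NC_(\d{6})\.\d+$")


def ucsc_name(refname):
    """Return a chr-style name for an NCBI accession, or the input unchanged."""
    match = _ACCESSION.match(refname)
    if not match:
        return refname
    number = int(match.group(1))
    if 1 <= number <= 22:
        return "chr%d" % number
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    if number == 12920:
        return "chrM"
    return refname


def rename_gff_line(line):
    """Rewrite column 1 of a GFF feature line; leave comments and blanks alone."""
    if line.startswith("#") or not line.strip():
        return line
    columns = line.split("\t")
    columns[0] = ucsc_name(columns[0])
    return "\t".join(columns)


def rename_gff_lines(lines):
    """Lazily apply rename_gff_line to an iterable of lines (e.g. an open file)."""
    for line in lines:
        yield rename_gff_line(line)
```

You could feed an open handle on the decompressed GRCh38_latest_genomic.gff through rename_gff_lines, write the result to a new file, and then run flatfile-to-json.pl --gff on that file.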

>SLRseq_2 SLRseq.GRCh38.gtf /nfs/production/panda/ensembl/havana/warehouse_th_group/jmg/long_read_pipeline/SLRseq_2/SLRseq.GRCh38.gtf.gz GTF

Probably convert to GFF3, then load with flatfile-to-json.

>SLRseq_merged SLRseq_merged.bam ftp://ftp.ebi.ac.uk/pub/databases/havana/ngs_havana/havana/SLRseq/human/SLRseq_merged.bam // ftp://ftp.ebi.ac.uk/pub/databases/havana/ngs_havana/havana/SLRseq/human/SLRseq_merged.bam.bai BAM

Manually download the BAM and BAI into your data folder, and add the track to tracks.conf with a text editor (we don't have a great add-track workflow for BAM files):

[tracks.SLRseq_merged_bam]
urlTemplate=SLRseq_merged.bam

That is all that is needed for your config.

>UCSC_MANE mane.0.9.bed https://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.0.9.ix // https://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.0.9.ixx // https://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.0.9.bb BigBed -> BED

You can do two things:

1) Load this as is in bigBed format. Manually edit this into your tracks.conf file with a text editor; bigBed is natively supported.

[tracks.MANE]
key=MANE 0.9 BigBed
urlTemplate=mane.0.9.bb

2) Convert to GFF, and load with flatfile-to-json.

Note that the trix index (ix and ixx) cannot currently be used by JBrowse (it would allow searching for genes in the bigBed files), so if you want searchable gene names, convert to GFF and use flatfile-to-json.

>UCSC_ncbiRefSeqOther ncbiRefSeqOther.bed https://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeq/ncbiRefSeqOther.bb BigBed -> BED

Manually download this into your data folder and edit it into your tracks.conf file with a text editor; bigBed is natively supported.

[tracks.ncbiRefSeqOther]
key=NCBI RefSeq (other) BigBed
urlTemplate=ncbiRefSeqOther.bb

>UNIPROT UP000005640_9606_proteome.bed ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/genome_annotation_tracks/UP000005640_9606_beds/UP000005640_9606_proteome.bed BED

Convert to bigBed or GFF.
The extra columns are not loaded if using flatfile-to-json.pl --bed

>gtexGene -- http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/gtexGene.sql // http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/gtexGene.txt.gz SQL

Use ucsc-to-json.pl with these "database" files.

>unipAliSwissprot unipAliSwissprot.bed

Convert to bigBed or GFF; extra columns will not be loaded.

-Colin

On Tue, Nov 17, 2020 at 1:35 PM Nathan Dunn <nat...@lb...> wrote:
>
> Shamika,
>
> I don’t know the exact answer to your question, but usually you have to
> supply a --type argument to process the top-level information you want to
> display, so you could potentially put in multiple tracks.
>
> e.g., for just coding genes:
>
> ~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --gff GRCh38_latest_genomic.gff --type mRNA --trackLabel RefSeq_GRCh38_mRNA --out dest_loc
>
> For multiple types, from the script itself you get:
>
> =item --type <feature types to process>
>
> Only process features of the given type. Can take either single type
> names, e.g. "mRNA", or type names qualified by "source" name, for
> whatever definition of "source" your data file might have. For
> example, "mRNA:exonerate" will filter for only mRNA features that have
> a source of "exonerate".
>
> Multiple type names can be specified by separating the type names with
> commas, e.g. C<--type mRNA:exonerate,ncRNA>.
>
> Might be easier to play with a small scaffold at first until you get what
> you want.
>
> Someone on the gmod-ajax panel will likely know more than me about the
> UCSC piece.
>
> Nathan
>
>
> On Nov 17, 2020, at 2:33 AM, smm <sm...@eb...> wrote:
>
> Hello,
>
> These are three example commands that I have run. None of them shows any
> error, but the visualization is not ideal on Apollo.
>
> 1. GFF
> ~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --gff GRCh38_latest_genomic.gff --trackLabel RefSeq_GRCh38 --out dest_loc
>
> 2. BigBED to BED
> ./bigBedToBed mane.0.9.bb mane.0.9.bed
> ~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --bed mane.0.9.bed --trackLabel UCSC_MANE --out dest_loc
>
> 3. SQL
> ~/web-apollo-test-server/Apollo-2.6.1/bin/ucsc-to-json.pl --in source_loc --track gtexGene --out dest_loc
>
> Please let me know if you require any other information. The source for
> each file is available here:
> https://docs.google.com/spreadsheets/d/1l_O-GYGqyU6Sk9hBSWFwp-ehjr0RAQndpylnYOrzmzA/edit?usp=sharing
>
> Regards,
> Shamika
>
> On 16/11/2020 16:41, Nathan Dunn wrote:
>
> I’d have to see the individual data and commands you were using for each.
>
> More info can be found here https://jbrowse.org/docs/html_features.html and
> here https://jbrowse.org/docs/flatfile-to-json.pl.html
>
> That being said, it LOOKS like the filter you are running is for genes,
> where typically the top-level should be mRNA (for your working attachment),
> but that is just a guess.
>
> If you provide a snippet, command, and output I can provide some more
> direct feedback.
>
> Nathan
>
>
> On Nov 16, 2020, at 7:26 AM, Shamika Mohanan <sm...@eb...> wrote:
>
> Hello,
>
> I am trying to load GFF/GTF/BED/BigBED/BAM/UCSC SQL files using the
> command line scripts available in Apollo-2.6.1/bin.
>
> Most files load properly and are available on Apollo as seen in
> attachment_1. There are a few files that do load but do not display the data
> properly, as seen in attachment_2.
>
> I have listed the files that show this problem here:
> https://docs.google.com/spreadsheets/d/1l_O-GYGqyU6Sk9hBSWFwp-ehjr0RAQndpylnYOrzmzA/edit?usp=sharing
>
> Should I set some option when running the scripts?
>
> Regards,
> Shamika
> <attachment_2.png><attachment_1.png>