Re: [svtoolkit-help] Gene Overlap Annotator
From: Bob H. <han...@br...> - 2015-06-12 20:07:28
Hi, Markus,

If the GTF is reasonably similar to gencode, then it should work.

The recognized feature types are gene, transcript, exon, UTR, CDS, start_codon and stop_codon. The gene features must contain a gene_id attribute (and optionally gene_name). The transcript features must contain a transcript_id attribute (and optionally transcript_name).

For efficiency, we require that the input GTF files are sorted in a particular order and obey certain conventions:

1. Every "gene" record must have a unique gene_id, and the coordinates in the gene record must contain the union of all the coordinates in all records with the same gene_id.
2. Records in the input file must be grouped by gene_id, and among the records with the same gene_id, the gene record must come first.
3. The sort order of gene records (i.e. between the groups) is based on chromosome, then start, then end coordinate.
4. The sort order of records within each gene record group is that the gene record must come first, followed by all other records ordered by chromosome, then start, then end coordinate.
5. Chromosomes must be in reference sequence order.
6. Genes are not allowed to span chromosomes.

The SortGTFFile utility is a rather quick-and-dirty piece of code to convert gencode GTFs into an order compatible with the above. One thing I see that it assumes is that gene_id is the first attribute in each record (although the rest of the code does not depend on this). SortGTFFile does not fully parse the records.

Hope this helps. If you have an example file you can't get correctly sorted, let me know.

You can always get a little bit of command line help (at least a listing of the arguments) with

    java -cp SVToolkit.jar:GenomeAnalysisTK.jar org.broadinstitute.sv.apps.SortGTFFile --help

-Bob

On 6/10/15 11:15 AM, Markus Sällman Almén wrote:
> Hi,
> I have used the deletion pipeline and would now like to use the Gene
> Overlap Annotator to classify my deletions. This is a non-human
> organism, so I need to use my own GTF annotation file, which does not
> work as it is, even though it is pretty similar to GENCODE's GTF.
> The specifications for this file are not clear to me, including the
> sort order. Also, the link to the SortGTFFile utility in the
> documentation seems dead.
>
> Could you please provide some clearer specifications of the GTF file
> and information about SortGTFFile?
>
> Best,
> Markus
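
For anyone who wants to pre-sort a non-gencode GTF themselves, the six conventions in Bob's reply can be sketched in Python roughly as below. This is not the SortGTFFile implementation; the function names (parse_record, sort_gtf_lines) and the chrom_order argument are made up for illustration. It assumes tab-separated GTF lines with the attributes in column 9, and that you supply the chromosome names in reference sequence order yourself (for example from the reference .fai or .dict file). Unlike SortGTFFile as described above, it finds gene_id anywhere in the attribute column rather than assuming it is first.

```python
import re
from collections import defaultdict

# Matches gene_id "..." anywhere in the attribute column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def parse_record(line):
    """Extract only the fields needed for sorting from one GTF line."""
    text = line.rstrip("\n")
    fields = text.split("\t")
    match = GENE_ID_RE.search(fields[8])
    if match is None:
        raise ValueError("record without gene_id: " + text)
    return {
        "chrom": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "gene_id": match.group(1),
        "line": text,
    }


def sort_gtf_lines(lines, chrom_order):
    """Return GTF lines reordered per the conventions (comment lines are dropped)."""
    # Convention 5: chromosomes are ranked by reference sequence order.
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}

    def position(rec):
        return (rank[rec["chrom"]], rec["start"], rec["end"])

    # Convention 2: group records by gene_id.
    groups = defaultdict(list)
    for line in lines:
        if line.strip() and not line.startswith("#"):
            rec = parse_record(line)
            groups[rec["gene_id"]].append(rec)

    ordered = []
    for gene_id, recs in groups.items():
        genes = [r for r in recs if r["feature"] == "gene"]
        if len(genes) != 1:
            raise ValueError(gene_id + ": expected exactly one gene record")
        gene = genes[0]
        # Convention 4: non-gene records sorted by chromosome, start, end.
        others = sorted((r for r in recs if r["feature"] != "gene"), key=position)
        for r in others:
            # Conventions 1 and 6: the gene record must cover every member
            # record, and all of them must sit on the same chromosome.
            if (r["chrom"] != gene["chrom"]
                    or r["start"] < gene["start"]
                    or r["end"] > gene["end"]):
                raise ValueError(gene_id + ": record outside gene bounds")
        ordered.append((position(gene), [gene] + others))

    # Convention 3: order the groups by the position of their gene record.
    ordered.sort(key=lambda item: item[0])
    return [r["line"] for _, recs in ordered for r in recs]
```

A typical use would be to read the file into a list of lines, call sort_gtf_lines(lines, chrom_order), and write the result back out. The sketch raises an error instead of repairing the file when a gene record is missing or does not cover its member records, since fixing those cases needs a decision about the annotation itself.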