From: Petr D. <pd...@sa...> - 2012-11-26 12:57:18
Hi Gonzalo,

I welcome the idea of standardizing the functional annotations. Here is an example of the rather wildly evolved format that we have been using so far:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence of the ALT alleles from Ensembl 66 VEP v2.4, format transcriptId:geneName:consequence[:codingSeqPosition:proteinPosition:proteinAlleles:proteinPredictions]+...[+gerpScore]">

and two concrete examples, the first for a multiallelic site:

CSQ=621:S>R:Grantham,110:Allele,C:Gene,Ssh3+ENSMUST00000037992:ENSMUSG00000034616:SYNONYMOUS_CODING:1863:621:S>S:Allele,A:Gene,Ssh3

CSQ=ENST00000382410:DEFB125:NON_SYNONYMOUS_CODING:184:62:H>Y:SIFT,tolerated(0.41):PolyPhen,benign(0):Condel,neutral(0.015):Grantham,83

I am curious: what other formats are in use?

I'd prefer not to allow whitespace in the INFO field, and not to change the column delimiter to spaces or to arbitrary whitespace; that would break existing software without bringing much benefit.

Petr

On Mon, 2012-11-26 at 11:20 +0000, Peter Cock wrote:
> On Mon, Nov 26, 2012 at 5:12 AM, Eric Banks <eb...@br...> wrote:
> > Hi Bradford,
> >
> > I do understand where you're coming from, but truthfully I'd prefer to go in
> > the opposite direction once we're open to changing delimiters. I've never
> > quite understood why VCF is tab-delimited and not whitespace-delimited.
>
> Tab separated makes it easy to use in Galaxy, R, etc, even Excel - please
> keep that. It is a good thing!
>
> > You wouldn't believe how many times people have manually generated
> > VCFs that were space-delimited and couldn't understand why they were
> > failing in VCF parsers.
>
> I'd be asking why doesn't your parser give a clearer error message?
> If you've seen people fall over this pothole many times the parser
> concerned should be fixed.
>
> > I'd much rather that all whitespace be treated equally (as it is
> > visually). It makes for a much simpler spec.
>
> The problem with white space is you can't see how many characters
> there are - spaces and tabs are not treated equally visually. What
> would you expect if there were several spaces in a row? If you treat
> it as one separator you prevent using empty cells (I'm thinking in
> terms of generalities here, not just VCF).
>
> Regards,
>
> Peter

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
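For readers who want to consume CSQ values like the ones in Petr's examples, here is a minimal Python sketch of a parser for the format string given in the ##INFO header. It is not VEP's own code, and `parse_csq` and `Consequence` are hypothetical names. It assumes that `+` separates per-transcript entries and `:` separates fields, that a trailing `+` item containing no `:` is the optional gerpScore, and (as a heuristic) that any field containing a comma is a `name,value` prediction such as `SIFT,tolerated(0.41)`, while other extra fields are the positional codingSeqPosition, proteinPosition and proteinAlleles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Consequence:
    transcript_id: str
    gene_name: str
    consequence: str
    # Optional positional fields (codingSeqPosition, proteinPosition, proteinAlleles)
    positions: List[str] = field(default_factory=list)
    # "name,value" items, e.g. {"SIFT": "tolerated(0.41)", "Grantham": "83"}
    annotations: Dict[str, str] = field(default_factory=dict)


def parse_csq(value: str) -> Tuple[List[Consequence], Optional[str]]:
    """Parse the text after "CSQ=" into (consequences, gerp_score).

    Hypothetical helper; gerp_score is None when no trailing score is present.
    """
    consequences: List[Consequence] = []
    gerp_score: Optional[str] = None
    for entry in value.split("+"):
        if not entry:
            raise ValueError("empty CSQ entry")
        fields = entry.split(":")
        if len(fields) == 1:
            # Assumption: a "+"-separated item without ":" is the trailing gerpScore
            gerp_score = fields[0]
            continue
        if len(fields) < 3:
            raise ValueError(f"malformed CSQ entry: {entry!r}")
        csq = Consequence(*fields[:3])
        for item in fields[3:]:
            if "," in item:
                # Heuristic: fields containing a comma are name,value predictions
                name, _, val = item.partition(",")
                csq.annotations[name] = val
            else:
                csq.positions.append(item)
        consequences.append(csq)
    return consequences, gerp_score
```

For example, `parse_csq("ENST00000382410:DEFB125:NON_SYNONYMOUS_CODING:184:62:H>Y:SIFT,tolerated(0.41):PolyPhen,benign(0):Condel,neutral(0.015):Grantham,83")` yields one entry for DEFB125 with positions `["184", "62", "H>Y"]` and no GERP score. The comma heuristic works for the examples in this thread, but a standardized format would ideally make the positional fields explicit rather than leaving parsers to guess.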
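Peter's point about empty cells can be illustrated with a small Python sketch. The helper names below are hypothetical, and the example rows are generic tab- or space-separated text, not VCF records (VCF itself writes missing values as `.`).

```python
import re


def split_tab(line: str) -> list:
    # Strict tab splitting: every tab is one delimiter, so adjacent tabs give an empty cell
    return line.split("\t")


def split_whitespace(line: str) -> list:
    # str.split() with no argument collapses any run of spaces/tabs into one delimiter
    return line.split()


def split_each_whitespace_char(line: str) -> list:
    # Treat every single space or tab as a delimiter
    return re.split(r"[ \t]", line)


print(split_tab("a\t\tc"))                  # ['a', '', 'c']
print(split_whitespace("a\t\tc"))           # ['a', 'c']
print(split_each_whitespace_char("a  c"))   # ['a', '', 'c']
```

Collapsing runs of whitespace silently drops the empty middle column. Treating each whitespace character as a delimiter keeps it, but then the number of columns depends on how many spaces were typed, which a reader cannot see. Strict tab splitting avoids both problems.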