From: Heng Li <lh...@sa...> - 2012-11-26 16:50:33
I agree that we should mandate TAB as the field separator. The parser should reject a VCF using space as the separator. I am also inclined to disallow space in INFO, though not strongly. Using underlines seems a better solution to me. If the purpose of detailed functional annotations is for humans to read, underlines look better than spaces as you will not confuse an underline with a TAB.

Heng

On Nov 26, 2012, at 7:52 AM, Petr Danecek wrote:

> Hi Gonzalo,
>
> I welcome the idea of standardizing the functional annotations. Here is
> an example of a wildly evolved format that we have been using so far:
>
> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence of the ALT
> alleles from Ensembl 66 VEP v2.4, format
> transcriptId:geneName:consequence[:codingSeqPosition:proteinPosition:proteinAlleles:proteinPredictions]+...[+gerpScore]">
>
> and two concrete examples, the first for a multiallelic site:
>
> CSQ=621:S>R:Grantham,110:Allele,C:Gene,Ssh3
> +ENSMUST00000037992:ENSMUSG00000034616:SYNONYMOUS_CODING:1863:621:S>S:Allele,A:Gene,Ssh3
>
> CSQ=ENST00000382410:DEFB125:NON_SYNONYMOUS_CODING:184:62:H>Y:SIFT,tolerated(0.41):PolyPhen,benign(0):Condel,neutral(0.015):Grantham,83
>
> I am curious what other formats are in use?
>
>
> I'd prefer not to introduce whitespaces in the INFO field or change the
> column delimiters to spaces or extend to whitespaces; it would break
> existing software and wouldn't bring much benefit.
>
> Petr
>
>
>
> On Mon, 2012-11-26 at 11:20 +0000, Peter Cock wrote:
>> On Mon, Nov 26, 2012 at 5:12 AM, Eric Banks <eb...@br...> wrote:
>>> Hi Bradford,
>>>
>>> I do understand where you're coming from, but truthfully I'd prefer to go in
>>> the opposite direction once we're open to changing delimiters. I've never
>>> quite understood why VCF is tab-delimited and not whitespace-delimited.
>>
>> Tab separated makes it easy to use in Galaxy, R, etc, even Excel - please
>> keep that. It is a good thing!
>>
>>> You wouldn't believe how many times people have manually generated
>>> VCFs that were space-delimited and couldn't understand why they were
>>> failing in VCF parsers.
>>
>> I'd be asking why doesn't your parser give a clearer error message?
>> If you've seen people fall over this pothole many times the parser
>> concerned should be fixed.
>>
>>> I'd much rather that all whitespace be treated equally (as it is
>>> visually). It makes for a much simpler spec.
>>
>> The problem with white space is you can't see how many characters
>> there are - spaces and tabs are not treated equally visually. What
>> would you expect if there were several spaces in a row? If you treat
>> it as one separator you prevent using empty cells (I'm thinking in
>> terms of generalities here, not just VCF).
>>
>> Regards,
>>
>> Peter
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
> _______________________________________________
> VCFtools-spec mailing list
> VCF...@li...
> https://lists.sourceforge.net/lists/listinfo/vcftools-spec
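
For illustration, here is a minimal sketch of the strict behaviour argued for in this thread: split data lines on TAB only, give a clear error when a line looks space-delimited, and optionally reject spaces in INFO. The names split_vcf_line, VCFFormatError and allow_space_in_info are hypothetical and do not come from VCFtools or any existing parser; the count of 8 fixed columns (CHROM through INFO) is the only thing taken from the VCF format itself.

```python
class VCFFormatError(ValueError):
    """Raised when a VCF data line breaks the TAB-only rule (hypothetical class)."""


# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
MIN_COLUMNS = 8
INFO_INDEX = 7


def split_vcf_line(line, lineno=0, allow_space_in_info=False):
    """Split one VCF data line on TAB only and return the list of columns.

    Spaces are never treated as column separators. If too few columns are
    found and the line contains spaces, the error message says so, which is
    the "clearer error message" asked for in the thread.
    """
    line = line.rstrip("\r\n")
    fields = line.split("\t")  # keeps empty cells, unlike str.split()

    if len(fields) < MIN_COLUMNS:
        hint = ""
        if " " in line:
            hint = " (line contains spaces; columns must be separated by TAB)"
        raise VCFFormatError(
            f"line {lineno}: expected at least {MIN_COLUMNS} TAB-separated "
            f"columns, found {len(fields)}{hint}"
        )

    # Optional stricter rule: no spaces inside INFO (underscores instead).
    if not allow_space_in_info and " " in fields[INFO_INDEX]:
        raise VCFFormatError(
            f"line {lineno}: space found in INFO column; "
            "use underscores instead of spaces"
        )

    return fields
```

Splitting on "\t" rather than on arbitrary whitespace is what preserves empty cells: in Python, "a\t\tc".split("\t") gives three fields with an empty one in the middle, while "a  c".split() collapses the run of spaces and gives only two. Whether spaces in INFO should be a hard error is left as a flag here, since the thread does not settle that point.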