From: James B. <jk...@sa...> - 2010-05-21 08:25:41
On Thu, May 20, 2010 at 11:15:10AM -0400, Heng Li wrote:
> Firstly, I should say I am happy with the coding system and I do not see
> the ability to keep long event is a must-have feature (I just think it
> would be good to encode more "natural" machine events).

I'm wondering if, with a few restrictions, it may be possible to convert from one representation to the other. For example, "4A or 5A" is nothing more than A A A followed by "A or AA". Similarly, we can string runs of bases together to go backwards.

The complications come from a generalised system allowing more than homopolymers. A code meaning "AAAA or GGT or CTTCG", for example, couldn't easily be flattened into a series of smaller codes, but I can't see any reason to generate such things.

That seems to imply that it's easier to take runs of probable homopolymers (A A A "A or AA") and collapse them into longer single events, provided these runs have an equal or similar distribution of likelihoods across the non-called bases. (That should be true when they're artificially generated from a long 454 homopolymer.)

> In general, we should keep the readability, backward compatibility and
> context independency (we do not need to remember the header for most
> processing) of SAM, while allow BAM to be changed radically.

This is what I was referring to earlier about APIs vs. human-readable (or trivially parseable) formats. The simplicity of SAM means there is a low barrier to uptake, while BAM allows for programmatic efficiency. If the API to BAM remains the same (i.e. Picard and samtools), then it doesn't really matter much what the underlying binary format looks like.

James

-- 
James Bonfield (jk...@sa...)    | Hora aderat briligi. Nunc et Slythia Tova
                                | Plurima gyrabant gymbolitare vabo;
A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
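The flattening and collapsing of homopolymer codes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the token representation (a tuple of alternative sequences, so ("A",) is a definite call and ("AAAA", "AAAAA") is "4A or 5A") and the function names flatten, collapse and expand are hypothetical, invented here. They are not part of SAM/BAM, samtools or Picard. The likelihood check mentioned above is deliberately omitted, so collapse only checks that the bases match.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Hypothetical token: a tuple of alternative sequences.
# ("A",) is a definite call, ("A", "AA") means "A or AA",
# ("AAAA", "AAAAA") means "4A or 5A".
Token = Tuple[str, ...]


def homopolymer_pair(tok: Token) -> Optional[Tuple[str, int]]:
    """Return (base, n) if tok is exactly ("X"*n, "X"*(n+1)) with n >= 1, else None."""
    if len(tok) != 2:
        return None
    short, long_ = tok
    if not short or len(set(short + long_)) != 1 or len(long_) != len(short) + 1:
        return None
    return short[0], len(short)


def flatten(tokens: List[Token]) -> Optional[List[Token]]:
    """Rewrite each "nX or (n+1)X" token as (n-1) definite X calls plus "X or XX".

    Returns None if an ambiguous token is not a simple homopolymer pair,
    e.g. ("AAAA", "GGT", "CTTCG"), which has no such decomposition.
    """
    out: List[Token] = []
    for tok in tokens:
        if len(tok) == 1:
            out.append(tok)
            continue
        hp = homopolymer_pair(tok)
        if hp is None:
            return None
        base, n = hp
        out.extend([(base,)] * (n - 1))
        out.append((base, base * 2))
    return out


def collapse(tokens: List[Token]) -> List[Token]:
    """Merge a run of definite X calls followed by a homopolymer pair of X into one token.

    Assumption: likelihoods are ignored here; a real implementation would also
    need to check that the merged calls carry comparable likelihood distributions.
    """
    out: List[Token] = []
    for tok in tokens:
        hp = homopolymer_pair(tok)
        if hp is None:
            out.append(tok)
            continue
        base, n = hp
        # Absorb immediately preceding definite calls of the same base.
        while out and out[-1] == (base,):
            out.pop()
            n += 1
        out.append((base * n, base * (n + 1)))
    return out


def expand(tokens: List[Token]) -> Set[str]:
    """All concrete sequences a token list can represent (used to check equivalence)."""
    return {"".join(choice) for choice in product(*tokens)}


# Example: C, "4A or 5A", G
read = [("C",), ("AAAA", "AAAAA"), ("G",)]
flat = flatten(read)
print(flat)
print(collapse(flat) == read, expand(flat) == expand(read))
```

Here flatten(read) gives C, A, A, A, "A or AA", G, and collapse turns that back into the original three tokens; both lists expand to the same two sequences, CAAAAG and CAAAAAG. A mixed code such as ("AAAA", "GGT", "CTTCG") makes flatten return None, matching the point that such codes cannot be split into smaller ones.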