From: Heng Li <lh...@sa...> - 2009-03-18 22:34:42
Hello Jim,

In SAM, we allow a read to be stored in multiple records. Say the sequence of a read "rd1" is "acgttgca". Its first 4 bp map to chr1:100 and its last 4 bp to chr2:200. We may save the alignment as:

rd1 chr1 100 4M4H acgt
.... (other alignments)
rd1 chr2 200 4H4M tgca

I intend to reuse the MRNM and MPOS fields to indicate the position of the other part. We may also add additional fields to keep this. I have not thought it through.

I think SAM (or a slightly modified version of SAM) can store alignments on a De Bruijn graph.

It is interesting to know you have a better way to compress the sequence data. How is it achieved?

Thanks,

Heng

On 18 Mar 2009, at 22:03, Knight, James wrote:

> Excellent. Thank you.
>
> The sticking point I've run into with every alignment format has
> been the belief that all reads can be aligned contiguously, despite
> split alignments caused by rearrangements, by mapping transcript
> data to the genome, and other examples I can think of.
>
> And, the inclusion of split alignments then allows for the
> description of full alignments of De Bruijn graph assembly alignments.
>
> I just heard about the SAM/BAM format at the Genome Informatics
> Alliance conference, and checked my internal alignment data
> structures, finding that I'm using about 0.3 bytes per base to store
> what looks like the same information. Is there interest in
> discussing this?
>
> Jim
>
> --------------------------
> Sent using BlackBerry
>
>
> From: Tim Fennell
> To: Knight, James {454_~Branford}
> Cc: Toby Bloom ; sam...@li...
> Sent: Wed Mar 18 17:21:53 2009
> Subject: Re: [Fwd: Re: sam question]
>
> That's a great question Jim. I don't know if there's a way in the
> current spec to store the alignments of reads that span breakpoints
> of transversions. I haven't personally thought about this but I'm
> cc'ing the samtools-help mailing list where other people involved in
> the SAM spec may have an answer. If not we should think about how
> to encode this in the next iteration of the spec.
>
> -t
>
> On Mar 18, 2009, at 4:53 PM, Knight, James wrote:
>
>> What would happen with an inversion where one half of the read is
>> in the opposite direction from the other?
>>
>> Jim
>>
>> --------------------------
>> Sent using BlackBerry
>>
>>
>> From: Toby Bloom
>> To: Knight, James {454_~Branford}
>> Sent: Wed Mar 18 16:29:06 2009
>> Subject: [Fwd: Re: sam question]
>>
>> Hope this answers your question.
>>
>> -------- Original Message --------
>> Subject: Re: sam question
>> Date: Wed, 18 Mar 2009 16:25:51 -0400
>> From: Tim Fennell <tfe...@br...>
>> To: Toby Bloom <bl...@br...>
>> References: <49C...@br...>
>>
>>
>> Right. We'd represent that as a single entry in the sam file with
>> the whole read. The CIGAR string which records the alignment has
>> operators for encoding such events.
>>
>> The three operators of interest are:
>> I = insertion (bases in the read that are not in the reference)
>> D = deletion (bases in the reference that are not in the read)
>> N = skip (like a deletion, but intended to be longer and imply a
>> non-deletion event like splicing)
>>
>> So if you had a 300 base read that had a small insertion after 30
>> bases and then hit a splice site at 150 bases and the next exon is
>> 10,000 bases away you might have a cigar string that looks like this:
>> 30M4I150M10000N126M
>>
>> -t
>>
>>
>
> _______________________________________________
> Samtools-help mailing list
> Sam...@li...
> https://lists.sourceforge.net/lists/listinfo/samtools-help
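
To make the CIGAR arithmetic in this thread concrete, here is a minimal Python sketch. It is not taken from samtools. The function names (parse_cigar, cigar_lengths, read_interval) are hypothetical, and the operator rules assume the usual SAM convention: M, I, S, = and X consume read bases; M, D, N, = and X consume reference bases; H consumes neither but still counts toward the original read length. It checks the split-read records for rd1 (4M4H and 4H4M) and the spliced example 30M4I150M10000N126M. As quoted, that string accounts for 30 + 4 + 150 + 126 = 310 read bases rather than 300.

```python
import re

# Operators that advance along the stored SEQ and along the reference.
# H (hard clip) bases are not in SEQ but are part of the original read.
SEQ_OPS = set("MIS=X")
REF_OPS = set("MDN=X")
ALIGNED_READ_OPS = set("MI=X")  # read bases inside the aligned portion


def parse_cigar(cigar):
    """Split a CIGAR string such as '30M4I150M' into (length, op) pairs."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def cigar_lengths(cigar):
    """Return (SEQ length, original read length, reference span)."""
    ops = parse_cigar(cigar)
    seq_len = sum(n for n, op in ops if op in SEQ_OPS)
    read_len = seq_len + sum(n for n, op in ops if op == "H")
    ref_span = sum(n for n, op in ops if op in REF_OPS)
    return seq_len, read_len, ref_span


def read_interval(cigar):
    """Return the half-open [start, end) slice of the original read that
    this record aligns. Assumes a forward-strand record; for a
    reverse-strand record the offsets would count from the other end."""
    ops = parse_cigar(cigar)
    start = 0
    for n, op in ops:
        if op not in "HS":
            break
        start += n  # leading clipped bases precede the aligned part
    end = start + sum(n for n, op in ops if op in ALIGNED_READ_OPS)
    return start, end


if __name__ == "__main__":
    for cigar in ("4M4H", "4H4M", "30M4I150M10000N126M"):
        print(cigar, cigar_lengths(cigar), read_interval(cigar))
```

For the two rd1 records, read_interval gives (0, 4) and (4, 8), so together they cover the 8 bp read without overlap, which is the property a split-alignment representation needs.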