From: Heng Li <lh...@sa...> - 2010-06-22 04:33:19
On Mon, Jun 21, 2010 at 11:59:34PM -0400, Heng Li wrote:
>
> On Tue, Jun 22, 2010 at 09:47:16AM +0800, Colin Hercus wrote:
> > Hi Tim,
> >
> > Thanks for the quick reply.
> >
> > Thought of that one but I think it needs a large -ve number of N's,
> > 68M-2000N32M
>
> That is a problem because CIGAR disallows negative lengths. I do not
> have a satisfactory solution. Perhaps I would store the split read in
> two records and treat one part as a single-end read. This is certainly
> not optimal, though.

Here is another (possibly bad) solution: modify the SEQ and QUAL fields
so that we can use a CIGAR like 68M2000N32M to describe the split
alignment. For example, suppose the raw reads are AAACCC and GGGG and
the alignment is: CCC><GGGG<>AAA; we use CCCAAA as the SEQ field instead
of AAACCC. Just a random thought.

Heng

> BTW, this is actually an example showing that arbitrarily defining
> orientation in the @RG header is not always straightforward. When we
> have PacBio's strobe reads, it will be even more difficult.
>
> Heng
>
> > [image: Screenshot.png]
> > Colin
> >
> > On Tue, Jun 22, 2010 at 9:28 AM, Tim Fennell <tfe...@br...> wrote:
> >
> > > I think I'd be tempted to represent this as one primary record per end,
> > > with the split end having a large N operation in the middle of its CIGAR.
> > > So if the junction turned up at base 70/101 I'd pull together the split
> > > read alignment and generate a single CIGAR of 69M2000N32M. The advantages
> > > I see of doing it this way:
> > >
> > > 1) You still only have one SAM record per end so all your usual inferences
> > > apply
> > > 2) All your bases from the one read are in one place
> > > 3) You can actually count your split reads easily by asking how many reads
> > > have jump-sized skips in them
> > >
> > > -t
> > >
> > > On Jun 21, 2010, at 9:22 PM, Colin Hercus wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm not sure how to represent paired-end reads when one read has
> > > > split alignments and was wondering if someone could advise on the best
> > > > method.
> > > >
> > > > The issue arises with Illumina mate-pair libraries, where the junction
> > > > from circularisation may land in one of the reads and result in a split
> > > > alignment for the read.
> > > >
> > > > The SAM specification allows two primary alignments for a split read
> > > > and it seems OK to use this to split one read, but then we have the
> > > > other read, which isn't split. To store two proper pairs with mate
> > > > alignment locations, isize etc. we need to store two alignments for the
> > > > unsplit read. So what do we do with the unsplit read: two records both
> > > > primary, or one primary and one secondary but both with a primary mate?
> > > >
> > > > Any suggestions on how to represent this in SAM would be appreciated.
> > > >
> > > > Thanks, Colin
> > > >
> > > > _______________________________________________
> > > > Samtools-devel mailing list
> > > > Sam...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/samtools-devel

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
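
To make the CIGAR points in this thread concrete, here is a minimal Python sketch. It is not samtools code: the function names (parse_cigar, has_jump_skip, query_length, reorder_split_read) and the 1000 bp min_skip threshold are assumptions made for illustration, and the operator set is the one in the current SAM specification. It checks that a negative length such as 68M-2000N32M is rejected, flags reads with a large N skip as Tim suggests, and rearranges SEQ/QUAL roughly the way the AAACCC to CCCAAA example describes.

```python
import re

# One CIGAR operation: an unsigned length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, op) pairs; raise ValueError if malformed."""
    ops = []
    pos = 0
    for m in CIGAR_OP.finditer(cigar):
        if m.start() != pos:
            # Something unparseable (e.g. a '-' sign) sits between operations.
            raise ValueError("invalid CIGAR: %r" % cigar)
        ops.append((int(m.group(1)), m.group(2)))
        pos = m.end()
    if not ops or pos != len(cigar):
        raise ValueError("invalid CIGAR: %r" % cigar)
    return ops


def query_length(cigar):
    """Number of read bases the CIGAR accounts for (M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def has_jump_skip(cigar, min_skip=1000):
    """True if any N operation is at least min_skip long (threshold is hypothetical)."""
    return any(op == "N" and n >= min_skip for n, op in parse_cigar(cigar))


def reorder_split_read(seq, qual, first_len):
    """Move the first first_len bases (and qualities) behind the rest of the read,
    so the two aligned parts appear in reference order."""
    return seq[first_len:] + seq[:first_len], qual[first_len:] + qual[:first_len]


if __name__ == "__main__":
    print(parse_cigar("69M2000N32M"))
    print(query_length("69M2000N32M"))
    print(has_jump_skip("69M2000N32M"))
    print(reorder_split_read("AAACCC", "IIIHHH", 3))
```

Counting split reads then amounts to applying has_jump_skip to the CIGAR column of each record. Note that reorder_split_read changes the stored SEQ relative to the raw read, which is the cost of that approach.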