From: Dobin, A. <do...@cs...> - 2012-12-10 19:39:13
|
Hi Alec, Heng,

thanks for your replies.

It would be nice if the SAM community could come up with some guidelines for storing chimeric alignments. For example, it's not clear to me how to set RNEXT/PNEXT for the case when one of the mates is split chimerically. As far as I understand the current SAM specification, both pieces of that mate should have RNEXT/PNEXT set to the other mate. I imagine that it would be more informative if all the pieces of the read were referring to the next one in a "circular fashion".

I do not have a problem with adding the "secondary alignment" flag to portions of a chimeric read if that's what is required for Picard to work. That will make the .sam compatible with Picard, and samtools won't have a problem with that either. However, in the absence of a common standard, it might create a problem for other downstream tools.

Cheers
Alex

From: Heng Li [mailto:lh...@sa...]
Sent: Monday, December 10, 2012 12:29 PM
To: Alec Wysoker
Cc: Dobin, Alexander; 'sam...@li...'
Subject: Re: [Samtools-help] chimeric alignments with Picard

It is right that the SAM spec does not describe the standard way to store chimeric alignments, but it would be good for Picard to accept such alignments. Given longer reads, chimeric alignments will be more frequent. We will lose data if we just drop them or report one segment only. Picard may ignore chimeric alignments for long-range operations such as MarkDuplicates.

I am always concerned that Picard's ValidateSamFile is a little misleading. From its name, we may think a SAM rejected by Picard is invalid, but frequently that is not the case. Picard in fact rejects valid BAMs containing features not well supported by Picard or some details that might look like errors (e.g. demanding '*' for unmapped reads). I think it is more appropriate to call it CheckSamFile.
Also a better output would be a report of unsupported features rather than complaining that these features are errors, something like this:

=== START ===
BAM missing terminator block            Yes
BAM containing reads without mapQ       No
BAM containing chimeric alignments      Yes (MarkDuplicates/SamToFastq not working)
Unmapped reads having non-'*' CIGAR     No

The file is valid.
MarkDuplicates/SamToFastq do not work.
The file is not Picard compatible.
=== END ===

Heng

On Dec 10, 2012, at 11:04 AM, Alec Wysoker wrote:

Hi Alex,

Unfortunately, there isn't really a standard. It is a limitation of the SAM format. Some aligners produce multiple primary alignments for the same read, which, as you have discovered, Picard complains about.

-Alec

On Dec 7, 2012, at 4:43 PM, Dobin, Alexander wrote:

Dear All,

what is the standard (or most common) way to represent chimeric alignments in SAM? I searched the forum but could not find any discussions with detailed recommendations.

The chimeric alignments are generated from RNA-seq data with our mapper (STAR), and users require that the alignments are compatible with Picard. In particular, we are having a problem with an alignment where a chimeric junction splits one of the mates between chromosomes (say mate1 into mate1a and mate1b), and the mate1b piece aligns concordantly with mate2. To validate such an alignment with Picard, I had to mark one of the three records as a secondary alignment.

Here is a simple example I tried to work out in Picard:

@SQ SN:chr9 LN:141213431
@SQ SN:chr22 LN:51304566
SINATRA_0006:3:3:6387:5665#0 65 chr22 23632554 3 47M29S chr9 133729520 0
SINATRA_0006:3:3:6387:5665#0 65 chr9 133729451 3 47S29M = 133729520 145
SINATRA_0006:3:3:6387:5665#0 129 chr9 133729520 3 76M = 133729451 -145

Records 1 and 2 belong to mate1, record 3 to mate2.
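
For readers following the example, the FLAG values in the three records (65, 65, 129) and the effect of adding the 0x100 bit can be decoded with a few lines of Python. This is a minimal sketch and not part of Picard, samtools, or STAR; `FLAG_BITS` and `decode_flag` are hypothetical names, and only the four bits that matter in this thread are listed.

```python
# SAM FLAG bits relevant to this thread (bit meanings as in the SAM specification).
# FLAG_BITS and decode_flag are illustrative names, not part of any tool.
FLAG_BITS = {
    0x1: "paired",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
}


def decode_flag(flag):
    """Return the names of the bits from FLAG_BITS that are set in `flag`."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# 65 and 129 are the flags in the example; 65 | 0x100 is record 2
# after marking it as a secondary alignment.
for flag in (65, 129, 65 | 0x100):
    print(flag, decode_flag(flag))
```

Both mate1 pieces carry flag 65 (paired, first in pair), which is consistent with the "Both mates are marked as first of pair" error; setting 0x100 on one of them turns its flag into 321.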
ValidateSamFile.jar reports multiple errors on this file:

ERROR: Record 1, Read name SINATRA_0006:3:3:6387:5665#0, Mate alignment does not match alignment start of mate
ERROR: Record 2, Read name SINATRA_0006:3:3:6387:5665#0, Mate alignment does not match alignment start of mate
ERROR: Record 2, Read name SINATRA_0006:3:3:6387:5665#0, Mate reference index (MRNM) does not match reference index of mate
ERROR: Record 1, Read name SINATRA_0006:3:3:6387:5665#0, Both mates are marked as first of pair
ERROR: Read name SINATRA_0006:3:3:6387:5665#0, Mate not found for paired read

If I add the 0x100 bit (secondary alignment) to record 2, Picard does not report any errors; however, it seems that it does not try to match RNEXT or PNEXT of the secondary alignment to any records.

I also tried to use FixMateInformation.jar, which yielded:

SINATRA_0006:3:3:6387:5665#0 65 chr22 23632554 3 47M29S chr9 133729451 0
SINATRA_0006:3:3:6387:5665#0 65 chr9 133729451 3 47S29M chr22 23632554 0
SINATRA_0006:3:3:6387:5665#0 129 chr9 133729520 3 76M = 133729451 -145

It is strange that record 2 now points to record 1 in RNEXT/PNEXT, even though they belong to the same mate, and ValidateSamFile.jar reports a few errors.

Any insight would be greatly appreciated!

Alex

_______________________________________________
Samtools-help mailing list
Sam...@li...<mailto:Sam...@li...>
https://lists.sourceforge.net/lists/listinfo/samtools-help

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
|
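
The "circular fashion" suggestion made in the most recent reply of this thread can be sketched as follows, using the coordinates from the example records. This only illustrates that proposal, assuming the pieces are ordered mate1a, mate1b, mate2; it is not what the SAM specification or Picard prescribes, and `Piece` and `link_circular` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """One alignment record of a template; only the fields needed here."""
    label: str
    rname: str
    pos: int
    rnext: str = "*"
    pnext: int = 0


def link_circular(pieces):
    """Point each piece's RNEXT/PNEXT at the following piece; the last wraps to the first."""
    n = len(pieces)
    for i, piece in enumerate(pieces):
        nxt = pieces[(i + 1) % n]
        # SAM writes "=" for RNEXT when it equals the record's own RNAME.
        piece.rnext = "=" if nxt.rname == piece.rname else nxt.rname
        piece.pnext = nxt.pos
    return pieces


# Coordinates taken from the three example records (assumed piece order).
pieces = link_circular([
    Piece("mate1a", "chr22", 23632554),
    Piece("mate1b", "chr9", 133729451),
    Piece("mate2", "chr9", 133729520),
])
for p in pieces:
    print(p.label, p.rname, p.pos, p.rnext, p.pnext)
```

Under this scheme mate1a points to mate1b, mate1b to mate2, and mate2 back to mate1a, so every piece can be reached by following RNEXT/PNEXT from any other.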