From: Alec W. <al...@br...> - 2014-03-27 19:16:10
Hi Jason,

Did you create MC tags yourself? Or did they arrive as a result of running MergeBamAlignment or FixMateInformation?

If you got the MCs via MergeBamAlignment or FixMateInformation, then getting the latest version of Picard (1.110) and rerunning whichever program you ran, on the same inputs as the first time, will no longer produce MC tags, which should eliminate the CleanSam and ValidateSamFile problems. That should get you through the night until we fix the problems below.

If you created the MC tags yourself, then until we fix the problems you'll need to ensure that they are soft clipped appropriately so as not to hang off the end of the reference.

-Alec

On Mar 27, 2014, at 12:49 PM, Alec Wysoker <al...@br...> wrote:

> Hi Jason,
>
> Ugh. Several problems here.
>
> Picard release 1.109 was aggressively setting mate CIGAR (MC attribute), e.g. in FixMateInformation. This was reverted in release 1.110.
> The validation of mate CIGAR is happening way too often, which is why you are seeing the error so many times.
> CleanSam should fix the mate CIGAR. I don't know why it is not.
> CleanSam should suppress warnings about CIGAR hanging off the end of reference, since it is fixing them.
> I'm not sure why the IGNORE option to ValidateSamFile is not working.
> I don't have a good explanation for why you don't see these errors until 7M records into the file, other than perhaps you are getting toward the end of the chromosome where these errors are more likely to occur.
>
> I can't promise when these issues will be fixed, but this is a bad situation. I'll talk to folks here and figure out when this can be worked on.
>
> -Alec
>
> On Mar 27, 2014, at 11:51 AM, "Kost, Jason" <Jas...@um...> wrote:
>
>> Hello,
>>
>> As the subject says, I'm experiencing some strange behavior from CleanSam and ValidateSamFile and the above SAM validation error. The first oddity is that the message does not arise when FixMateInformation is run (immediately before CleanSam).
>> Secondly, in each sample processed this error doesn't show up until after the 7,000,000th read is processed, and then it appears that the error is thrown 4 times for each read to which it applies. A sample of the CleanSam output is included below:
>>
>> [Wed Mar 26 23:38:40 EDT 2014] net.sf.picard.sam.CleanSam INPUT=/tmp/97017/JG-710558.fixed.sorted.sam OUTPUT=/tmp/97017/JG-710558.cleaned.sam TMP_DIR=[/tmp/97017] VALIDATION_STRINGENCY=STRICT MAX_RECORDS_IN_RAM=10000000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
>> [Wed Mar 26 23:38:40 EDT 2014] Executing as jek29w@c05b10 on Linux 2.6.32-358.6.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15; Picard version: 1.109(1716) IntelDeflater
>> INFO 2014-03-26 23:38:51 CleanSam Processed 1,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 10s. Last read position: chr1:26,888,047
>> INFO 2014-03-26 23:39:01 CleanSam Processed 2,000,000 records. Elapsed time: 00:00:20s. Time for last 1,000,000: 9s. Last read position: chr1:56,837,177
>> INFO 2014-03-26 23:39:11 CleanSam Processed 3,000,000 records. Elapsed time: 00:00:30s. Time for last 1,000,000: 9s. Last read position: chr1:102,252,191
>> INFO 2014-03-26 23:39:23 CleanSam Processed 4,000,000 records. Elapsed time: 00:00:43s. Time for last 1,000,000: 12s. Last read position: chr1:147,586,597
>> INFO 2014-03-26 23:39:33 CleanSam Processed 5,000,000 records. Elapsed time: 00:00:53s. Time for last 1,000,000: 9s. Last read position: chr1:163,805,372
>> INFO 2014-03-26 23:39:43 CleanSam Processed 6,000,000 records. Elapsed time: 00:01:03s. Time for last 1,000,000: 10s. Last read position: chr1:202,915,732
>> INFO 2014-03-26 23:39:53 CleanSam Processed 7,000,000 records. Elapsed time: 00:01:12s. Time for last 1,000,000: 9s. Last read position: chr1:237,886,535
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2304:20450:57091#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2304:20450:57091#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2304:20450:57091#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2304:20450:57091#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2214:18616:46388#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2214:18616:46388#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2214:18616:46388#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2214:18616:46388#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2205:12823:35897#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2205:12823:35897#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2205:12823:35897#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2205:12823:35897#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2307:6327:43815#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2307:6327:43815#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2307:6327:43815#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>> Ignoring SAM validation error: ERROR: Read name HWI-ST1197R_0077_FC:1:2307:6327:43815#CGATGTCGATGT, Mate CIGAR M operator maps off end of reference
>>
>> Now, I suspect that I can suppress output of this message by CleanSam by setting VALIDATION_STRINGENCY to SILENT. However, I'm not completely certain about this given the behavior I'm seeing from ValidateSamFile. ValidateSamFile is throwing the same SAM validation error, but I cannot seem to suppress it no matter what I do. Setting VALIDATION_STRINGENCY to either LENIENT or SILENT has no effect. I have also tried using IGNORE=CIGAR_MAPS_OFF_REFERENCE, similarly with no effect, and can find no other IGNORE option listed in the documentation which seems applicable.
>>
>> This is rather confusing for two reasons. First off, I would have expected CleanSam to clip the offending reads, thus rendering the error moot. Secondly, I would have thought that setting the stringency to SILENT would at least have suppressed the output of the error, but it appears that neither is occurring. Is there something I've missed? Is there a way to fix the underlying problem in the data, or at the very least to actually suppress the output of this message by ValidateSamFile?
>>
>> Sincerely,
>> Jason Kost
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Samtools-help mailing list
>> Sam...@li...
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
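
For anyone checking their own MC tags before rerunning, here is a minimal Python sketch of the condition that the "CIGAR M operator maps off end of reference" message describes. It is not Picard's code. The function names and the numbers are hypothetical, positions are assumed to be 1-based as in SAM, and the only rule relied on is that the M, D, N, = and X operators consume reference bases while soft clips (S) do not.

```python
import re

# CIGAR operators that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def maps_off_reference(start, cigar, ref_length):
    """True if an alignment starting at 1-based `start` ends past `ref_length`."""
    end = start + reference_span(cigar) - 1
    return end > ref_length


# Hypothetical values for illustration: a reference sequence of length 1000.
print(maps_off_reference(991, "10M", 1000))   # alignment ends at 1000
print(maps_off_reference(995, "10M", 1000))   # alignment ends at 1004
print(maps_off_reference(995, "6M4S", 1000))  # soft clipped, ends at 1000
```

For a mate CIGAR, `start` would be the mate's alignment start and `cigar` the value of the MC tag. The third call shows why soft clipping the overhanging bases avoids the error: the clipped bases no longer count toward the reference span.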