From: Alec W. <al...@br...> - 2012-12-11 15:50:12
Hi Heng,

I'm reluctant to change the name of ValidateSamFile at this point given how long the name has been out there. I don't understand the sample output format below -- could you explain it a little more? ValidateSamFile already distinguishes between errors and warnings, and it can be invoked with IGNORE_WARNINGS=true to display only errors.

-Alec

On Dec 10, 2012, at 12:28 PM, Heng Li wrote:

> It is right that the SAM spec does not describe the standard way to store chimeric alignments, but it would be good for Picard to accept such alignments. Given longer reads, chimeric alignments will be more frequent. We will lose data if we just drop them or report one segment only. Picard may ignore chimeric alignment for long-range operations such as MarkDuplicates.
>
> I am always concerned that Picard's ValidateSamFile is a little misleading. From its name, we may think a SAM rejected by Picard is invalid, but frequently it is not the case. Picard in fact rejects valid BAMs containing features not well supported by Picard or some details that might look like errors (e.g. demanding '*' for unmapped reads). I think it is more appropriate to call it CheckSamFile. Also a better output would be a report of not supported features rather than complaining these features are errors, something like this:
>
> === START ===
> BAM missing terminator block            Yes
> BAM containing reads without mapQ       No
> BAM containing chimeric alignments      Yes (MarkDuplicates/SamToFastq not working)
> Unmapped reads having non-'*' CIGAR     No
>
> The file is valid.
> MarkDuplicates/SamToFastq do not work.
> The file is not Picard compatible.
> === END ===
>
> Heng
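
For anyone else unsure how to read the sample output quoted above, here is one possible interpretation as a minimal Python sketch. It is not Picard code. The function name `build_report`, the `AFFECTED_TOOLS` table, and the rule that any detected feature makes the file "not Picard compatible" are all assumptions made for illustration; only the feature labels and the MarkDuplicates/SamToFastq note are taken from the sample itself.

```python
from typing import Dict, List

# Hypothetical table: feature label -> tools assumed not to work when the
# feature is present. Only the chimeric-alignment entry comes from the sample
# report; the empty lists are placeholders, not statements about Picard.
AFFECTED_TOOLS: Dict[str, List[str]] = {
    "BAM missing terminator block": [],
    "BAM containing reads without mapQ": [],
    "BAM containing chimeric alignments": ["MarkDuplicates", "SamToFastq"],
    "Unmapped reads having non-'*' CIGAR": [],
}


def build_report(found: Dict[str, bool], spec_valid: bool = True) -> str:
    """Render a feature report; `found` maps feature labels to True/False."""
    lines = ["=== START ==="]
    broken: List[str] = []
    for feature, tools in AFFECTED_TOOLS.items():
        present = found.get(feature, False)
        note = ""
        if present and tools:
            note = " (" + "/".join(tools) + " not working)"
            # Collect each affected tool once, preserving order.
            broken.extend(t for t in tools if t not in broken)
        lines.append(f"{feature:<40}{'Yes' if present else 'No'}{note}")
    lines.append("")
    lines.append("The file is valid." if spec_valid else "The file is not valid.")
    if broken:
        lines.append("/".join(broken) + " do not work.")
    # Assumed rule: any detected feature means "not Picard compatible".
    if any(found.get(f, False) for f in AFFECTED_TOOLS):
        lines.append("The file is not Picard compatible.")
    lines.append("=== END ===")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_report({
        "BAM missing terminator block": True,
        "BAM containing chimeric alignments": True,
    }))
```

The point of the layout is that spec validity and tool compatibility are reported as separate conclusions, so a file can be valid SAM/BAM and still be flagged as unusable with specific tools.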