Differences between Picard and SAM
Picard attempts to conform to the SAM format specification, but there are a few areas in which it diverges:
- Strictness: In many cases Picard complains about constructs that the SAM spec allows. In some of these cases, passing VALIDATION_STRINGENCY=LENIENT or SILENT allows the program to continue. In others, the requirement is essential to the program's correct execution, and the program will typically fail.
- Multi-segment reads: Picard can handle unpaired (i.e. single-end) reads and paired reads, but is not prepared to handle reads with more than two segments.
- Secondary alignments: A number of Picard programs can handle secondary alignments, but these programs typically either ignore them or pass them from input to output unchanged. MergeBamAlignment has the PRIMARY_ALIGNMENT_STRATEGY option, which determines how the program selects a primary alignment among multiple alignments for a segment in an aligner's output.
- TLEN: The original definition of the TLEN field of a SAM record was the distance between the 5' ends of the two segments, with the leftmost segment having a positive value and the rightmost segment a negative value. This is what Picard implements. The spec was later changed to define TLEN as the distance from the leftmost mapped base to the rightmost mapped base, again with the leftmost segment having a positive value and the rightmost segment a negative value.
- CIGAR validation: Picard's validation of CIGAR strings is stricter than the SAM spec requires, because it is oriented toward resequencing data rather than assembly. If these validations get in your way, you can lower VALIDATION_STRINGENCY or, for ValidateSamFile, use the IGNORE option to turn off the validations you don't want.
- queryname sort order: The SAM spec does not clearly define queryname sort order. Picard implements it as a simple lexical ordering of read names. samtools implements it differently, so a file sorted by queryname with one tool may not be considered sorted by the other.
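
To make the two TLEN definitions in the list above concrete, here is a minimal Python sketch. It is not Picard's code. The `Segment` class and both function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the "distance" between 5' ends is assumed to count both end positions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one aligned segment of a pair."""
    start: int     # 1-based leftmost mapped position
    end: int       # 1-based rightmost mapped position (inclusive)
    reverse: bool  # True if the segment aligns to the reverse strand

    @property
    def five_prime(self) -> int:
        # The 5' end of a reverse-strand segment is its rightmost base.
        return self.end if self.reverse else self.start


def tlen_five_prime(this: Segment, mate: Segment) -> int:
    """Original definition: signed distance between the two 5' ends.

    Assumption: the distance counts both endpoints, hence the +/-1.
    """
    diff = mate.five_prime - this.five_prime
    return diff + 1 if diff >= 0 else diff - 1


def tlen_outermost(this: Segment, mate: Segment) -> int:
    """Later definition: leftmost mapped base to rightmost mapped base.

    Simplification: when both segments start at the same position this
    returns a positive value for both of them.
    """
    span = max(this.end, mate.end) - min(this.start, mate.start) + 1
    return span if this.start <= mate.start else -span


# A typical forward/reverse pair: both definitions agree.
fwd = Segment(start=100, end=149, reverse=False)
rev = Segment(start=300, end=349, reverse=True)
typical = (tlen_five_prime(fwd, rev), tlen_outermost(fwd, rev))

# The reverse-strand mate starts to the left of the forward-strand read:
# the two definitions disagree in both magnitude and sign.
fwd2 = Segment(start=100, end=199, reverse=False)
rev2 = Segment(start=80, end=149, reverse=True)
overhang = (tlen_five_prime(fwd2, rev2), tlen_outermost(fwd2, rev2))
```

For the typical pair both functions give 250 for the forward read. For the second pair the 5'-end definition gives 50 for the forward read, while the outermost-base definition gives -120, which is the kind of disagreement to expect when comparing Picard output against tools that follow the current spec.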
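
The practical effect of the queryname difference can be illustrated with a short Python sketch. The lexical sort mirrors the simple lexical ordering described above. The "natural" ordering, in which runs of digits compare numerically, is only an assumption about how an alternative ordering might behave. It is not taken from the samtools source, and the read names are made up.

```python
import re


def lexical_key(name: str) -> str:
    # Plain string comparison, character by character.
    return name


def natural_key(name: str) -> list:
    # Assumed alternative ordering: split the name into text and digit
    # runs and compare the digit runs as integers. re.split with a
    # capture group always alternates text, digits, text, ..., so items
    # at the same index always have the same type.
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p for p in parts]


names = ["read10", "read2", "read1"]  # hypothetical read names

lexical_order = sorted(names, key=lexical_key)
natural_order = sorted(names, key=natural_key)
```

Here `lexical_order` places `read10` before `read2`, while `natural_order` places it after. A program that checks for one ordering may therefore reject input sorted under the other.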