From: Fred L. <fl...@sd...> - 2011-12-14 08:57:09
How about something like this for Picard:

  * Bump the SAM spec version number to 1.5 when the semantics of the "B" operator are finalized.
  * Add a method "boolean SAMRecord.convertToVersion(String version)", which converts the current SAM record to the specified version, if necessary. If information is lost (e.g., QUAL values are merged), the return value is true; otherwise it is false. If the conversion is impossible, or a bad version number is specified, an exception is thrown.
  * For reading SAM files:
      o Add a method SAMFileReader.setAcceptedVersion(String accepted_version, boolean allow_conversion).
      o By default, setAcceptedVersion("1.4", true) is called automatically when a SAM/BAM file is opened, which converts all records to version 1.4.
      o Filtering programs such as SortSam should call setAcceptedVersion(null, false) (no conversion necessary or allowed).
      o If the header's version number is greater than accepted_version:
          + A warning is emitted when the header is read, unless ValidationStringency is set to SILENT.
          + Incoming SAM records are scanned for incompatible operators. If incompatible operators are found:
              # If allow_conversion is false, a fatal exception is generated.
              # Otherwise, convertToVersion(accepted_version) is called on the current SAM record, and a warning is emitted if information is lost, unless ValidationStringency is set to SILENT.
      o (Alternatively, we could check all SAM records for illegal operators, no matter what the header's version number is, but this would be more expensive.)
  * For writing SAM files:
      o Add a method SAMFileHeader.setVersion(String version), allowing the user to set the VN tag in the header.
      o If a CIGAR string is created that has operators that are illegal with respect to the version number in the header, a fatal exception is generated.
      o To write version 1.5 records (which may contain the "B" operator), SAMFileHeader.setVersion("1.5") needs to be called.
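To make the read-side rules concrete, here is a rough, standalone Java sketch of the decision logic. It does not use Picard: the class VersionPolicySketch, the RecordConverter interface, the Outcome enum and the boolean "silent" flag are all hypothetical stand-ins, and it assumes that "B" is the only operator newer than 1.4 and that versions are dotted integers. The actual conversion is left to the caller, since the semantics of "B" are not final.

```java
import java.util.List;

/** Hypothetical sketch of the proposed read-side policy; not Picard code. */
class VersionPolicySketch {

    /** What happened to one record. */
    enum Outcome { UNCHANGED, CONVERTED_LOSSLESS, CONVERTED_LOSSY }

    /** Stand-in for the proposed SAMRecord.convertToVersion: returns true if information was lost. */
    interface RecordConverter {
        boolean convertToVersion(String version);
    }

    /** Compares dotted version strings such as "1.4" and "1.5" numerically (assumed format). */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Assumption: "B" is the only CIGAR operator that versions below 1.5 cannot represent. */
    static boolean hasIncompatibleOperator(String cigar, String acceptedVersion) {
        return compareVersions(acceptedVersion, "1.5") < 0 && cigar.indexOf('B') >= 0;
    }

    /** Header-level check: warn when the file is newer than accepted, unless silenced. */
    static boolean shouldWarnOnHeader(String headerVersion, String acceptedVersion, boolean silent) {
        return acceptedVersion != null && !silent
                && compareVersions(headerVersion, acceptedVersion) > 0;
    }

    /** Record-level rules from the proposal, applied to one record's CIGAR string. */
    static Outcome handleRecord(String headerVersion, String cigar, String acceptedVersion,
                                boolean allowConversion, boolean silent,
                                RecordConverter converter, List<String> warnings) {
        // setAcceptedVersion(null, false): pass records through untouched.
        if (acceptedVersion == null) {
            return Outcome.UNCHANGED;
        }
        // Records are only scanned when the header is newer than the accepted version.
        if (compareVersions(headerVersion, acceptedVersion) <= 0) {
            return Outcome.UNCHANGED;
        }
        if (!hasIncompatibleOperator(cigar, acceptedVersion)) {
            return Outcome.UNCHANGED;
        }
        if (!allowConversion) {
            throw new IllegalStateException(
                    "CIGAR " + cigar + " is not valid in SAM version " + acceptedVersion);
        }
        boolean lossy = converter.convertToVersion(acceptedVersion);
        if (lossy && !silent) {
            warnings.add("Information lost converting record to version " + acceptedVersion);
        }
        return lossy ? Outcome.CONVERTED_LOSSY : Outcome.CONVERTED_LOSSLESS;
    }
}
```

In this sketch a caller would collect the warnings in a list and decide how to report them; a real implementation would presumably route them through whatever logging the reader already uses.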
The goals behind this are:

  * Allow the user to specify precisely how to handle unexpected operators and different SAM versions.
  * Minimize the risk of silent data loss (lossy conversion with no user notification).
  * Be a little bit more general than "setRemoveB" so that fewer changes will need to be made in the future.

FL