From: Alec W. <al...@br...> - 2009-07-20 16:39:49
Hi Sean,

Just to expand on Bob's response: where there is no mate alignment that is the "right" one for a particular alignment, I don't think you want to select an arbitrary mate alignment. Rather, the HI and IH tags allow you to create a linked list of alignments for the same read. You probably want to use the first alignment in the linked list for MRNM/MPOS, so that you can traverse the linked list and find all the candidate mate alignments.

-Alec

Bob Handsaker wrote:
> Sean A. Irvine wrote:
>> Thanks again. We now have a satisfactory resolution for the situation
>> of reads mapping off the left end of a reference using soft clipping.
>>
>> Bob Handsaker wrote:
>> > > I think 0x0008 should be interpreted to apply to the current record, and
>> > > you are correct to set it on the first and last records above.
>> [...]
>> > > read-name 73 myref 81 255 25M * 0 0 TTCTGAGTGTACTTTATTATATGAG *
>>
>> Regarding this record, we are assuming the purpose of the flags field
>> is to allow quick selection/filtering of SAM records based on flag
>> settings. The reason we are not particularly happy with the current
>> solution of marking the mate as unmapped is that it makes it difficult to
>> separate records worth processing for structural variation analysis
>> (where there are good mappings for the mate) from those where there
>> are actually no good mate mappings at all.
>>
>> We will probably go with Alec's suggestion of selecting an arbitrary
>> MRNM/MPOS in order for us to allow 0x0008 to be unset (as we don't
>> want to disable other validation). However, in general this seems
>> rather unsatisfactory to us, since we can see no rational grounds
>> for picking one set of MRNM/MPOS values over any of the others.
>
> It seems to me that if you want to analyze structural variations and
> you are going to the trouble to keep multiple alignments, then you
> will want to "see" all of the possible mappings for both ends.
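A minimal sketch of the choice being discussed, assuming records are represented as plain Python dicts (not the Java samtools API). It decodes the example record's FLAG of 73 (paired, mate unmapped, first in pair) and, when the mate has several stored alignments, points MRNM/MPOS at the one with the lowest HI, which is assumed here to be the "first" alignment in Alec's sense, clearing 0x0008 only when a mate alignment actually exists. The helper names and bit labels are hypothetical.

```python
# FLAG bit values from the SAM format specification (2009-era bits only).
# The string labels are hypothetical names chosen for readability.
FLAG_BITS = {
    0x0001: "paired",
    0x0002: "proper_pair",
    0x0004: "unmapped",
    0x0008: "mate_unmapped",
    0x0010: "reverse",
    0x0020: "mate_reverse",
    0x0040: "first_in_pair",
    0x0080: "second_in_pair",
    0x0100: "secondary",
    0x0200: "qc_fail",
    0x0400: "duplicate",
}


def decode_flag(flag: int) -> list[str]:
    """Return the labels of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def point_at_first_mate(record: dict, mate_hits: list[dict]) -> dict:
    """Hypothetical helper: fill MRNM/MPOS from the mate's first stored alignment.

    `record` and each entry of `mate_hits` are plain dicts with keys 'flag',
    'rname' and 'pos'; mate hits also carry their 'HI' (hit index) tag.
    Returns a modified copy; the input record is left untouched.
    """
    out = dict(record)
    if not mate_hits:
        # No candidate mate alignment at all, so 0x0008 is the accurate answer.
        out["flag"] |= 0x0008
        out["mrnm"], out["mpos"] = "*", 0
        return out
    # Assumption: the lowest HI marks the head of the mate's list of hits.
    first = min(mate_hits, key=lambda hit: hit["HI"])
    out["flag"] &= ~0x0008
    # Mirror the mate's strand (its 0x0010 bit) into this record's 0x0020 bit.
    if first["flag"] & 0x0010:
        out["flag"] |= 0x0020
    else:
        out["flag"] &= ~0x0020
    # "=" means the mate is on the same reference as this record.
    out["mrnm"] = "=" if first["rname"] == record["rname"] else first["rname"]
    out["mpos"] = first["pos"]
    return out


print(decode_flag(73))  # ['paired', 'mate_unmapped', 'first_in_pair']
```

The point of picking the first hit rather than an arbitrary one is that it gives a reader a deterministic entry point from which every other candidate mate alignment can be reached, instead of a single MRNM/MPOS that silently hides the alternatives.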
>
> In the example you sent, you might have one set of alignments for this
> pair with aberrant spacing but another set of alignments for this pair
> with very plausible spacing.
> It's just my two cents, but I'm not sure trying to do filtering on the
> flags is the best approach.
>>
>> > > However, I think there are a couple of small problems with the SAM
>> > > records in this example:
>>
>> We agree with your suggestions (and yes, we did name the reads that
>> way for clarity).
>>
>> Alec Wysoker wrote:
>> > > * set MRNM and MPOS to refer to the "first" SAMRecord for the mate.
>> > > When I say first, I am referring to the ordered list of alignments
>> > > for a read, as defined by IH, HI, CC and CP tags.
>> > > You can then locate all the candidate alignments for the mate.
>>
>> Regarding these optional tags, could you please clarify:
>>
>> HI - Are the HI index values for a read assumed to be in increasing
>> order in the SAM file? We are using the Java samtools and its built-in
>> sorting capability, so any HI values we set would reflect the order the
>> records were added, not the order they appear in the file.
>
> I don't believe there was any intent to require that HI has to follow
> the order of records in the file.
> Sorting the file with a different sort order would reorder these
> records and we certainly didn't intend to make more work for sorting.
> The intent was that HI/IH in conjunction with CC/CP would allow you to
> create (and navigate, using the BAM index) a linked list of alignments
> for the same read when the file is sorted in coordinate order.
> I don't know of anyone who is actually using these tags in this way,
> however.
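Bob's description of HI/IH together with CC/CP as a linked list, navigated through the BAM index, can be sketched roughly as follows. This is a hypothetical Python illustration, not the Java samtools API: records are plain dicts, a dict keyed on (reference, position) stands in for the BAM index, and the helper names are invented. It assumes, as the SAM specification describes those tags, that CC holds the reference of the next hit ("=" for the same reference) and CP its leftmost position.

```python
from collections import defaultdict

# The first/second-in-pair bits tell the two ends of a pair apart.
PAIR_BITS = 0x0040 | 0x0080


def build_position_index(records):
    """Stand-in for a BAM index lookup: (rname, pos) -> records starting there."""
    index = defaultdict(list)
    for rec in records:
        index[(rec["rname"], rec["pos"])].append(rec)
    return index


def walk_hits(start, index):
    """Follow CC/CP links from `start`, returning that read's stored alignments.

    Records are hypothetical dicts with 'qname', 'flag', 'rname', 'pos' and a
    'tags' dict that may hold IH, HI, CC and CP. CC '=' means "same reference
    as this record"; a record without CC ends the chain.
    """
    hits = [start]
    seen = {id(start)}
    current = start
    while "CC" in current["tags"]:
        cc = current["tags"]["CC"]
        rname = current["rname"] if cc == "=" else cc
        locus = index.get((rname, current["tags"]["CP"]), [])
        # Several reads (or the other end of this pair) may start at the same
        # locus, so match on read name and pair bits, and never revisit a record.
        nxt = next(
            (r for r in locus
             if r["qname"] == current["qname"]
             and (r["flag"] & PAIR_BITS) == (current["flag"] & PAIR_BITS)
             and id(r) not in seen),
            None,
        )
        if nxt is None:
            break  # dangling link: the next hit is not in this file
        hits.append(nxt)
        seen.add(id(nxt))
        current = nxt
    return hits
```

Because the walk follows CC/CP links rather than file order, it does not matter whether HI values agree with the order records appear in a coordinate-sorted file, which matches Bob's point about sorting. When starting from the first hit, comparing the number of returned alignments with that record's IH value is a cheap consistency check.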