From: Tim F. <tfe...@br...> - 2009-06-18 17:55:54
Hi Dave,

I think what we're doing is slightly different than you've described, but close. We are, in essence, treating PG in a very similar manner to the way we treat the RG tag. More specifically, the header block can contain any number of @PG records, and then every read has a PG attribute against it telling you which PG header goes with it. Here's an example from a recent file:

....
@PG ID:0 VN:0.7.1-9 CL:/seq/software/picard/current/3rd_party/maq/maq map -D -s 0 -a 1500 -e 150 ...
....
42DFUAAXX090605:1:115:430:822#0 163 chrM 7 99 101M = 169 262 AGGTCT... ABB?BB... PG:Z:0

Given that we currently align all reads within a read group with the same command, this does seem a little duplicated, but it definitely lets us track exactly how each read was aligned (or rather, by which aligner with which options). If we, at a later date, decide to do something like align all reads with bwa and then take all reads that didn't align and run them through a second aligner, it'd allow us to track that too.

-t

On Jun 18, 2009, at 10:21 AM, Dave Larson wrote:

> I had a question regarding the usage of the @PG header that I
> hadn't found addressed in the archives. @PG seems to have some
> weaknesses when storing multiple programs in the file. The biggest
> one of these is that, if you have multiple programs generating
> alignment records, there is currently no way to tie which reads were
> aligned with which program. I've seen some forwarded emails
> indicating that The Broad is circumventing this shortcoming by
> replacing the ID field of @PG with the read group ID the program
> entry is tied to. I've cc'd Tim directly so he can clarify if I am
> mistaken and provide some additional input, as they've clearly been
> thinking about this issue much longer than I.
>
> It seems to me that a community-agreed-upon convention for
> creating files of alignment records generated from multiple programs
> would be useful, especially for consortium-type projects.
> I'm sure there are many ways of accomplishing this. Personally, I
> think the clearest would be two additional header fields: one
> optional tag (PI?) for @RG indicating the program ID that generated
> the records, and a second non-optional tag (NM?) in @PG to store the
> program name and free up ID to be a unique identifier for the
> program entry. Alternatively, I suppose this sort of thing could be
> resolved with user-defined header tags, but it seemed of such general
> utility that I thought I would query the list.
>
> Thanks,
>
> Dave
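For illustration only, here is a minimal Python sketch of the lookup both schemes in this thread make possible, assuming plain tab-delimited SAM text. It resolves each alignment to its @PG record either through a per-read PG:Z tag (the approach Tim describes) or, failing that, through the read's @RG record. The PI tag on @RG and NM tag on @PG are only Dave's *proposed* fields, not established ones, and the function names and sample records are hypothetical.

```python
# Sketch: resolve which @PG record produced each alignment in SAM text.
# Assumptions: tab-delimited SAM; "PI" (on @RG) and "NM" (on @PG) are the
# tags *proposed* in this thread, not established fields.


def parse_header(lines):
    """Return ({PG ID: tags}, {RG ID: tags}) from @PG and @RG header lines."""
    programs, read_groups = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] not in ("@PG", "@RG"):
            continue  # skip @HD, @SQ, @CO and alignment lines
        tags = dict(f.split(":", 1) for f in fields[1:])  # "CL:maq map" -> {"CL": "maq map"}
        target = programs if fields[0] == "@PG" else read_groups
        target[tags["ID"]] = tags
    return programs, read_groups


def optional_tags(fields):
    """Map TAG -> value for optional TAG:TYPE:VALUE columns (after the 11 mandatory ones)."""
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def program_for_read(line, programs, read_groups):
    """Return the @PG tags for an alignment line, or None if it cannot be resolved."""
    tags = optional_tags(line.rstrip("\n").split("\t"))
    if "PG" in tags:  # per-read PG:Z tag pointing at an @PG ID
        return programs.get(tags["PG"])
    rg = read_groups.get(tags.get("RG"))
    if rg is not None and "PI" in rg:  # hypothetical @RG PI -> @PG ID link
        return programs.get(rg["PI"])
    return None


# Hypothetical example records (not taken from a real file).
header = [
    "@PG\tID:0\tVN:0.7.1-9\tCL:maq map",
    "@PG\tID:1\tNM:bwa",  # NM = proposed program-name tag
    "@RG\tID:rgA\tPI:1",  # PI = proposed program-ID tag
]
read_tagged = "\t".join(["r1", "163", "chrM", "7", "99", "4M", "=", "169", "262", "AGGT", "ABB?", "PG:Z:0"])
read_via_rg = "\t".join(["r2", "0", "chrM", "20", "60", "4M", "*", "0", "0", "ACGT", "IIII", "RG:Z:rgA"])
read_untagged = "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"])

programs, read_groups = parse_header(header)
print(program_for_read(read_tagged, programs, read_groups)["VN"])  # 0.7.1-9
print(program_for_read(read_via_rg, programs, read_groups)["NM"])  # bwa
print(program_for_read(read_untagged, programs, read_groups))      # None
```

With the per-read PG:Z tag the lookup needs nothing else; the @RG fallback only matters if a read-group-to-program link like the one proposed above were adopted. Reads carrying neither tag resolve to None.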