From: Alec W. <al...@br...> - 2013-01-10 16:22:05
Hi Bradford,

I think this would be better as a separate utility. My concern is that adding complexity to the UI of what is now a pretty simple program will add to our support burden, and I don't think that your need to do this operation in an automated way is a common use case for the community in general.

-Alec

On Jan 9, 2013, at 9:40 PM, bradford powell wrote:

> If I were just needing to do this for a few samples, "samtools view input.bam | sed <blah blah blah> | samtools view -Sb" would be an option, but this is intended to be for when multiple files need to be changed. Use case: the folks who did sequencing for you mangled the sample identifiers and you want to fix them. Manually editing each readgroup would be error-prone.
>
> And actually, the case of creating a different RG ID is the very thing that prompted me to write this. For various reasons, RG IDs could be non-unique between files (e.g. programs adding an RG header may use a default of 0 or 1). This can cause problems for downstream programs that assume the RG ID will be unique to sample.
>
> My patch as written preserves the existing functionality of AddOrReplaceReadGroups, but adds the abilities to
> i) only change one part of the header if you want (say, the sample name or library)
> ii) edit a single readgroup within a file (i.e. not just replace any existing read groups with a single read group in the output) and
> iii) edit the RG ID
>
> If you think that this would be better as a separate or external utility, then I can maintain it as such. I thought that since it does not alter the existing use of AddOrReplaceReadGroups (unless I missed something) and since others may find it useful that I would offer it up.
>
> -- Bradford
>
>
> On Wed, Jan 9, 2013 at 4:37 PM, Alec Wysoker <al...@br...> wrote:
> Hi Bradford,
>
> I have an alternative suggestion that doesn't require programmatic modifications:
> Extract the header text from the input BAM, either with picard ViewSamFile or samtools view.
> Hand-edit the header text to modify RG attributes.
> Use picard ReplaceSamHeader to stuff that into the BAM.
> Note that this doesn't allow you to create a different RG ID from the input, but I'm not sure if that is necessary.
>
> -Alec
>
> On Jan 8, 2013, at 2:54 PM, bradford powell wrote:
>
>> Sometimes, all you need to do is change just a little bit of an existing readgroup-- maybe you have a bunch of files where someone left a tag out, or you want to change the readgroup IDs because downstream programs want the readgroup IDs to be unique between files.
>>
>> I'm not sure if this would be best as a separate utility (UpdateReadGroup?), but the functionality is so similar to AddOrReplaceReadGroups that I propose the attached patch.
>>
>> It adds one new option (OLDRGID). If specified, the readgroup with ID==OLDRGID will be modified by options on the command line (tags not provided on the command line will be passed through unmodified). If RGID is specified in addition to OLDRGID, then reads with readgroup OLDRGID in the INPUT file will be changed to have RGID in the output file (other readgroups will remain intact).
>>
>> Most of the patch involves making additional arguments optional (there is still code to verify that LB, PL, SM, and PU are defined for the output readgroup).
>>
>> Example usage:
>>
>> java -jar dist/AddOrReplaceReadGroups.jar I=input.bam O=output.bam OLDRGID=0 RGID=8675309 SM=newsamplename
>>
>> -- Bradford
>> <update-existing-readgroup.patch>
>> _______________________________________________
>> Samtools-devel mailing list
>> Sam...@li...
>> https://lists.sourceforge.net/lists/listinfo/samtools-devel
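The OLDRGID behaviour described in the Jan 8 message can be illustrated with a minimal sketch. It works on plain SAM text (for example the output of `samtools view -h input.bam`, where `-h` keeps the header) rather than on Picard's Java code, and it is not the attached patch. The function name `update_read_group` and its dict-of-tags interface are hypothetical. Following the thread, only the `@RG` line whose ID matches is edited, tags not supplied pass through unchanged, an `ID` entry acts like RGID and is also applied to matching `RG:Z:` tags on reads, and LB, PL, SM and PU must still be present on the output read group.

```python
# Hypothetical sketch of the OLDRGID/RGID behaviour described in this thread,
# applied to plain SAM text. It is NOT the actual patch or Picard code.
from typing import Dict, Iterable, Iterator

# Tags the patch reportedly still requires on the output read group.
REQUIRED_TAGS = ("LB", "PL", "SM", "PU")


def update_read_group(lines: Iterable[str], old_id: str,
                      new_tags: Dict[str, str]) -> Iterator[str]:
    """Edit the @RG line whose ID is old_id and retag that group's reads.

    new_tags holds only the tags supplied on the command line; an "ID" entry
    plays the role of RGID. Unspecified tags and other read groups pass
    through unchanged.
    """
    new_id = new_tags.get("ID", old_id)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@RG":
            # Header tags look like "SM:sample"; split on the first colon only.
            tags = dict(f.split(":", 1) for f in fields[1:])
            if tags.get("ID") == old_id:
                tags.update(new_tags)  # dicts keep order, so ID stays first
                missing = [t for t in REQUIRED_TAGS if t not in tags]
                if missing:
                    raise ValueError(f"output read group lacks {missing}")
                fields = ["@RG"] + [f"{k}:{v}" for k, v in tags.items()]
        elif not fields[0].startswith("@"):
            # Alignment line: optional fields follow the 11 mandatory columns.
            for i in range(11, len(fields)):
                if fields[i] == f"RG:Z:{old_id}":
                    fields[i] = f"RG:Z:{new_id}"
        # Other header lines (@HD, @SQ, @PG, ...) pass through untouched.
        yield "\t".join(fields)


if __name__ == "__main__":
    # Mirrors OLDRGID=0 RGID=8675309 SM=newsamplename from the example usage.
    sam = [
        "@HD\tVN:1.4",
        "@RG\tID:0\tSM:oldname\tLB:lib1\tPL:ILLUMINA\tPU:unit1",
        "@RG\tID:1\tSM:other\tLB:lib2\tPL:ILLUMINA\tPU:unit2",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:0",
        "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:1",
    ]
    for out in update_read_group(sam, "0", {"ID": "8675309", "SM": "newsamplename"}):
        print(out)
```

Rewriting the reads, and not just the header, is what makes a new RG ID possible here. A header-only swap such as ReplaceSamHeader would leave reads tagged with the old ID, which is presumably why Alec notes that his workaround cannot create a different RG ID.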