From: Lucas S. <lsw...@bc...> - 2010-04-27 23:32:06
Will ReplaceSamHeader work on a bam file? Or would I need to convert bam to sam, use ReplaceSamHeader, then convert sam back to bam?

~Thanks,
Lucas

Alec Wysoker wrote:
> Hi Lucas,
>
> In BAM, the alignment records do not have reference names. They have
> indices into a table in the header. The list of reference sequences
> is represented in two ways in the header. There is a binary
> representation, and there is (optionally) a text representation. I
> don't know how you created your BAM, but it has at least one of these.
>
> This is what you can do:
>
> 1. Use Picard's ViewSam command to dump the BAM to text. You can
>    pipe this into 'head' in order to grab the header lines rather
>    than converting the entire BAM to text.
> 2. Edit the header that ViewSam dumped, fixing the reference
>    sequence names.
> 3. Pass the file containing the header to ReplaceSamHeader with the
>    HEADER= option.
>
> -Alec
>
> Lucas Swanson wrote:
>> Hi Alec,
>>
>> Will that replace the reference names in each alignment record
>> though? I do not think the .bam file I am using has a header section.
>>
>> ~Thanks,
>> Lucas
>>
>> Alec Wysoker wrote:
>>> Hi Lucas,
>>>
>>> You can use ReplaceSamHeader from the Picard package, but despite
>>> the name, it does not replace the header in place, but writes out a
>>> new file with the header you supply.
>>>
>>> -Alec
>>>
>>> Lucas Swanson wrote:
>>>> Thanks, but is there a quick and simple way to change all the
>>>> reference names in my 7gig .bam file to use dashes rather than colons?
>>>>
>>>> ~Lucas
>>>>
>>>> Michael James Clark wrote:
>>>>> In short: no. But there is a solution that will make it work.
>>>>>
>>>>> Replace the colon with a dash. I'm guessing your FASTA reference
>>>>> genome contig names had colons in them. You should not use FASTA
>>>>> files with colons.
>>>>>
>>>>> The colon doesn't work because samtools uses the colon to signify
>>>>> the end of the contig name.
>>>>> E.g.:
>>>>> samtools view in.bam chr1:1000-2000.
>>>>>
>>>>> In your case, once you change the colons to dashes:
>>>>>
>>>>> samtools view in.bam k31-100019:1000-2000
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> MJ
>>>>>
>>>>> On 4/26/10 5:47 PM, "Lucas Swanson" <lsw...@bc...> wrote:
>>>>>> I am having some trouble viewing reads aligned to a region and
>>>>>> think it might be because my reference names have a colon in
>>>>>> them, e.g. "k31:1000019".
>>>>>>
>>>>>> So the command I am giving and the output is:
>>>>>> [lswanson@lucass01 orphan_info]$ ./samtools view full_test_orphan.bam "k31:1000019"
>>>>>> [bam_header_read] EOF marker is absent.
>>>>>> [main_samview] fail to get the reference name. Continue anyway.
>>>>>>
>>>>>> If I just do:
>>>>>> [lswanson@lucass01 orphan_info]$ ./samtools view full_test_orphan.bam | head -n1
>>>>>> [bam_header_read] EOF marker is absent.
>>>>>> SOLEXA3_96:5:6:1385:2018 73 k31:1000019 1 23 50M =1 0 GGTATGAATGCCCTCGCCGAAGCACGCCACATCACCAGTAGAAGCCATCT @B8@@ABB@@;86?29/9>:<6:?.<29=8%%%%%%%%%%%%%%%%%%%% XT:A:U NM:i:1 SM:i:23 AM:i:0 X0:i:1 X1:i:4 XM:i:1 XO:i:0 XG:i:0 MD:Z:33T16
>>>>>>
>>>>>> So I know that there is at least one read mapping to the
>>>>>> reference named "k31:1000019".
>>>>>>
>>>>>> Is there any way to get the reads aligned to a reference with a
>>>>>> colon in the reference name?
>>>>>>
>>>>>> ~Thanks,
>>>>>> Lucas Swanson
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> _______________________________________________
>>>>>> Samtools-help mailing list
>>>>>> Sam...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/samtools-help
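
For the header-editing step in Alec's list (step 2), here is a minimal Python sketch, not something posted in the thread. It assumes the header text dumped in step 1 contains tab-separated @SQ lines whose SN: field holds the reference name, as in the SAM text format. The function name fix_reference_names and the LN:50 value in the example are hypothetical and used only for illustration.

```python
def fix_reference_names(header_text: str) -> str:
    """Replace ':' with '-' in the SN (sequence name) field of @SQ lines.

    All other header lines and fields are passed through unchanged.
    """
    fixed_lines = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = [
                # Keep the "SN:" tag itself; only the name after it is edited.
                "SN:" + f[3:].replace(":", "-") if f.startswith("SN:") else f
                for f in fields
            ]
            line = "\t".join(fields)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-line header; the LN value is made up for the example.
    example = "@HD\tVN:1.0\n@SQ\tSN:k31:1000019\tLN:50\n"
    print(fix_reference_names(example), end="")
```

The edited text would then be saved to a file and supplied through the HEADER= option described in step 3. Only the SN: field is touched, so colons elsewhere in the header (the tag separators themselves, for instance) are left alone.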