From: Biocyberman <bio...@gm...> - 2015-03-12 10:05:38
|
Hello Useq community, I am trying to use Useq for RNAseq analysis with novoalignCS as the aligner. My questions: How can I avoid this manual fixing of the header? And how can I use the BAM file directly for STP? Here are some information: I produced each BAM file per lane for one sample. I used SamTranscriptomeParser to convert the BAM file to chromosomes coordinates. Here come my problems: 1. I can't use the BAM file directly for '-f' argument, I get around this by converting the BAM file to SAM and feed it to STP. But I still wonder if there is a way to use the BAM file directly. 2. I can't merge the BAM files after step 1. For example: Sample1_Lane1.BAM, Sample1_Lane2.BAM, and Sample1_Lane3.BAM. The problem is that the @SQ lines of these BAM files are different, both at SN and LN values. If I do with test data of small number of reads, there are more differences in SN values because STP throw away all chromosomes that do not have reads. Below is first 30 lines from my RAW bam header, BEFORE the conversion @HD VN:1.0 SO:unsorted @PG ID:novoalignCS PN:novoalignCS VN:V1.05.00 CL:novoalignCS -d rn6.rnaseq.n60k.cnx -f L03.xsq -F XSQ LT2 -o SAM -r Random @SQ SN:chr1 LN:282763074 AS:rn6.rnaseq.cnx @SQ SN:chr2 LN:266435125 AS:rn6.rnaseq.cnx @SQ SN:chr3 LN:177699992 AS:rn6.rnaseq.cnx @SQ SN:chr4 LN:184226339 AS:rn6.rnaseq.cnx @SQ SN:chr5 LN:173707219 AS:rn6.rnaseq.cnx @SQ SN:chr6 LN:147991367 AS:rn6.rnaseq.cnx @SQ SN:chr7 LN:145729302 AS:rn6.rnaseq.cnx @SQ SN:chr8 LN:133307652 AS:rn6.rnaseq.cnx @SQ SN:chr9 LN:122095297 AS:rn6.rnaseq.cnx @SQ SN:chr10 LN:112626471 AS:rn6.rnaseq.cnx @SQ SN:chr11 LN:90463843 AS:rn6.rnaseq.cnx @SQ SN:chr12 LN:52716770 AS:rn6.rnaseq.cnx @SQ SN:chr13 LN:114033958 AS:rn6.rnaseq.cnx @SQ SN:chr14 LN:115493446 AS:rn6.rnaseq.cnx @SQ SN:chr15 LN:111246239 AS:rn6.rnaseq.cnx @SQ SN:chr16 LN:90668790 AS:rn6.rnaseq.cnx @SQ SN:chr17 LN:90843779 AS:rn6.rnaseq.cnx @SQ SN:chr18 LN:88201929 AS:rn6.rnaseq.cnx @SQ SN:chr19 LN:62275575 AS:rn6.rnaseq.cnx @SQ SN:chr20 LN:56205956 AS:rn6.rnaseq.cnx @SQ SN:chrX LN:159970021 AS:rn6.rnaseq.cnx @SQ SN:chrY LN:3310458 AS:rn6.rnaseq.cnx @SQ SN:Zbtb22:chr20:5478760-5478829_5479425-5479494 LN:138 AS:rn6.rnaseq.cnx @SQ SN:Taf11:chr20:7477919-7477988_7482497-7482566 LN:138 AS:rn6.rnaseq.cnx @SQ SN:Taf11:chr20:7477185-7477254_7478450-7478519 LN:138 AS:rn6.rnaseq.cnx @SQ SN:Taf11:chr20:7478469-7478538_7480277-7480346 LN:138 AS:rn6.rnaseq.cnx @SQ SN:Taf11:chr20:7477185-7477254_7477891-7477960 LN:138 AS:rn6.rnaseq.cnx @SQ SN:Taf11:chr20:7480357-7480426_7482497-7482566 LN:138 AS:rn6.rnaseq.cnx And below is the header produced by STP after the conversion: @HD VN:1.4 SO:coordinate @RG ID:2.LT2.3 SM:LT2 CN:AfMD LB:LT2_2 PL:SOLiD PU:2.LT2.3 @SQ SN:chr10 LN:112626471 AS:rn6.rnaseq.cnx @SQ SN:chr11 LN:90463843 AS:rn6.rnaseq.cnx @SQ SN:chr12 LN:52716770 AS:rn6.rnaseq.cnx @SQ SN:chr13 LN:114033958 AS:rn6.rnaseq.cnx @SQ SN:chr5 LN:173707219 AS:rn6.rnaseq.cnx @SQ SN:chr6 LN:147991367 AS:rn6.rnaseq.cnx @SQ SN:chrY LN:3310458 AS:rn6.rnaseq.cnx @SQ SN:chr7 LN:145729302 AS:rn6.rnaseq.cnx @SQ SN:chr8 LN:133307652 AS:rn6.rnaseq.cnx @SQ SN:chr9 LN:122095297 AS:rn6.rnaseq.cnx @SQ SN:chrX LN:159970021 AS:rn6.rnaseq.cnx @SQ SN:chr15 LN:111246239 AS:rn6.rnaseq.cnx @SQ SN:chr14 LN:115493446 AS:rn6.rnaseq.cnx @SQ SN:chr17 LN:90843779 AS:rn6.rnaseq.cnx @SQ SN:chr16 LN:90668790 AS:rn6.rnaseq.cnx @SQ SN:chr19 LN:62275575 AS:rn6.rnaseq.cnx @SQ SN:chr1 LN:282763074 AS:rn6.rnaseq.cnx @SQ SN:chr18 LN:88201929 AS:rn6.rnaseq.cnx @SQ SN:chr2 LN:266435125 AS:rn6.rnaseq.cnx @SQ SN:chr3 LN:177699992 AS:rn6.rnaseq.cnx @SQ SN:chr4 LN:184226339 AS:rn6.rnaseq.cnx @PG ID:SamTranscriptomeParser CL: args [12 Mar 2015 9:53] USeq_8.8.9 -f LT2.2.LT2.3_unsorted.fifo.sam -a 900 -n 100 -u -s LT2.2.LT2.3_unsorted.tmp.bam @PG ID:novoalignCS PN:novoalignCS VN:V1.05.00 CL:novoalignCS -d rn6.rnaseq.n60k.cnx -f L03.xsq -F XSQ LT2 -o SAM -r Random As you may see, the canonical chromosomes got reordered, chromosome 20 is thrown out. If I have @RG lines, they would be thrown out also. I can extract the header I want to use from the raw bam file: samtools view -H raw_alignment.bam| egrep -v "@SQ\s+SN:[A-Za-z0-9]+:.*$|@HD". Then I can add the header with: "SamTranscriptomeParser -f test.sam -s test.bam -h head.txt " Thanks Vang |