Hi Alec,

Thank you for your helpful suggestions. I will try following your suggestions and see what will happen.

Cheers,

Yibo

On Jul 17, 2014, at 11:34, Alec Wysoker <alecw@broadinstitute.org> wrote:

Hi Yibo,

First of all, FYI this is a stack trace:

Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read.
    at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
    at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
    at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read.
    at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
    at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
    ... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32770
    at net.sf.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:276)
    at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
    ... 6 more

Regarding your original question, the problem is that apparently your have a contig that is longer than 512MB.  The BAM index format cannot handle a contig this size.  If you don't need a BAM index for your process, then you can pass CREATE_INDEX=False to all the Picard programs that write a BAM file, and this problem should go away.  If you need a BAM index, because you're going to query by genomic locus, then you've got a problem.

Regarding using MergeBamAlignment, I believe it should be able to handle the multiple alignments produced by Tophat.  I've never tried it with STAR, but I would suggest you try it without removing multiple mappings, then run ValidateSamFile on the BAM produced, and if that succeeds then you can assume it handles the multiple mappings produced by the aligner correctly.

-Alec

On 7/17/14, 2:23 PM, Yibo wrote:
Hi Alec,

Thank you very much for your reply and suggestion. I am not sure what the stack trace is, and I only had the running log reporting the error message (see attached). Yes, for Tophat2 output, there is an unmapped BAM file, but for STAR, there is no unmapped BAM file. Before I try to merge them, I have a question: I am using unique mapping results from the mapping output. So, should I remove multiple mapping results and then merge with unmapped file, or not remove and merge? Thanks again.

Yibo







On Jul 17, 2014, at 6:11, Alec Wysoker <alecw@broadinstitute.org> wrote:

Hi Yibo,

It would help if you send the stack trace that should be printed along with the error.  However, the fact that you need to reduce the validation stringency on the various PIcard tools means that you are likely to run into various problems.  I suggest you change your workflow as follows: rather than using the output of your aligner directly, use Picard MergeBamAlignment to combine the aligner output with a valid unmapped BAM.  This should eliminate all the Picard validation issues.  If you don't have an unmapped BAM, you can create one with Picard FastqToSam.

-Alec

On 7/16/14, 3:18 PM, Yibo wrote:
Dear all,

I am using Picard tools but encountered some problems. I want to use GATK to call snp for RNA-seq data. I have used two methods, Tophat2 and STAR, to perform RNA-seq mapping. Because I do not have my own reference genome, I used the genome of a related species as the reference (different genus). However, for the Tophat2 and STAR mapping output (I used only unique mapping output), when using Picardtools (SortSam, AddOrReplaceReadGroups or MarkDuplicates), some errors occurred.

I am using Java 1.7.0_45 and Picard-tools-1.117 (or 1.77). The error messages are as follows:

(1) For STAR output, I successfully used AddOrReplaceReadGroups to add read groups and sort by coordinate. However, when using MarkDuplicates, an error occurred as follows (running log attached):

"Exception in thread "main" htsjdk.samtools.SAMException: Exception when processing alignment for BAM index HS2:410:C21E6ACXX:4:1109:16160:63599 2/2 99b aligned read."

I checked this read and it is a read pair and seems ok. I attached the paired-end reads information as follows:

HS2:410:C21E6ACXX:4:1109:16160:63599	163	1	536940431	255	9S90M	=	536940464	99	CAGAAAATGAACTGTTTGGAGAAAGATGCCTGCAGAAACCTACATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGG	BBBFFFFFFFFFFIFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFF<BBFFFFFFFFFBFFFFFFFFF	RG:Z:LANE11A	NH:i:1	HI:i:1	jI:B:i,-1	jM:B:c,-1	nM:i:10	AS:i:134
HS2:410:C21E6ACXX:4:1109:16160:63599	83	1	536940464	255	66M33S	=	536940431	-99	CATAGCAGCAAGAAGATCCAGAATGAATTTTGGGGTACAATTGATTGAACTGAAGGGGGCTGAACACAATTATTTTGAATGTAAACTCTTATGCCAAAG	FFFFFFFBFFFFFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFIIIIIIIIIIIIIIIIIIFIFFFFFFFFFFBBB	RG:Z:LANE11A	NH:i:1	HI:i:1	jI:B:i,-1	jM:B:c,-1	nM:i:10	AS:i:134

Additionally, I used ValidateSamFile tool to check my sam file, and found as follows:

Error Type      Count
ERROR:MATE_NOT_FOUND    37
WARNING:MISSING_TAG_NM  48684343

Because only 37 reads is mate_not_found, I deleted these reads from my sam file. Then rerunning MarkDuplicates, the same error still occurred.

My scripts are as follows:

java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/AddOrReplaceReadGroups.jar INPUT=${path_input}/${samp}_uniquelymapped.sam OUTPUT=${path_input}/${samp}_rg.sam SORT_ORDER=coordinate RGID=LANE11A RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A

java -Xmx4G -jar /u/home/y/ybhu/picard-tools-1.117/MarkDuplicates.jar I=${path_input}/${samp}_rg.bam O=${path_input}/${samp}_dedup.bam METRICS_FILE=${path_input}/${samp}_dedupmetrics.txt REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE ASSUME_SORTED=TRUE VALIDATION_STRINGENCY=SILENT



(2) For Tophat2 output, I used different steps (first use SortSam to sort by coordinate, then AddOrReplaceReadGroups to add read groups). However, for these two steps, error messages also occurred (running log as attached).

For SortSam tool, 11449 reads were reported "Insert size out of range". Then I changed the argument "VALIDATION_STRINGENCY=STRICT" to "LENIENT", the SortSam can ignore these errors and finish sorting.

But when running AddOrReplaceReadGroups, a same error occurred (similar to STAR running error):

"Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HS2:410:C21E6ACXX:4:2316:13634:18930 1/2 97b aligned read."

I pasted the read information here:

HS2:410:C21E6ACXX:4:2316:13634:18930	73	1	536905899	50	32M2D57M4I4M	*	0	0	GTAAATTTTTGTAAATGTTCAATGAGGTGCTGAAACATGTATGTTCTTTAGCTTTCCCATTTAGAAGACAACATAAATCTTAATTTTTCCACTAATT	B<BFFFFFFFFFFFF0<BFFIIIF0BFBFFFIIIIIBFIBFFFBFFFIFFBFFFFFFFFFFIIBFFIIIFIIIIIFIFFFFFFFFBFFFBBBFFFBF	AS:i:-126	XN:i:0	XM:i:20	XO:i:2	XG:i:6	NM:i:26	MD:Z:2C1G10A3T0G2A0T0A3A0A1^AA3T9G4A1A0G2T24T1A0A5G2	YT:Z:UU	NH:i:1.

My scripts are as follows:

java -Xmx3G -jar /u/local/apps/picard-tools/1.77/SortSam.jar INPUT=${path_input}/${samp}_uniquelymapped.sam OUTPUT=${path_input}/${samp}_unique_sorted.sam SORT_ORDER=coordinate CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT

java -Xmx3G -jar /u/local/apps/picard-tools/1.77/AddOrReplaceReadGroups.jar INPUT=${path_input}/${samp}_unique_sorted.bam OUTPUT=${path_input}/${samp}_rg.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE RGID=LANE11A RGPL=ILLUMINA RGLB=LIB1A RGPU=LANE1 RGSM=1A VALIDATION_STRINGENCY=LENIENT

In summary, the errors for STAR and Tophat2 seem to be caused by creating BAM index for these reported reads. But I do no know why and how to resolve it.

Your comments are very appreciated, and thanks in advance.

Yibo

Department of Ecology and Evolutionary Biology,
University of California, Los Angeles



------------------------------------------------------------------------------
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck
Code Sight - the same software that powers the world's largest code
search on Ohloh, the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help