From: David N. <dav...@gm...> - 2012-04-12 13:49:52
Hmm. That error you are seeing is from Picard. STP calls SortSam internally. Looks like it is trying to write a short that is too big, possibly due to the huge chromosome name? Or too many chromosome names since these have not been converted to genomic space.

Use of the -u option won't change much of anything except redirect the failed alignment to a file.

The big problem is you're going to have transcript alignments intermingled with your genomic alignments and won't be able to map the former to the latter. I don't think you can use your partially converted sam file. Need to rebuild the novoindex and realign.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 14:43:11 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Okay, that's good to know- thanks.

In the meantime I tried a fix suggested by Zayed at Novocraft, namely to not use '-u' and thereby to exclude unmapped reads. Both this and using USeq 8.2.2 (I was on 8.2.1) changed the error to:

Exception in thread "main" java.lang.IllegalArgumentException: Value (70699) to large to be written as ushort.
    at net.sf.samtools.util.BinaryCodec.writeUShort(BinaryCodec.java:324)
    at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:114)
    at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
    at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
    at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:157)
    at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at edu.utah.seq.data.sam.PicardSortSam.<init>(PicardSortSam.java:81)
    at edu.utah.seq.parsers.SamTranscriptomeParser.addHeaderAndSort(SamTranscriptomeParser.java:482)
    at edu.utah.seq.parsers.SamTranscriptomeParser.doWork(SamTranscriptomeParser.java:101)
    at edu.utah.seq.parsers.SamTranscriptomeParser.<init>(SamTranscriptomeParser.java:55)
    at edu.utah.seq.parsers.SamTranscriptomeParser.main(SamTranscriptomeParser.java:495)

I realise I'm working with a bad SAM file from your point of view, but do you think this error is part of the same thing, or something new?

Jon

On 12/04/2012 14:12, David Nix wrote:
>
> Yes that's incorrect. Don't add the xxxTranscripts.fasta. All of the splice
> junctions are in the xxxSplices.fasta file. I'll cc Colin here to correct
> this in the Novocraft docs. See also
> http://useq.sourceforge.net/usageRNASeq.html
>
> Not sure about the chr1 vs 1. Off the top of my head I don't think there
> should be a problem with USeq apps. But then again we haven't tested them.
> Most of the genome browsers will probably complain unless you register a
> synonyms table. Sounds like the Ensembl browser won't though so maybe it isn't
> an issue.
>
> -cheers, D
>
> From: Jon Manning <Jon...@ed...>
> Date: Thu, 12 Apr 2012 14:04:45 +0100
> To: David Nix <dav...@gm...>
> Cc: <use...@li...>
> Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while
> processing Novoalign RNA-seq outputs
>
> Hi David,
>
> Thanks for the quick reply. Following the Novoalign folks' instructions the
> transcripts were indeed added to the index. Excerpt from their docs:
>
> novoindex Transcriptome.nix geneMaskedGenome.fasta
> refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta
>
> Is that not the right thing to do? Should it just be the genome and the
> splices?
>
> I'm working primarily with Ensembl data so I'd like to keep my chromosomes
> 'sans chr' - unless of course the USeq apps require it?
>
> Thanks,
>
> Jon
>
> On 12/04/2012 12:45, David Nix wrote:
>>
>> Did you by chance add the transcripts to your genome index from the
>> MakeTranscriptome App? These take the form of
>> ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...
>>
>> That also could be the problem. -cheers, D
>>
>> Ahh, looks like you've joined your gene name using a : . Use an _ . The STP
>> uses the : to split the splice junction chromosome name into its component
>> parts. A good junction should look like
>>
>> ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513
>>
>> Rps3:ENSRNOT00000023935:1:156811472-156811541.... should be
>> Rps3_ENSRNOT00000023935:1:156811472-156811541......
>>
>> As such STP isn't able to recognize the alignment as needing conversion to
>> genomic coordinates.
>>
>> Also, it would be a good idea to rename your chromosomes to the standard UCSC
>> nomenclature: chr1, chr2, chr3.... I've no idea why NCBI and others switched
>> a couple years back.
>>
>> Yes, all splice junction header lines are stripped from the SAM header, they
>> aren't needed after genomic coordinate conversion.
>>
>> -cheers, D
>>
>> From: Jon Manning <Jon...@ed...>
>> Date: Thu, 12 Apr 2012 10:18:32 +0100
>> To: <use...@li...>
>> Subject: [Useq-users] Error with USeq SamTranscriptomeParser while
>> processing Novoalign RNA-seq outputs
>>
>> Hello,
>>
>> I've been working through the Novoalign RNA-seq instructions
>> <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>,
>> and am stuck at the last stage, where reads are converted back to genomic
>> coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
>> to help.
>>
>> When it gets to the 'Adding SAM header, sorting, and writing bam output with
>> Picard's SortSam...' stage I'm getting errors like:
>>
>> Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing
>> text SAM file. RNAME
>> 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770'
>> not found in any SQ record; Line 27
>> Line: EBRI093151:81:FC:1:1:3202:1108 133
>> Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770
>> 375 0 * = 375 0
>> AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG
>> B=#==A>ABCCBBAB###############################################################
>> PG:Z:novoalign ZS:Z:QC
>>
>> I've checked, and these lines ARE present in the input SAM file (made by
>> Novoalign), but not in the temporary SAM files I can see created by
>> SamTranscriptomeParser, so I suspect they may be lost somehow.
>>
>> I'm not sure how to go about debugging this myself, so all pointers
>> appreciated.
>>
>> Thanks,
>>
>> Jon Manning
>>
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336.
>> _______________________________________________
>> Useq-users mailing list
>> Use...@li...
>> https://lists.sourceforge.net/lists/listinfo/useq-users
>
> --
> Dr Jonathan Manning
> Bioinformatics Team
> Centre for Cardiovascular Science
> University of Edinburgh
> Queens Medical Research Institute
> 47 Little France Crescent
> Edinburgh EH16 4TJ
> United Kingdom
> T: +44 131 242 6700
> F: +44 131 242 6782
> E: jma...@st...
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.

--
Dr Jonathan Manning
Bioinformatics Team
Centre for Cardiovascular Science
University of Edinburgh
Queens Medical Research Institute
47 Little France Crescent
Edinburgh EH16 4TJ
United Kingdom
T: +44 131 242 6700
F: +44 131 242 6782
E: jma...@st...

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
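
Editor's note: before rebuilding the novoindex and realigning, it may help to check which kinds of reference names are actually present in a SAM header. The Python sketch below is only an illustration. It assumes, based solely on the two example names David quotes in this thread, that a splice junction name has three colon-separated fields (gene:chromosome:coordinates) and that a name with four fields is either a MakeTranscriptome transcript entry or a junction whose gene name itself contains a colon. The function names are hypothetical and the sketch does not reproduce how SamTranscriptomeParser parses names internally.

```python
from collections import Counter


def classify_reference_name(name):
    """Guess the kind of reference from its colon-separated field count.

    Assumption (from the examples in this thread, not from the USeq source):
      1 field  -> plain chromosome, e.g. "chr20" or "1"
      3 fields -> splice junction, gene:chrom:start-end_start-end...
      4 fields -> transcript entry, or a junction with a ':' in the gene name
    """
    n_fields = len(name.split(":"))
    if n_fields == 1:
        return "genomic"
    if n_fields == 3:
        return "splice"
    if n_fields == 4:
        return "transcript_or_colon_in_gene_name"
    return "unknown"


def sq_names(header_lines):
    """Yield the SN: value of every @SQ line in an iterable of SAM header lines."""
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                yield field[3:]


def summarize(header_lines):
    """Count how many @SQ names fall into each category."""
    return Counter(classify_reference_name(name) for name in sq_names(header_lines))


if __name__ == "__main__":
    # Names taken from the thread; the LN values are made up for the demo.
    demo_header = [
        "@HD\tVN:1.0\tSO:unsorted\n",
        "@SQ\tSN:1\tLN:1000\n",
        "@SQ\tSN:ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513\tLN:100\n",
        "@SQ\tSN:Rps3:ENSRNOT00000023935:1:156811472-156811541\tLN:70\n",
    ]
    for kind, count in sorted(summarize(demo_header).items()):
        print(kind, count)
```

Any names reported in the four-field category would suggest that the index still contains entries STP cannot convert to genomic coordinates, which matches David's advice to leave xxxTranscripts.fasta out of the novoindex.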