From: David N. <dav...@gm...> - 2012-04-12 14:16:57
Yes, if you save it as a sam it bypasses Picard's SortSam and just writes out the alignments.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 15:15:10 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Thanks for the pointers- don't worry, I will be re-running the alignment. However, specifying '-s output.sam' did at least make things run without error- Zayed indicated that the BAM conversion was the problem, due to the 'absence of a valid sequence dictionary'. But things are much clearer now than they were this morning- thank you.

Jon

On 12/04/2012 14:49, David Nix wrote:
> Hmm. That error you are seeing is from Picard. STP calls SortSam internally.
> Looks like it is trying to write a short that is too big, possibly due to the
> huge chromosome name? Or too many chromosome names since these have not been
> converted to genomic space.
>
> Use of the -u option won't change much of anything except redirect the failed
> alignment to a file.
>
> The big problem is you're going to have transcript alignments intermingled
> with your genomic alignments and won't be able to map the former to the
> latter.
>
> I don't think you can use your partially converted sam file. Need to rebuild
> the novoindex and realign.
>
> -cheers, D
>
> From: Jon Manning <Jon...@ed...>
> Date: Thu, 12 Apr 2012 14:43:11 +0100
> To: David Nix <dav...@gm...>
> Cc: <use...@li...>
> Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while
> processing Novoalign RNA-seq outputs
>
> Okay, that's good to know- thanks.
>
> In the meantime I tried a fix suggested by Zayed at Novocraft, namely to not
> use '-u' and thereby to exclude unmapped reads. Both this and using USeq 8.2.2
> (I was on 8.2.1) changed the error to:
>
> Exception in thread "main" java.lang.IllegalArgumentException: Value (70699) to large to be written as ushort.
>     at net.sf.samtools.util.BinaryCodec.writeUShort(BinaryCodec.java:324)
>     at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:114)
>     at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
>     at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
>     at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
>     at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:157)
>     at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
>     at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
>     at edu.utah.seq.data.sam.PicardSortSam.<init>(PicardSortSam.java:81)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.addHeaderAndSort(SamTranscriptomeParser.java:482)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.doWork(SamTranscriptomeParser.java:101)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.<init>(SamTranscriptomeParser.java:55)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.main(SamTranscriptomeParser.java:495)
>
> I realise I'm working with a bad SAM file from your point of view, but do you
> think this error is part of the same thing, or something new?
>
> Jon
>
> On 12/04/2012 14:12, David Nix wrote:
>> Yes that's incorrect. Don't add the xxxTranscripts.fasta. All of the splice
>> junctions are in the xxxSplices.fasta file. I'll cc Colin here to correct
>> this in the Novocraft docs. See also
>> http://useq.sourceforge.net/usageRNASeq.html
>>
>> Not sure about the chr1 vs 1. Off the top of my head I don't think there
>> should be a problem with USeq apps. But then again we haven't tested them.
>> Most of the genome browsers will probably complain unless you register a
>> synonyms table. Sounds like the Ensembl browser won't though so maybe it
>> isn't an issue.
>>
>> -cheers, D
>>
>> From: Jon Manning <Jon...@ed...>
>> Date: Thu, 12 Apr 2012 14:04:45 +0100
>> To: David Nix <dav...@gm...>
>> Cc: <use...@li...>
>> Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while
>> processing Novoalign RNA-seq outputs
>>
>> Hi David,
>>
>> Thanks for the quick reply. Following the Novoalign folks' instructions the
>> transcripts were indeed added to the index. Excerpt from their docs:
>>
>>     novoindex Transcriptome.nix geneMaskedGenome.fasta refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta
>>
>> Is that not the right thing to do? Should it just be the genome and the
>> splices?
>>
>> I'm working primarily with Ensembl data so I'd like to keep my chromosomes
>> 'sans chr' - unless of course the USeq apps require it?
>>
>> Thanks,
>>
>> Jon
>>
>> On 12/04/2012 12:45, David Nix wrote:
>>> Did you by chance add the transcripts to your genome index from the
>>> MakeTranscriptome App? These take the form of
>>> ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...
>>>
>>> That also could be the problem. -cheers, D
>>>
>>> Ahh, looks like you've joined your gene name using a : . Use an _ . The
>>> STP uses the : to split the splice junction chromosome name into its
>>> component parts. A good junction should look like
>>>
>>> ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513
>>>
>>> Rps3:ENSRNOT00000023935:1:156811472-156811541.... should be
>>> Rps3_ENSRNOT00000023935:1:156811472-156811541......
>>>
>>> As such STP isn't able to recognize the alignment as needing conversion to
>>> genomic coordinates.
>>>
>>> Also, it would be a good idea to rename your chromosomes to the standard
>>> UCSC nomenclature: chr1, chr2, chr3.... I've no idea why NCBI and others
>>> switched a couple years back.
>>>
>>> Yes, all splice junction header lines are stripped from the SAM header; they
>>> aren't needed after genomic coordinate conversion.
>>>
>>> -cheers, D
>>>
>>> From: Jon Manning <Jon...@ed...>
>>> Date: Thu, 12 Apr 2012 10:18:32 +0100
>>> To: <use...@li...>
>>> Subject: [Useq-users] Error with USeq SamTranscriptomeParser while
>>> processing Novoalign RNA-seq outputs
>>>
>>> Hello,
>>>
>>> I've been working through the Novoalign RNA-seq instructions
>>> <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>,
>>> and am stuck at the last stage, where reads are converted back to genomic
>>> coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
>>> to help.
>>>
>>> When it gets to the 'Adding SAM header, sorting, and writing bam output
>>> with Picard's SortSam...' stage I'm getting errors like:
>>>
>>> Exception in thread "main" net.sf.samtools.SAMFormatException: Error
>>> parsing text SAM file. RNAME
>>> 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770'
>>> not found in any SQ record; Line 27
>>> Line: EBRI093151:81:FC:1:1:3202:1108 133 Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770 375 0 * = 375 0 AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG B=#==A>ABCCBBAB############################################################### PG:Z:novoalign ZS:Z:QC
>>>
>>> I've checked, and these lines ARE present in the input SAM file (made by
>>> Novoalign), but not in the temporary SAM files I can see created by
>>> SamTranscriptomeParser, so I suspect they may be lost somehow.
>>>
>>> I'm not sure how to go about debugging this myself, so all pointers
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> Jon Manning
>>>
>>> The University of Edinburgh is a charitable body, registered in Scotland,
>>> with registration number SC005336.
>>>
>>> Useq-users mailing list
>>> Use...@li...
>>> https://lists.sourceforge.net/lists/listinfo/useq-users
>>
>> --
>> Dr Jonathan Manning
>> Bioinformatics Team
>> Centre for Cardiovascular Science
>> University of Edinburgh
>> Queens Medical Research Institute
>> 47 Little France Crescent
>> Edinburgh EH16 4TJ
>> United Kingdom
>> T: +44 131 242 6700
>> F: +44 131 242 6782
>> E: jma...@st...
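
The "RNAME ... not found in any SQ record" error discussed in this thread can be looked for before a SAM file is handed to SamTranscriptomeParser. What follows is a minimal Python sketch and not part of USeq, Picard or Novoalign; the function name `audit_sam` is hypothetical. It assumes, going only by the example junction David gives above (gene:chromosome:coordinates), that a well-formed splice-junction reference name contains exactly two colons. It reports reference names that alignments use but the @SQ header does not declare, and colon-containing names whose colon count differs from two.

```python
from collections import Counter


def audit_sam(lines):
    """Scan SAM text lines and return two Counters:
    missing -- RNAMEs used by alignments but absent from the @SQ header
    odd     -- colon-containing RNAMEs that do not have exactly two colons
               (assumed junction layout: gene:chromosome:coordinates)
    """
    declared = set()
    missing = Counter()
    odd = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: collect the SN: value of every @SQ record.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        declared.add(field[3:])
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a usable alignment line
        rname = cols[2]
        if rname == "*":
            continue  # unaligned read, no reference name
        if rname not in declared:
            missing[rname] += 1
        if ":" in rname and rname.count(":") != 2:
            odd[rname] += 1
    return missing, odd
```

To use it, open the aligner output (the path here is a placeholder) and pass the file handle: `with open("novoalign_output.sam") as fh: missing, odd = audit_sam(fh)`. A name such as `Rps3:ENSRNOT00000023935:1:...` would show up in `odd` because it has three colons. Whether such a name comes from a mis-joined gene name or from transcripts added to the index still has to be decided by hand, as the thread shows.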