From: David N. <dav...@gm...> - 2012-04-12 13:49:52
|
Hmm. The error you are seeing comes from Picard; STP calls SortSam
internally. It looks like it is trying to write a value that is too big for
an unsigned short (70699 is above the 65535 maximum), possibly because of the
huge chromosome names? Or because there are too many chromosome names, since
these have not been converted to genomic space.
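
For reference, an unsigned short holds 0 to 65535, so the 70699 in the trace cannot be encoded. A minimal Python sketch of that range check follows. It uses the standard `struct` module as a stand-in for Picard's BinaryCodec, and the helper names are made up for illustration. Which field of the BAM record actually overflowed is not established in this thread.

```python
import struct

USHORT_MAX = 0xFFFF  # 65535, the largest value an unsigned 16-bit field can hold


def fits_ushort(value: int) -> bool:
    """Return True if value can be stored in an unsigned 16-bit field."""
    return 0 <= value <= USHORT_MAX


def pack_ushort(value: int) -> bytes:
    """Pack value as a little-endian unsigned short.

    struct.pack raises struct.error when the value is out of range,
    which is analogous to (not the same code as) the Picard exception.
    """
    return struct.pack("<H", value)


if __name__ == "__main__":
    for v in (65535, 70699):
        print(v, "fits" if fits_ushort(v) else "does not fit")
```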
Using the -u option won't change much, except that the failed alignments are
redirected to a file.
The big problem is you're going to have transcript alignments intermingled
with your genomic alignments and won't be able to map the former to the
latter.
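
One way to see how many alignments are affected is to tally the reference names in the SAM file. The sketch below is an illustration only: it assumes that splice junction and transcript references can be recognised by a ':' in the RNAME column (as in the names quoted further down this thread), and `tally_rnames` is a hypothetical helper, not part of USeq.

```python
from collections import Counter
from typing import Iterable


def tally_rnames(sam_lines: Iterable[str]) -> Counter:
    """Count SAM records by the kind of reference name in column 3 (RNAME)."""
    counts: Counter = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a well-formed alignment line
        rname = fields[2]
        if rname == "*":
            counts["no_reference"] += 1
        elif ":" in rname:
            # Assumption: only junction/transcript names contain ':'.
            counts["junction_or_transcript"] += 1
        else:
            counts["genomic"] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for kind, n in tally_rnames(handle).items():
            print(kind, n)
```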
I don't think you can use your partially converted SAM file. You will need to
rebuild the novoindex and realign.
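
Before rebuilding the index, the splice junction FASTA headers need names that STP can split, i.e. name:chromosome:coordinates as described further down this thread. A minimal sketch of that renaming follows, assuming every header ends in `:chromosome:coordinates` and that everything before it is the gene/transcript label. The function names are hypothetical and this is not how MakeTranscriptome itself builds names.

```python
from typing import Iterable, Iterator


def fix_junction_name(name: str) -> str:
    """Join extra ':'-separated label fields with '_' so exactly three fields remain."""
    parts = name.split(":")
    if len(parts) <= 3:
        return name  # already label:chromosome:coordinates (or not a junction name)
    label = "_".join(parts[:-2])
    return ":".join([label, parts[-2], parts[-1]])


def fix_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with header names rewritten; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # Keep any free-text description that follows the first space.
            name, sep, rest = header.partition(" ")
            yield ">" + fix_junction_name(name) + sep + rest + "\n"
        else:
            yield line


if __name__ == "__main__":
    import sys

    # Usage: python fix_headers.py < in.fasta > out.fasta
    sys.stdout.writelines(fix_fasta_headers(sys.stdin))
```

With the example from this thread, `fix_junction_name("Rps3:ENSRNOT00000023935:1:156811472-156811541")` gives `Rps3_ENSRNOT00000023935:1:156811472-156811541`.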
-cheers, D
From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 14:43:11 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs
Okay, that's good to know, thanks.
In the meantime I tried a fix suggested by Zayed at Novocraft, namely to
not use '-u' and thereby to exclude unmapped reads. Both this and using USeq
8.2.2 (I was on 8.2.1) changed the error to:
Exception in thread "main" java.lang.IllegalArgumentException: Value (70699) to large to be written as ushort.
    at net.sf.samtools.util.BinaryCodec.writeUShort(BinaryCodec.java:324)
    at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:114)
    at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
    at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
    at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:157)
    at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at edu.utah.seq.data.sam.PicardSortSam.<init>(PicardSortSam.java:81)
    at edu.utah.seq.parsers.SamTranscriptomeParser.addHeaderAndSort(SamTranscriptomeParser.java:482)
    at edu.utah.seq.parsers.SamTranscriptomeParser.doWork(SamTranscriptomeParser.java:101)
    at edu.utah.seq.parsers.SamTranscriptomeParser.<init>(SamTranscriptomeParser.java:55)
    at edu.utah.seq.parsers.SamTranscriptomeParser.main(SamTranscriptomeParser.java:495)
I realise I'm working with a bad SAM file from your point of view, but do
you think this error is part of the same thing, or something new?
Jon
On 12/04/2012 14:12, David Nix wrote:
>
> Yes that's incorrect. Don't add the xxxTranscripts.fasta. All of the splice
> junctions are in the xxxSplices.fasta file. I'll cc Colin here to correct
> this in the Novocraft docs. See also
> http://useq.sourceforge.net/usageRNASeq.html
>
> Not sure about the chr1 vs 1. Off the top of my head I don't think there
> should be a problem with USeq apps. But then again we haven't tested them.
> Most of the genome browsers will probably complain unless you register a
> synonyms table. Sounds like the Ensembl browser won't though, so maybe it
> isn't an issue.
>
> -cheers, D
>
> From: Jon Manning <Jon...@ed...>
> Date: Thu, 12 Apr 2012 14:04:45 +0100
> To: David Nix <dav...@gm...>
> Cc: <use...@li...>
> Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs
>
> Hi David,
>
> Thanks for the quick reply. Following the Novoalign folks' instructions the
> transcripts were indeed added to the index. Excerpt from their docs:
>
>     novoindex Transcriptome.nix geneMaskedGenome.fasta \
>         refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta
>
> Is that not the right thing to do? Should it just be the genome and the
> splices?
>
> I'm working primarily with Ensembl data so I'd like to keep my chromosomes
> 'sans chr' - unless of course the USeq apps require it?
>
> Thanks,
>
> Jon
>
>
>
> On 12/04/2012 12:45, David Nix wrote:
>>
>> Did you by chance add the transcripts to your genome index from the
>> MakeTranscriptome App? These take the form of
>> ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...
>>
>> That also could be the problem. -cheers, D
>>
>> Ahh, looks like you've joined your gene name and transcript ID with a ':'.
>> Use an '_' instead. The STP uses the ':' to split the splice junction
>> chromosome name into its component parts. A good junction should look like
>>
>> ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513
>>
>> Rps3:ENSRNOT00000023935:1:156811472-156811541.... should be
>> Rps3_ENSRNOT00000023935:1:156811472-156811541......
>>
>> As such STP isn't able to recognize the alignment as needing conversion to
>> genomic coordinates.
>>
>> Also, it would be a good idea to rename your chromosomes to the standard UCSC
>> nomenclature: chr1, chr2, chr3.... I've no idea why NCBI and others switched
>> a couple of years back.
>>
>> Yes, all splice junction header lines are stripped from the SAM header; they
>> aren't needed after genomic coordinate conversion.
>>
>> -cheers, D
>>
>> From: Jon Manning <Jon...@ed...>
>> Date: Thu, 12 Apr 2012 10:18:32 +0100
>> To: <use...@li...>
>> Subject: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs
>>
>> Hello,
>>
>> I've been working through the Novoalign RNA-seq instructions
>> <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>,
>> and am stuck at the last stage, where reads are converted back to genomic
>> coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
>> to help.
>>
>> When it gets to the 'Adding SAM header, sorting, and writing bam output with
>> Picard's SortSam...' stage I'm getting errors like:
>>
>> Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RNAME 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770' not found in any SQ record; Line 27
>> Line: EBRI093151:81:FC:1:1:3202:1108 133 Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770 375 0 * = 375 0 AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG B=#==A>ABCCBBAB############################################################### PG:Z:novoalign ZS:Z:QC
>>
>> I've checked, and these lines ARE present in the input SAM file (made by
>> Novoalign), but not in the temporary SAM files I can see created by
>> SamTranscriptomeParser, so I suspect they may be lost somehow.
>>
>> I'm not sure how to go about debugging this myself, so all pointers
>> appreciated.
>>
>> Thanks,
>>
>> Jon Manning
>>
>>
>>
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336.
>
>
> --
> Dr Jonathan Manning
> Bioinformatics Team
> Centre for Cardiovascular Science
> University of Edinburgh
> Queens Medical Research Institute
> 47 Little France Crescent
> Edinburgh EH16 4TJ
> United Kingdom
> T: +44 131 242 6700
> F: +44 131 242 6782
> E: jma...@st...
>
>