From: David N. <dav...@gm...> - 2012-04-12 14:16:57
|
Yes, if you save it as SAM, it bypasses Picard's SortSam and just writes
out the alignments. -cheers, D
From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 15:15:10 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while
processing Novoalign RNA-seq outputs
Thanks for the pointers. Don't worry, I will be re-running the alignment.
However, specifying '-s output.sam' did at least make things run without
error. Zayed indicated that the BAM conversion was the problem, due to the
'absence of a valid sequence dictionary'.
But things are much clearer now than they were this morning- thank you.
Jon
On 12/04/2012 14:49, David Nix wrote:
>
> Hmm. That error you are seeing is from Picard; STP calls SortSam internally.
> It looks like it is trying to write a value that is too big for an unsigned
> short (ushort), possibly due to the huge chromosome name? Or too many
> chromosome names, since these have not been converted to genomic space.
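For reference, "ushort" is an unsigned 16-bit integer, so the largest value Picard's `BinaryCodec.writeUShort` can accept is 2^16 - 1 = 65535, and the 70699 in the stack trace is out of range. A minimal sketch of that range check using Python's standard `struct` module; the helper name `fits_ushort` is made up for illustration, and which BAM record field actually overflowed is not pinned down in this thread, so this only shows the limit being hit, not Picard's code.

```python
import struct

# Largest value an unsigned 16-bit field can hold: 2**16 - 1.
USHORT_MAX = 2**16 - 1

def fits_ushort(value: int) -> bool:
    """Hypothetical helper: True if value packs as a little-endian unsigned short."""
    try:
        struct.pack("<H", value)  # "<H" = little-endian unsigned 16-bit, the BAM byte order
        return True
    except struct.error:
        return False

print(USHORT_MAX)          # 65535
print(fits_ushort(65535))  # True
print(fits_ushort(70699))  # False: the value reported in the stack trace
```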
>
> Use of the -u option won't change much of anything except redirect the failed
> alignments to a file.
>
> The big problem is that you're going to have transcript alignments
> intermingled with your genomic alignments, and you won't be able to map the
> former to the latter.
>
> I don't think you can use your partially converted SAM file. You'll need to
> rebuild the novoindex and realign.
>
> -cheers, D
>
> From: Jon Manning <Jon...@ed...>
> Date: Thu, 12 Apr 2012 14:43:11 +0100
> To: David Nix <dav...@gm...>
> Cc: <use...@li...>
> Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while
> processing Novoalign RNA-seq outputs
>
> Okay, that's good to know- thanks.
>
> In the meantime I tried a fix suggested by Zayed at Novocraft, namely to not
> use '-u' and thereby to exclude unmapped reads. Both this and using USeq 8.2.2
> (I was on 8.2.1) changed the error to:
>
> Exception in thread "main" java.lang.IllegalArgumentException: Value (70699) to large to be written as ushort.
>     at net.sf.samtools.util.BinaryCodec.writeUShort(BinaryCodec.java:324)
>     at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:114)
>     at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
>     at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
>     at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
>     at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:157)
>     at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
>     at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
>     at edu.utah.seq.data.sam.PicardSortSam.<init>(PicardSortSam.java:81)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.addHeaderAndSort(SamTranscriptomeParser.java:482)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.doWork(SamTranscriptomeParser.java:101)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.<init>(SamTranscriptomeParser.java:55)
>     at edu.utah.seq.parsers.SamTranscriptomeParser.main(SamTranscriptomeParser.java:495)
>
> I realise I'm working with a bad SAM file from your point of view, but do you
> think this error is part of the same thing, or something new?
>
> Jon
>
>
> On 12/04/2012 14:12, David Nix wrote:
>>
>> Yes, that's incorrect. Don't add the xxxTranscripts.fasta. All of the splice
>> junctions are in the xxxSplices.fasta file. I'll cc Colin here to correct
>> this in the Novocraft docs. See also
>> http://useq.sourceforge.net/usageRNASeq.html
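Following that advice, the index is built from the gene-masked genome and the splices FASTA only. A small Python sketch that assembles the corrected command line, reusing the file names from the Novocraft docs excerpt in Jon's message; `novoindex_argv` is a hypothetical helper, the suffix filter assumes the transcript FASTA's name ends in "Transcripts.fasta", and the script only prints the command rather than running novoindex.

```python
import shlex
from typing import List

def novoindex_argv(index_name: str, fasta_files: List[str]) -> List[str]:
    """Hypothetical helper: build a novoindex command line from the genome and
    splice-junction FASTAs, dropping any *Transcripts.fasta file."""
    inputs = [f for f in fasta_files if not f.endswith("Transcripts.fasta")]
    return ["novoindex", index_name] + inputs

argv = novoindex_argv(
    "Transcriptome.nix",
    [
        "geneMaskedGenome.fasta",
        "refFlatRad45Num60kMin10Splices.fasta",
        "refFlatRad45Num60kMin10Transcripts.fasta",
    ],
)
# shlex.join (Python 3.8+) quotes arguments only when needed; these names need none.
print(shlex.join(argv))
# novoindex Transcriptome.nix geneMaskedGenome.fasta refFlatRad45Num60kMin10Splices.fasta
```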
>>
>> Not sure about chr1 vs 1. Off the top of my head, I don't think there
>> should be a problem with the USeq apps, but then again we haven't tested them.
>> Most of the genome browsers will probably complain unless you register a
>> synonyms table. It sounds like the Ensembl browser won't, though, so maybe it
>> isn't an issue.
>>
>> -cheers, D
>>
>> From: Jon Manning <Jon...@ed...>
>> Date: Thu, 12 Apr 2012 14:04:45 +0100
>> To: David Nix <dav...@gm...>
>> Cc: <use...@li...>
>> Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while
>> processing Novoalign RNA-seq outputs
>>
>> Hi David,
>>
>> Thanks for the quick reply. Following the Novoalign folks' instructions, the
>> transcripts were indeed added to the index. Excerpt from their docs:
>>
>> novoindex Transcriptome.nix geneMaskedGenome.fasta refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta
>>
>> Is that not the right thing to do? Should it just be the genome and the
>> splices?
>>
>> I'm working primarily with Ensembl data so I'd like to keep my chromosomes
>> 'sans chr' - unless of course the USeq apps require it?
>>
>> Thanks,
>>
>> Jon
>>
>> On 12/04/2012 12:45, David Nix wrote:
>>>
>>> Did you by chance add the transcripts from the MakeTranscriptome app to your
>>> genome index? These take the form of
>>> ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...
>>>
>>> That also could be the problem. -cheers, D
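One quick way to check whether names like these made it into the index is to scan the SAM header's @SQ lines and flag reference names that fit neither the plain chromosome pattern nor the two-colon splice-junction pattern. A minimal Python sketch; the colon-count rule is an assumption inferred from the example names in this thread, not USeq's own logic, the function names are hypothetical, and the header lines are toy values (names shortened from this thread's examples, lengths made up).

```python
from typing import Iterable, Iterator

def classify_reference(name: str) -> str:
    """Classify a reference name by its ':' count, following the patterns seen
    in this thread (an assumption, not USeq's own rule)."""
    colons = name.count(":")
    if colons == 0:
        return "genomic"          # e.g. "1" or "chr20"
    if colons == 2:
        return "splice junction"  # label:chrom:blocks
    return "suspect"              # e.g. gene:transcript:chrom:blocks

def suspect_sq_names(header_lines: Iterable[str]) -> Iterator[str]:
    """Yield SN: names from @SQ header lines that fit neither pattern."""
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:") and classify_reference(field[3:]) == "suspect":
                yield field[3:]

# Toy header: names shortened from this thread's examples, lengths made up.
header = [
    "@HD\tVN:1.0",
    "@SQ\tSN:1\tLN:1000",
    "@SQ\tSN:ENSDARG00000087418:chr20:6691-6707_9356-9386\tLN:50",
    "@SQ\tSN:Rps3:ENSRNOT00000023935:1:156811472-156811541\tLN:70",
]
print(list(suspect_sq_names(header)))
# ['Rps3:ENSRNOT00000023935:1:156811472-156811541']
```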
>>>
>>> Ahh, it looks like you've joined your gene name using a ':'. Use an '_'
>>> instead. STP uses the ':' to split the splice-junction chromosome name into
>>> its component parts. A good junction should look like
>>>
>>> ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513
>>>
>>> Rps3:ENSRNOT00000023935:1:156811472-156811541.... should be
>>> Rps3_ENSRNOT00000023935:1:156811472-156811541......
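To make the naming rule concrete: if STP splits on ':' and expects three fields (label, chromosome, exon blocks), the Rps3 name from the error yields four. A minimal Python sketch under that assumption; `parse_junction` and `fix_junction_name` are hypothetical helpers, not USeq code. Per the advice elsewhere in this thread, any renaming would have to be done in the splice FASTA before rebuilding the novoindex and realigning, not in an existing SAM file.

```python
from typing import List, Tuple

def parse_junction(name: str) -> Tuple[str, str, List[Tuple[int, int]]]:
    """Split 'label:chrom:start-end_start-end_...' into label, chromosome and
    exon blocks. Assumes exactly two ':' separators (three fields)."""
    parts = name.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 ':'-separated fields, got {len(parts)}")
    label, chrom, blocks = parts
    spans = []
    for block in blocks.split("_"):
        start, end = block.split("-")
        spans.append((int(start), int(end)))
    return label, chrom, spans

def fix_junction_name(name: str) -> str:
    """Join any extra leading fields with '_' so only the two ':' separators
    before the chromosome and the exon blocks remain."""
    parts = name.split(":")
    if len(parts) <= 3:
        return name
    return ":".join(["_".join(parts[:-2]), parts[-2], parts[-1]])

# Full RNAME from the SAMFormatException in this thread.
bad = ("Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_"
       "156812500-156812688_156814773-156814868_156815362-156815456_"
       "156815668-156815799_156816728-156816770")

fixed = fix_junction_name(bad)
label, chrom, spans = parse_junction(fixed)
print(label)       # Rps3_ENSRNOT00000023935
print(chrom)       # 1
print(len(spans))  # 7
```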
>>>
>>> As a result, STP isn't able to recognize the alignment as needing conversion
>>> to genomic coordinates.
>>>
>>> Also, it would be a good idea to rename your chromosomes to the standard
>>> UCSC nomenclature: chr1, chr2, chr3, etc. I've no idea why NCBI and others
>>> switched a couple of years back.
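If you do rename, here is a minimal Python sketch of the name mapping. The helper name and the optional `synonyms` table are illustrative, not part of USeq. Plain 'chr' prefixing is not a complete Ensembl-to-UCSC mapping: the mitochondrial chromosome is 'MT' in Ensembl but 'chrM' at UCSC, and unplaced contigs are named differently too, so those need explicit synonyms. The mapping would also have to be applied consistently, for example to the FASTA headers before building the novoindex, so that @SQ names and alignment RNAMEs stay in agreement.

```python
from typing import Dict, Optional

def to_ucsc_name(name: str, synonyms: Optional[Dict[str, str]] = None) -> str:
    """Map an Ensembl-style chromosome name to a UCSC-style one: use the explicit
    synonyms table if it has an entry, otherwise add a 'chr' prefix if missing."""
    if synonyms and name in synonyms:
        return synonyms[name]
    return name if name.startswith("chr") else "chr" + name

print(to_ucsc_name("1"))                   # chr1
print(to_ucsc_name("chr20"))               # chr20
print(to_ucsc_name("MT", {"MT": "chrM"}))  # chrM
```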
>>>
>>> Yes, all splice junction header lines are stripped from the SAM header; they
>>> aren't needed after genomic coordinate conversion.
>>>
>>> -cheers, D
>>>
>>> From: Jon Manning <Jon...@ed...>
>>> Date: Thu, 12 Apr 2012 10:18:32 +0100
>>> To: <use...@li...>
>>> Subject: [Useq-users] Error with USeq SamTranscriptomeParser while
>>> processing Novoalign RNA-seq outputs
>>>
>>> Hello,
>>>
>>> I've been working through the Novoalign RNA-seq instructions
>>> <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>, and am
>>> stuck at the last stage, where reads are converted back to genomic
>>> coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
>>> to help.
>>>
>>> When it gets to the 'Adding SAM header, sorting, and writing bam output
>>> with Picard's SortSam...' stage I'm getting errors like:
>>>
>>> Exception in thread "main" net.sf.samtools.SAMFormatException: Error
>>> parsing text SAM file. RNAME
>>> 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770'
>>> not found in any SQ record; Line 27
>>> Line: EBRI093151:81:FC:1:1:3202:1108 133
>>> Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770
>>> 375 0 * = 375 0
>>> AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG
>>> B=#==A>ABCCBBAB###############################################################
>>> PG:Z:novoalign ZS:Z:QC
>>>
>>> I've checked, and these lines ARE present in the input SAM file (made by
>>> Novoalign), but not in the temporary SAM files that SamTranscriptomeParser
>>> creates, so I suspect they may be lost somehow.
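A quick way to confirm which reference names are affected is to compare every alignment's RNAME against the @SQ names in the same SAM file, which is the kind of consistency check the Picard exception reports. A minimal Python sketch; the function name and the toy records are made up (read2 mimics the failing line with a shortened name), and it reads plain-text SAM only, not BAM.

```python
from typing import Iterable, Iterator, Set, Tuple

def missing_rnames(sam_lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, rname) for alignment lines whose RNAME (column 3)
    has no matching @SQ SN: entry. Assumes a text SAM file whose header
    precedes the alignments, as the SAM format requires."""
    known: Set[str] = set()
    for lineno, line in enumerate(sam_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                known.update(f[3:] for f in fields[1:] if f.startswith("SN:"))
            continue
        if len(fields) >= 3 and fields[2] != "*" and fields[2] not in known:
            yield lineno, fields[2]

# Toy SAM records (made up); read2 mimics the failing line from the error.
toy_sam = [
    "@HD\tVN:1.0",
    "@SQ\tSN:1\tLN:1000",
    "read1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t133\tRps3:ENSRNOT00000023935:1:156811472-156811541\t375\t0\t*\t=\t375\t0\tACGT\tIIII",
]
print(list(missing_rnames(toy_sam)))
# [(4, 'Rps3:ENSRNOT00000023935:1:156811472-156811541')]
```

On a real file, passing an open file handle (`with open(path) as fh: ...`) works the same way; line numbers count from the first line of the file, header included.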
>>>
>>> I'm not sure how to go about debugging this myself, so all pointers
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> Jon Manning
>>>
>>> The University of Edinburgh is a charitable body, registered in Scotland,
>>> with registration number SC005336.
>>
>>
>> --
>> Dr Jonathan Manning
>> Bioinformatics Team
>> Centre for Cardiovascular Science
>> University of Edinburgh
>> Queens Medical Research Institute
>> 47 Little France Crescent
>> Edinburgh EH16 4TJ
>> United Kingdom
>> T: +44 131 242 6700
>> F: +44 131 242 6782
>> E: jma...@st...
>>
>>