From: David N. <dav...@gm...> - 2012-04-12 13:13:14
Yes, that's incorrect. Don't add the xxxTranscripts.fasta. All of the splice junctions are in the xxxSplices.fasta file. I'll cc Colin here to correct this in the Novocraft docs. See also http://useq.sourceforge.net/usageRNASeq.html

Not sure about the chr1 vs 1. Off the top of my head I don't think there should be a problem with USeq apps. But then again we haven't tested them. Most of the genome browsers will probably complain unless you register a synonyms table. Sounds like the Ensembl browser won't, though, so maybe it isn't an issue.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 14:04:45 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Hi David,

Thanks for the quick reply.

Following the Novoalign folks' instructions the transcripts were indeed added to the index. Excerpt from their docs:

novoindex Transcriptome.nix geneMaskedGenome.fasta refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta

Is that not the right thing to do? Should it just be the genome and the splices?

I'm working primarily with Ensembl data so I'd like to keep my chromosomes 'sans chr' - unless of course the USeq apps require it?

Thanks,

Jon

On 12/04/2012 12:45, David Nix wrote:
> Did you by chance add the transcripts to your genome index from the
> MakeTranscriptome App? These take the form of
> ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...
>
> That also could be the problem.
>
> -cheers, D
>
> Ahh, looks like you've joined your gene name using a : . Use an _ . The STP
> uses the : to split the splice junction chromosome name into its component
> parts. A good junction should look like
>
> ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513
>
> Rps3:ENSRNOT00000023935:1:156811472-156811541....
> should be
> Rps3_ENSRNOT00000023935:1:156811472-156811541......
>
> As such STP isn't able to recognize the alignment as needing conversion to
> genomic coordinates.
>
> Also, it would be a good idea to rename your chromosomes to the standard UCSC
> nomenclature: chr1, chr2, chr3.... I've no idea why NCBI and others switched
> a couple years back.
>
> Yes, all splice junction header lines are stripped from the SAM header; they
> aren't needed after genomic coordinate conversion.
>
> -cheers, D
>
> From: Jon Manning <Jon...@ed...>
> Date: Thu, 12 Apr 2012 10:18:32 +0100
> To: <use...@li...>
> Subject: [Useq-users] Error with USeq SamTranscriptomeParser while
> processing Novoalign RNA-seq outputs
>
> Hello,
>
> I've been working through the Novoalign RNA-seq instructions
> <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>,
> and am stuck at the last stage, where reads are converted back to genomic
> coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
> to help.
>
> When it gets to the 'Adding SAM header, sorting, and writing bam output with
> Picard's SortSam...' stage I'm getting errors like:
>
> Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing
> text SAM file.
> RNAME
> 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770'
> not found in any SQ record; Line 27
> Line: EBRI093151:81:FC:1:1:3202:1108 133
> Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770
> 375 0 * = 375 0
> AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG
> B=#==A>ABCCBBAB###############################################################
> PG:Z:novoalign ZS:Z:QC
>
> I've checked, and these lines ARE present in the input SAM file (made by
> Novoalign), but not in the temporary SAM files I can see created by
> SamTranscriptomeParser, so I suspect they may be lost somehow.
>
> I'm not sure how to go about debugging this myself, so all pointers
> appreciated.
>
> Thanks,
>
> Jon Manning
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
> _______________________________________________
> Useq-users mailing list
> Use...@li...
> https://lists.sourceforge.net/lists/listinfo/useq-users

--
Dr Jonathan Manning
Bioinformatics Team
Centre for Cardiovascular Science
University of Edinburgh
Queens Medical Research Institute
47 Little France Crescent
Edinburgh EH16 4TJ
United Kingdom
T: +44 131 242 6700
F: +44 131 242 6782
E: jma...@st...
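
For anyone hitting the same SAMFormatException, one rough way to check whether transcript-style entries ended up in the index is to count the colon-separated fields in each reference name of the SAM header. The sketch below is only an illustration and rests on assumptions drawn from the examples in this thread: that a usable junction name has three fields (name:chromosome:coordinates), that the problematic names have an extra leading field, and that coordinates are start-end blocks joined by underscores. The function names are hypothetical; none of this is part of USeq, Novoalign, or Picard.

```python
import re

# Assumption from the thread's examples: coordinates are "start-end" blocks
# joined by "_", e.g. 6691-6707_9356-9386.
_COORDS = re.compile(r"^\d+-\d+(_\d+-\d+)*$")


def classify_reference_name(name):
    """Classify a reference name by its colon-separated layout.

    Returns "genomic" for a plain chromosome name, "junction" for
    name:chromosome:coordinates, "extra-field" when more fields precede the
    coordinates, and "unknown" for anything else.
    """
    fields = name.split(":")
    if len(fields) == 1:
        return "genomic"
    if len(fields) >= 3 and _COORDS.match(fields[-1]):
        return "junction" if len(fields) == 3 else "extra-field"
    return "unknown"


def reference_names(sam_lines):
    """Yield the SN value of every @SQ line in an iterable of SAM text lines."""
    for line in sam_lines:
        if not line.startswith("@SQ"):
            continue
        for tag in line.rstrip("\n").split("\t")[1:]:
            if tag.startswith("SN:"):
                yield tag[3:]


def find_suspect_names(sam_lines):
    """Return reference names that carry more fields than a junction name."""
    return [
        name
        for name in reference_names(sam_lines)
        if classify_reference_name(name) == "extra-field"
    ]
```

Passing an open SAM file (or just its header lines) to find_suspect_names lists the names that would need attention, either by rebuilding the index from the genome and splices only, or by joining the gene name with an underscore as suggested earlier in the thread.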