Thread: [Transdecoder-users] ORFs without ATG as start

[Transdecoder-users] ORFs without ATG as start

From: Andre M. <And...@cr...> - 2014-05-14 18:57:04

Hi,

First of all thanks for providing a tool to predict ORFs.

I am surprised to see that so many of the ORFs predicted with
TransDecoder do not start with ATG (5prime_partial: 650 out of 1807).

To me this looks like a malfunction of the software.

Is there an easy way to trim the ORFs to the nearest start codon?

Thanks
André

-- 
André Minoche, Postdoc

Centre for Genomic Regulation (CRG)
Doctor Aiguader, 88, 4th floor
08003 Barcelona
        
http://seq.crg.es

Re: [Transdecoder-users] ORFs without ATG as start

From: Matthew M. <mat...@un...> - 2014-05-14 19:45:29

Andre,

Alternatively, this could be a metric of assembly quality (e.g.
http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00013/abstract)
which may be a function of the input data. Can you tell us a little bit
about the type and quantity of data used in this assembly?

Matt

 __________________________________
*Matthew MacManes*, Ph.D.
University of New Hampshire  I  Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH  03824
Phone: 603-862-4052  I  Twitter: @PeroMHC <https://twitter.com/PeroMHC>
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall

Re: [Transdecoder-users] ORFs without ATG as start

From: Andre M. <And...@cr...> - 2014-05-14 21:57:24

Dear Matt,

Thank you for your quick reply.

The quality of the reference sequence from which I derived the CDS is not an issue.

I have full-length transcript cDNA sequences that I aligned to a high-quality genome assembly. Using the alignment coordinates and the genome assembly as the sequence source, I derived high-quality mRNA sequences, which I used as input for TransDecoder.

Deep RNA-seq data confirm the full-length nature of my cDNA sequences. At the loci where my full-length transcripts align, I also have Augustus gene models supported by RNA-seq data. The ORFs of the predicted gene models generally correspond to the TransDecoder ORFs, except that in many cases the TransDecoder ORF starts upstream of the Augustus start codon and does not begin with an ATG (5prime_partial ORFs).

Best regards
André

Re: [Transdecoder-users] ORFs without ATG as start

From: Brian H. <bh...@br...> - 2014-05-14 22:05:13

Hi Andre,

Transdecoder doesn't have a start-codon finding function.  It finds the
longest ORFs, and in the case where there isn't an intervening in-frame
stop-codon between your optimal start codon and the 5' end of the sequence,
it's going to simply assume that the transcript is not full-length and give
you the full translation from the beginning of the transcript.
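[Editor's note] The rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in this message, not TransDecoder's actual code; the function name `open_to_five_prime_end` and the example sequences are made up for the purpose. Given a transcript and the position of the expected start codon, it walks upstream codon by codon and reports whether an in-frame stop codon separates that ATG from the 5' end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_to_five_prime_end(mrna: str, atg_pos: int) -> bool:
    """Return True if no in-frame stop codon lies between the 5' end
    of `mrna` and the start codon at 0-based position `atg_pos`.

    When this is True, a longest-ORF search has no reason to stop at
    the ATG and would extend the ORF to the start of the transcript,
    which is the situation reported as a 5' partial.
    """
    seq = mrna.upper()
    # Step upstream from the ATG one codon at a time, staying in frame.
    for pos in range(atg_pos - 3, -1, -3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return False
    return True


# In-frame TAA upstream of the ATG at position 8: the ORF is bounded.
print(open_to_five_prime_end("GCTAAGCCATGGCTTGA", 8))
# Same transcript with TAA changed to TCA: the frame stays open.
print(open_to_five_prime_end("GCTCAGCCATGGCTTGA", 8))
```

Running this check on the transcripts flagged as 5prime_partial, using the Augustus start position as `atg_pos`, would show how many of them simply lack an in-frame stop in the 5' UTR.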

Usually, there's an in-frame stop codon in the 5' UTR that prevents most
full-length transcripts from being reported as 5' partials.  I suppose, if
you have a GC-rich target (stop codons are AT-rich, so they turn up less
often by chance), this could definitely be more of an issue.
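[Editor's note] A back-of-the-envelope calculation shows why GC content matters here. The sketch below assumes a deliberately crude model that is not from the original message: bases are independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. Real UTRs are not random sequence, so the numbers are only indicative. The function names are hypothetical.

```python
def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA under an
    independent-base model with the given GC fraction (0..1)."""
    at = (1.0 - gc) / 2.0  # P(A) = P(T)
    g = gc / 2.0           # P(G) = P(C)
    # P(TAA) + P(TAG) + P(TGA)
    return at ** 3 + 2.0 * at * at * g


def prob_frame_stays_open(gc: float, n_codons: int) -> float:
    """Probability that n_codons consecutive random codons in one
    frame contain no stop codon at all."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons


for gc in (0.4, 0.5, 0.6, 0.7):
    print(gc, stop_codon_probability(gc), prob_frame_stays_open(gc, 50))
```

Under this model a 150-nt (50-codon) 5' UTR has roughly a 9% chance of containing no in-frame stop at 50% GC, and roughly 38% at 70% GC, which is consistent with 5' partials being more common in GC-rich targets.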

A high prevalence of 5' partials tends to be a general indicator of not
having full-length transcripts, resulting from the 3' coverage bias typical
of poly-A captured cDNA sequencing.  I'm sure there are occasions where a
genuine full-length transcript has an in-frame translatable 5' UTR, but in
my experience, these have generally been the rare exceptions.
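[Editor's note] For the original question of trimming a 5' partial ORF to the nearest start codon, a minimal Python sketch follows. It assumes the input is the ORF's nucleotide sequence already in frame (first base = first base of a codon), and it simply picks the first in-frame ATG, which is not guaranteed to be the biologically used start. The function name `trim_to_first_atg` is hypothetical and not part of TransDecoder.

```python
from typing import Optional


def trim_to_first_atg(cds: str) -> Optional[str]:
    """Trim an in-frame coding sequence to its first in-frame ATG.

    Returns the trimmed sequence (original letter case preserved),
    or None if the frame contains no ATG at all.
    """
    seq = cds.upper()
    # Only look at codon boundaries so the reading frame is preserved.
    for pos in range(0, len(seq) - 2, 3):
        if seq[pos:pos + 3] == "ATG":
            return cds[pos:]
    return None


# Codons: GCT CCA ATG GCT TGA -> trimmed to ATG GCT TGA
print(trim_to_first_atg("GCTCCAATGGCTTGA"))
# The ATG at position 1 is out of frame, so nothing is trimmed.
print(trim_to_first_atg("GATGCCTGA"))
```

If an independent gene model is available (such as the Augustus predictions mentioned earlier in the thread), comparing the trimmed start with the predicted start is a reasonable sanity check before accepting the shortened ORF.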

cheers,

~brian


-- 
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
