Dear Users - please help!!,
I have run CodonPhyml on my DNA/aligned orthologous clusters (RBBH output) and got the .stats and .tree file.
I am interested in the DN/DS ratio. However, I am having trouble interpreting exactly what is in the file.
I have:
.....................................................................
. Transition/transversion ratio: 2.05408
. Nonsynonmous/synonymous ratio:
Omega / Probability: o0=0.00010000 p0=0.25000000
Omega / Probability: o1=0.00014905 p1=0.25000000
Omega / Probability: o2=0.02008926 p2=0.25000000
Omega / Probability: o3=1.14867031 p3=0.25000000
......................................................................
When I search for the definition of "Transition/transversion ratio" on the internet, I don't get a clear answer. What exactly is this?
Then, the "Nonsynonmous/synonymous ratio":
why are there 4 values for this? Is one for the codon and the others for the three positions within the codon?
So what is the DN/DS value for this example?
Sorry if this question is stupid.
Thank you in advance,
Peter Thorpe
Hi Peter,
Thank you for using Codonphyml.
I see you are confused about some concepts. Can you tell me what you use Codonphyml for? That will help me understand the context of your questions.
Independently of that, I will write a very general and basic answer below, since I assume you are not familiar with some of the ideas behind Codonphyml.
Codonphyml is a tool that reconstructs phylogenetic trees from DNA/CODON/AMINO ACID sequences. Roughly speaking, it does so by calculating which ancestors, and which tree connecting them, make the given sequences most likely. For example, suppose you want to see the tree of mammals. You give Codonphyml the DNA sequence of each known mammal, and Codonphyml returns a tree that represents the evolutionary relationships between them, that is, which species share common ancestors.
To calculate the ancestral nodes of a tree, we start from the given DNA/CODON sequences, which are made up of the nucleotides C, T, A and G (the DNA symbols). A CODON is a group of 3 consecutive DNA symbols, and codons can be translated into AMINO ACIDS.
For example:
DOG       A C T T T C G A C
FOX       A C C G T C G A C
COW       A C G T A C G A C
Position  1 2 3 4 5 6 7 8 9
So what will the tree look like? In other words, which species share the most recent common ancestor, and which of the possible trees is the most likely?
To answer that, Codonphyml calculates the probability of the mutations needed to explain the letters observed at each position of the sequences. For example, at position 3, DOG has a T, FOX has a C and COW has a G.
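You can make "which letters differ at which positions" concrete with a small bash sketch. It uses bash because that is the language of the codonphyml loop later in this thread. The sketch only lists the alignment columns where the three toy sequences disagree. It does not compute any likelihood, which is what Codonphyml actually does, and the constant columns still take part in that calculation. The variable and function names are illustrative.

```shell
#!/usr/bin/env bash
# Toy alignment from the example above (spaces removed).
names=(DOG FOX COW)
seqs=(ACTTTCGAC ACCGTCGAC ACGTACGAC)

# Print each alignment column where the species do not all share one base.
variable_sites() {
  local pos i column first
  for ((pos = 0; pos < ${#seqs[0]}; pos++)); do
    column=""
    for i in "${!seqs[@]}"; do
      column+="${seqs[i]:pos:1}"
    done
    first=${column:0:1}
    # Deleting every copy of the first base leaves nothing if the column is constant.
    if [ -n "${column//"$first"/}" ]; then
      printf 'position %d:' "$((pos + 1))"
      for i in "${!names[@]}"; do
        printf ' %s=%s' "${names[i]}" "${column:i:1}"
      done
      printf '\n'
    fi
  done
}

variable_sites
# position 3: DOG=T FOX=C COW=G
# position 4: DOG=T FOX=G COW=T
# position 5: DOG=T FOX=T COW=A
```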
1- Transition/Transversion:
When an A mutates into a G or a G into an A, or a T mutates into a C or a C into a T, we call that a TRANSITION (purine to purine, or pyrimidine to pyrimidine).
If an A mutates into a T or a C, a T into an A or a G, a C into an A or a G, or a G into a C or a T, we call that a TRANSVERSION (purine to pyrimidine, or vice versa).
Thus, the transition/transversion ratio simply tells you whether TRANSITIONS are more likely than TRANSVERSIONS (ratio > 1) or the other way round (ratio < 1). Your value, 2.05408, is greater than 1, so transitions are favoured in your data.
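Below is a minimal bash sketch of the classification above, applied to the position-3 letters of the toy alignment. There is one caveat, as far as I understand the GY model (-m GY). The reported ratio (2.05408 here) is a rate ratio per possible change, often called kappa. It is therefore not the same as counting transitions and transversions in an alignment. Each base has one possible transition but two possible transversions, so with no preference at all you would still expect about twice as many transversions as transitions. The function names are illustrative.

```shell
#!/usr/bash/env bash
# Purines are A and G; pyrimidines are C and T.
base_class() {
  case "$1" in
    A|G) echo purine ;;
    C|T) echo pyrimidine ;;
    *) echo "not a DNA base: $1" >&2; return 1 ;;
  esac
}

# Transition:   purine <-> purine, or pyrimidine <-> pyrimidine.
# Transversion: purine <-> pyrimidine.
substitution_type() {
  local from=$1 to=$2 class_from class_to
  class_from=$(base_class "$from") || return 1
  class_to=$(base_class "$to") || return 1
  if [ "$from" = "$to" ]; then
    echo none
  elif [ "$class_from" = "$class_to" ]; then
    echo transition
  else
    echo transversion
  fi
}

# Position 3 of the toy alignment: DOG has T, FOX has C, COW has G.
echo "DOG T -> FOX C: $(substitution_type T C)"   # transition
echo "DOG T -> COW G: $(substitution_type T G)"   # transversion
echo "FOX C -> COW G: $(substitution_type C G)"   # transversion
```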
2- Nonsynonymous/synonymous
When we look at CODONS, we group the DNA symbols in threes. For example, the DOG sequence read as CODONS is:
DOG     ACT TTC GAC
Codon    1   2   3
Codons are translated into AMINO ACIDS, the building blocks of proteins. There are more CODONS than AMINO ACIDS. Counting the possibilities gives 64 codons (4 × 4 × 4), three of which are stop signals, while there are only 20 AMINO ACIDS.
Thus, different CODONS will be translated into the same AMINO ACID.
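You can check the arithmetic above in bash. Brace expansion builds all 4 × 4 × 4 codons. The standard genetic code, packed into one 64-letter string in the same T, C, A, G order, shows how many codons map to each amino acid. This sketch assumes the standard (universal) genetic code. The packed string is a common compact encoding, written out here by hand.

```shell
#!/usr/bin/env bash
# Every codon over the alphabet T, C, A, G: 4 x 4 x 4 = 64 combinations.
codons=( {T,C,A,G}{T,C,A,G}{T,C,A,G} )
echo "number of codons: ${#codons[@]}"

# Standard genetic code: one letter per codon, in the same T,C,A,G order as
# the brace expansion above ('*' marks the three stop codons).
code="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# How many codons encode each amino acid (one-letter code)?
fold -w1 <<< "$code" | sort | uniq -c | sort -rn

# Example: the four codons for threonine (one-letter code T),
# including DOG's first codon, ACT.
for i in "${!codons[@]}"; do
  if [ "${code:i:1}" = T ]; then
    printf '%s ' "${codons[i]}"
  fi
done
printf '\n'
```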
When a mutation changes a CODON without changing the AMINO ACID it encodes, it is a SYNONYMOUS mutation.
If the mutation changes the resulting AMINO ACID, it is NONSYNONYMOUS.
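So we have another ratio, dN/dS (reported as omega). It is the rate of NONSYNONYMOUS substitutions per nonsynonymous site divided by the rate of SYNONYMOUS substitutions per synonymous site. Roughly, omega < 1 points to purifying selection, omega ≈ 1 to neutral evolution and omega > 1 to positive selection.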
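Here is a minimal bash sketch of the synonymous/nonsynonymous distinction, applied to the DOG and FOX codons from the example. It assumes the standard genetic code and only compares the amino acids encoded by two codons. A real dN/dS estimate also has to count how many synonymous and nonsynonymous sites each codon has. Codonphyml estimates omega with a codon substitution model rather than by counting like this. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Standard genetic code, one letter per codon, with the codons ordered
# TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G at each position).
CODE="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

base_index() {
  case "$1" in
    T) echo 0 ;; C) echo 1 ;; A) echo 2 ;; G) echo 3 ;;
    *) echo "not a DNA base: $1" >&2; return 1 ;;
  esac
}

# Translate one codon into its one-letter amino acid ('*' = stop).
translate_codon() {
  local codon=$1 i j k
  if [ "${#codon}" -ne 3 ]; then
    echo "a codon needs exactly 3 bases: $codon" >&2
    return 1
  fi
  i=$(base_index "${codon:0:1}") || return 1
  j=$(base_index "${codon:1:1}") || return 1
  k=$(base_index "${codon:2:1}") || return 1
  echo "${CODE:$((16 * i + 4 * j + k)):1}"
}

# A change between two codons is synonymous if both encode the same amino acid.
mutation_type() {
  local aa1 aa2
  aa1=$(translate_codon "$1") || return 1
  aa2=$(translate_codon "$2") || return 1
  if [ "$aa1" = "$aa2" ]; then echo synonymous; else echo nonsynonymous; fi
}

# DOG vs FOX, codon by codon (toy example from this thread).
dog=(ACT TTC GAC)
fox=(ACC GTC GAC)
for n in 0 1 2; do
  if [ "${dog[n]}" != "${fox[n]}" ]; then
    echo "codon $((n + 1)): ${dog[n]} -> ${fox[n]} is $(mutation_type "${dog[n]}" "${fox[n]}")"
  fi
done
# codon 1: ACT -> ACC is synonymous
# codon 2: TTC -> GTC is nonsynonymous
```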
In your example you get 4 ratios, not one per codon position. This is because you are using a probabilistic model which assumes that different sites along the sequence can evolve with different NONSYNONYMOUS/SYNONYMOUS ratios.
Please let me know if you have any further questions.
Best
Marcelo
Last edit: Marcelo S Zanetti 2015-01-28
Hi Marcelo,
Thank you very much for taking the time to discuss this with me!!!! Much appreciated.
I am interested in pathogen-secreted proteins, which are "put/secreted" into the plant in order to gain a parasitic advantage. We assume these are under selection pressure, as they may be recognised by the plant defence systems, leading to a defence response. In evolutionary terms, the version of the pathogen gene that is recognised by the plant is "punished".
Therefore, if the above statement is true, the candidate genes I have identified may have a DN/DS ratio greater than 1.
I identified orthologues of my genes of interest by reciprocal best BLAST hit analyses of 6 species, aligned the protein sequences and back-translated the DNA sequences onto these alignments. Then I ran CodonPhyml on the aligned DNA sequences of the homologues to determine whether these sequences are under selection pressure. (I have around 300 alignment files for different candidates/clusters.)
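My command was:
# Run CodonPhyml on every PHYLIP alignment in the current directory.
for f in *.phy
do
    [ -e "${f}" ] || continue   # skip the literal pattern if no .phy files exist
    echo "Running codonphyml on ${f}"
    # Quoting "${f}" (instead of building a string for eval) keeps file names with spaces intact.
    codonphyml -i "${f}" -m GY --fmodel F3X4 -t e -f empirical -w g -a e
done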
Is there a better-suited probabilistic model than the one above that I can use for my problem, to get a single DN/DS value?
Also, is CodonPhyml able to report which codons it thinks are under selection pressure, if any?
I hope this makes sense,
Thanks,
Peter Thorpe
Hi Peter, SourceForge was offline, so I could not reply sooner.
You are using the gamma model for omega (-w g). This means that, by default, you get 4 different values of omega. Each omega value is attributed to a different group of codon sites, but currently Codonphyml does not report which codon sites have a given omega value. Some of our colleagues have been working on this feature, and I can implement it as well when I have some spare time.
So if one of the omega values is significantly greater than 1, you know that some sites are under selective pressure. At the moment we cannot tell you which ones, because that still needs to be implemented.
o0, o1, o2 and o3 are the different omega values I told you about. You can ask for more categories if you need them, but the code then runs slower. p0...p3 are the proportions of all codon sites that have the respective omega value. With the gamma model these proportions are all the same, because the gamma distribution is split into categories of equal probability. You could get more detail by using the discrete model; I think it is the -w d option.
For example, your result:
Omega / Probability: o0=0.00010000 p0=0.25000000
Omega / Probability: o1=0.00014905 p1=0.25000000
Omega / Probability: o2=0.02008926 p2=0.25000000
Omega / Probability: o3=1.14867031 p3=0.25000000
Your results show that some sites (the o3 category) are likely to be under selective pressure. But is 1.1486 a significant value?
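If you want a single DN/DS value from a gamma run, one simple summary is the mean omega weighted by the category proportions: 0.25 × (0.0001 + 0.00014905 + 0.02008926 + 1.14867031) ≈ 0.292. The awk-based bash sketch below computes that for one or more stats files, which could be handy for the ~300 alignments. It also reports the proportion of sites in categories with omega > 1. It assumes the omega lines look like the ones in this thread, and the file names are hypothetical. Keep in mind that the mean (about 0.29 here) can hide a small class of sites with omega > 1. A category above 1 is also not, by itself, a significance test. As far as I know, that usually requires comparing the likelihood of a model that allows omega > 1 with one that does not.

```shell
#!/usr/bin/env bash
# Summarise the "Omega / Probability" lines of one or more CodonPhyml stats
# files: one tab-separated row per file with the number of omega categories,
# the weighted mean omega (sum of p_i * o_i) and the proportion of codon
# sites that fall in categories with omega > 1.
summarise_omega() {
  awk '
    BEGIN { print "file\tcategories\tmean_omega\tprop_omega_gt_1" }
    function report() {
      if (n > 0) printf "%s\t%d\t%.4f\t%.2f\n", file, n, mean, above
    }
    # Start of a new input file: print the previous file, reset the sums.
    FNR == 1 { report(); file = FILENAME; n = 0; mean = 0; above = 0 }
    /Omega \/ Probability:/ {
      o = ""; p = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^o[0-9]+=/) { o = $i; sub(/^o[0-9]+=/, "", o) }
        if ($i ~ /^p[0-9]+=/) { p = $i; sub(/^p[0-9]+=/, "", p) }
      }
      if (o == "" || p == "") next
      n++
      mean += o * p
      if (o + 0 > 1) above += p
    }
    END { report() }
  ' "$@"
}

# Usage (the file pattern is hypothetical):
#   summarise_omega *_stats.txt > omega_summary.tsv
```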
Let me know if you have any further problems. I will try to make Codonphyml show the codon sites as soon as possible.
Best
Last edit: Marcelo S Zanetti 2015-02-11
Cool, thank you for the explanation. Sounds very interesting. I will take a look at these issues and will write back to you asap.
Best
Thank you for the explanations above. Can I ask you about the 4 DN/DS values again?
. Nonsynonmous/synonymous ratio:
Omega / Probability: o0=0.00010000 p0=0.25000000
Omega / Probability: o1=0.00014905 p1=0.25000000
Omega / Probability: o2=0.02008926 p2=0.25000000
Omega / Probability: o3=1.14867031 p3=0.25000000
You said above "... In your example, you have 4 ratios because you use a probabilistic model that tries to consider that, along the DNA sequence, different places will behave in different ways with respect to the NONSYNONYMOUS/SYNONYMOUS ratio...."
How would you interpret the values above, given that they are so far apart from each other?
What does o0 represent?
What does o1 represent?
What does o2 represent?
What does o3 represent?
What does o4 represent?
Thanks,
Pete
Last edit: Peter Thorpe 2015-01-28
Hi Peter,
I am sorry, I had too many things to do and ended up forgetting your question. I will answer it tonight.
Best
Sorry for the late reply; I have written my answer above.
Hi Peter, are you satisfied with my answer? Please let me know. Thx.
Hi Marcelo,
Sorry, your reply got nested in the conversation above (I didn't see it). I have read it now. Yes, this makes sense; it is what I thought the result would mean.
I would be very keen for CodonPhyml to return the sites under selection (as you mentioned, this would be a great addition to the package). I think that would be amazing!! I know codeml does this.
I have some good results ;)
Thank you very much for your time Marcelo - you have helped a lot.
Cheers,
Peter Thorpe
Hi Peter, cool.
I deleted some messages so it reads better now.
While this is not ready in Codonphyml, you can still use Codonphyml to generate the tree. Then run codeml with that tree and your MSA as input, using a similar model, to get the codon sites under selective pressure. So you don't need to wait.
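Here is a bash sketch of that suggestion. It assumes PAML's codeml is installed, and that codeml can read your .phy alignments and the Codonphyml tree file as they are. The file names below are hypothetical. The control file fits the site models M7 and M8. As far as I know, codeml then lists candidate sites under positive selection in the output file, in the Bayes Empirical Bayes section for M8. You can also compare twice the log-likelihood difference between M7 and M8 with a chi-square distribution with 2 degrees of freedom, to test whether the omega > 1 class is needed.

```shell
#!/usr/bin/env bash
# Write a minimal codeml control file that fits site models M7 and M8
# on a fixed tree (for example the tree estimated by Codonphyml).
write_codeml_ctl() {
  if [ "$#" -ne 4 ]; then
    echo "usage: write_codeml_ctl ALIGNMENT TREE OUTFILE CTLFILE" >&2
    return 1
  fi
  local seqfile=$1 treefile=$2 outfile=$3 ctl=$4
  cat > "$ctl" <<EOF
      seqfile = $seqfile
     treefile = $treefile
      outfile = $outfile
        noisy = 0
      verbose = 0
      runmode = 0   * 0: use the user tree given in treefile
      seqtype = 1   * 1: codon sequences
    CodonFreq = 2   * 2: F3X4, as with --fmodel F3X4 in Codonphyml
        model = 0   * one omega ratio for all branches
      NSsites = 7 8 * M7 (beta) and M8 (beta & omega)
        icode = 0   * 0: universal genetic code
    fix_kappa = 0
        kappa = 2
    fix_omega = 0
        omega = 1
    cleandata = 0
EOF
}

# Usage (hypothetical file names):
#   write_codeml_ctl cluster001.phy cluster001_tree.txt cluster001_mlc.txt cluster001.ctl
#   codeml cluster001.ctl
```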
Best
Great, thanks! I will use the trees generated.
Thanks,
Pete
Great!