Ziheng Yang - 2014-05-21

Below is a question from Carlos about the gamma priors for tau and theta in bpp, as well as my reply. The question may be of general interest, so with Carlos's permission, I am copying the question and answer below.

(A) Question

Dear Ziheng Yang,

I'm interested in applying BPP analysis to delimit species in a group of grasshoppers.

I'm having some issues and hope you can help me, as my colleagues and I haven't been able to reach a consensus. I'm doing a species delimitation analysis and, following previous work (e.g., Leache and Fujita 2010), I've tested three different combinations of Gamma priors for theta and tau to see whether changing them affects the posterior distributions. After several replicates for each scenario, I found very different posterior probabilities (PP) among scenarios. The scenario with large theta and large tau (i.e., 2/10 for both; top values at the nodes in the figure), as well as the one with large theta and small tau (2/10 and 2/1000 respectively; middle values), give small PP values, but the scenario with small values for both theta and tau (bottom values) gives high PP at almost all nodes (>95% PP).

From this, I would say that these priors do in fact strongly influence the posterior distributions. I'm not sure what should follow. I have thought of considering as "supported" only those nodes that show high PP values in all three scenarios, and as "not supported" the rest (even if one of the scenarios shows high PP for that node). Another option would be to provide a reasonable justification for preferring one scenario over the others, although this seems rather difficult. I'm also attaching one .ctl file with the parameters for one of the scenarios.

Any suggestion from you would be greatly appreciated by us. I hope this does not take too much of your time,

thank you in advance,

Carlos

(B) My reply

Dear Carlos,

I suggest that you read and follow the advice in the documentation, bppDOC.pdf. Not all of Leache and Fujita's recommendations are correct so you should not follow them without careful thought about your own data. The introduction explains what theta and tau mean and the section "5. The gamma prior" explains what the two parameters of the gamma prior mean. The values of 1 or 2 for the shape parameter are fine if you want the priors to be diffuse, but your prior means may be orders of magnitude off.

For example, the documentation explains:
"theta is the average proportion of different sites between two sequences sampled at random from the population."
Your G(2, 10) prior for theta has mean 2/10 = 0.2, which means that two sequences from the same population are expected to be 20% different. Are your grasshoppers (of the same species) really so different from each other? Humans are only 0.06% different from each other. If two grasshopper sequences are only 1% different, you should use something like G(2, 200), and if they are only 0.1% different, you should use something like G(2, 2000). The same applies to the tau prior.

best wishes,
ziheng
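
The arithmetic in the reply can be checked with a short script. This is only a minimal sketch, assuming (as the 20% figure for G(2, 10) implies) that the prior G(a, b) has mean a/b. The function names gamma_prior_mean and rate_for_mean are hypothetical helpers made up for this illustration and are not part of bpp.

```python
def gamma_prior_mean(shape, rate):
    """Mean of a gamma prior G(shape, rate), assumed to be shape / rate."""
    return shape / rate


def rate_for_mean(shape, target_mean):
    """Second parameter b such that G(shape, b) has the given prior mean."""
    return shape / target_mean


# Prior means for the three theta priors mentioned in the reply.
for shape, rate in [(2, 10), (2, 200), (2, 2000)]:
    mean = gamma_prior_mean(shape, rate)
    print(f"G({shape}, {rate}): prior mean = {mean:.2%}")

# Going the other way: a prior with shape 2 centred on 1% divergence.
print(rate_for_mean(2, 0.01))
```

The loop prints prior means of 20.00%, 1.00% and 0.10%, matching the values in the reply. In practice one would start from a rough guess of the sequence divergence in one's own data and solve for the second parameter, as rate_for_mean does.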