BP&P Q&A: Questions about BP&P?

Anonymous - 2013-09-13

Hi Ziheng,

thanks for a very useful piece of software.

I am running a species delimitation analysis with 11 'species', 9 loci and multiple sequences per species.

I am trying the species delimitation with a fixed, fully-resolved species tree (i.e. without rjMCMC) at the moment. The program is running fine, but I am a little confused about how to set the tau-threshold value - or whether I need to do this.

The program seems to automatically report posterior probabilities of each node tau compared to tau-threshold values of 0.0002, and 0.00002.

I have a reasonable date for the root of my group and a good mutation rate estimate that give me a root age tau prior of tau-0 = G(170,2011), and I have decided on my own 'tau-threshold' tau-T = 0.00065. My question is:

Is it okay to use my 'realistic' tau-0 for this analysis, and assess the posterior tau values against my threshold after the fact (by examining the 95% CIs on each tau)? Or should I set my tau-0 to have my desired tau-T as its median, i.e. is this how I would achieve Pr(tau-0 < tau-T) = 0.5? Or is there another way to include my desired tau-T in the control file that I have not discovered?

Sorry if this question is very basic but I haven't been able to find this spelt out anywhere, and can't find the control file for the human population example in your 2010 paper.

thanks in advance for your help
Pip
- Anonymous - 2013-10-13
  
  We recommend the rjMCMC approach, even though the algorithms are hard to run.
  The threshold is hard-coded in the program. Please open bpp.c, search for the word "threshold", and change the values in the program. Then recompile and run.
  Best,
  Ziheng
  
- Ziheng Yang - 2013-11-15
  
  Yes, the program automatically reports posterior probabilities of each node tau compared to tau-threshold values of 0.0002 and 0.00002. This is because the thresholds for tau are hard-coded in the program source file bpp.c. You need to change the values in that file and recompile the program to use different thresholds.
  
  double PtauThreshold[NSPECIES][2]={{0}}, tauThreshold[2]={2E-5, 2E-4}
  
  ziheng
  
- Ziheng Yang - 2015-11-13
  
  G(170,2011) sounds like a strange prior. First, the shape parameter 170 means a very informative prior. Second, the mean 170/2011 is quite large, and I wonder whether there are any errors.
  To delimit species, I suggest that you run analysis A11. Please look through the tutorial paper at the web site.
  ziheng
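To make the point about the prior concrete: for a gamma prior G(a, b) with shape a and rate b, the mean is a/b, the standard deviation is sqrt(a)/b, and the coefficient of variation is 1/sqrt(a). The following is a minimal Python sketch of that arithmetic; the function name is made up for illustration and is not part of bpp.

```python
import math

def gamma_prior_summary(shape, rate):
    """Mean, standard deviation and coefficient of variation of G(shape, rate)."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    cv = 1.0 / math.sqrt(shape)  # small cv = tightly concentrated (informative) prior
    return mean, sd, cv

mean, sd, cv = gamma_prior_summary(170, 2011)
print(f"G(170, 2011): mean={mean:.4f}, sd={sd:.4f}, cv={cv:.3f}")
```

A large shape parameter gives a small coefficient of variation, which is what makes G(170,2011) so informative.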
  
Rannala - 2013-09-14

The threshold tau_T should be chosen based on your criterion for the number of generations of isolation needed before you are willing to call two populations different species. Tau is measured in units of expected substitutions, so if the generation time were 1 year and the substitution rate per year were 10^-9, a choice of tau_T=10^-6 would correspond to a requirement of 1000 generations or more of isolation for species status.

We did not have an informative prior on tau_0 in our analysis of the human populations and chose a tau_0 that would produce a single species under the prior (with no data) with probability 50%. If the prior on tau_0 is concentrated on smaller values then the prior probability of splitting increases. It is better to be conservative and use a prior on tau_0 that produces a prior split probability that is no larger than 50%, so that you do not oversplit when the data are uninformative.

Your prior value on tau_0 is very large, so oversplitting should not be a problem with our default values for tau_T. The prior probability that tau < tau_T with your prior on tau_0 and our thresholds is essentially 0.

Whether our thresholds are a good choice for your species is something you need to decide based on what you think the substitution rate is per generation and how many generations of isolation you require before calling two populations species. For example, with mu=10^-8 per generation, 0.0002 corresponds to 20000 generations of isolation and 0.00002 corresponds to 2000 generations of isolation. Hope that helps to clarify things. Perhaps you should try the rjMCMC algorithm instead?
Bruce
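The conversions in this reply can be checked with a few lines of Python. The sketch below is illustrative only: the function names are invented, and the prior probability is a simple Monte Carlo estimate for the gamma prior on the root age tau_0 alone (it does not model the priors on younger nodes). Note that `random.gammavariate` takes a scale parameter, so the rate b of G(a, b) is passed as 1/b.

```python
import random

def tau_to_generations(tau, mu_per_generation):
    """Generations of isolation implied by tau (expected substitutions per site)."""
    return tau / mu_per_generation

def prior_prob_below(shape, rate, tau_threshold, draws=100_000, seed=1):
    """Monte Carlo estimate of Pr(tau_0 < tau_threshold) under G(shape, rate)."""
    rng = random.Random(seed)
    scale = 1.0 / rate  # gammavariate expects (shape, scale), not (shape, rate)
    below = sum(rng.gammavariate(shape, scale) < tau_threshold for _ in range(draws))
    return below / draws

# Thresholds and mutation rate quoted in the reply above.
for tau_t in (2e-4, 2e-5):
    print(tau_t, "->", round(tau_to_generations(tau_t, 1e-8)), "generations")

# Prior from the original question against the larger default threshold.
print("Pr(tau_0 < 0.0002) under G(170, 2011):", prior_prob_below(170, 2011, 2e-4))
```

A Monte Carlo estimate cannot resolve very small probabilities, so a result of zero here only means "far too small to see with this many draws".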

- Anonymous - 2013-09-15
  
  Hi Bruce,
  
  thanks very much for the help. Maybe I can describe more precisely what I'm trying to analyse - I am still not quite clear whether my approach is valid and would very much appreciate your feedback.
  
  I have a dataset of 10 ingroup species and 1 outgroup, whose identity as an outgroup is very clear. I have a date of 13 +/- 1 MYA for the root, and a mutation rate of 6.5 x 10^-9, hence a tau_0 of the form G(170,2011) (mean = 0.085).
  
  If I understand your previous answer correctly, you're saying it is fine to use an informative tau_0 in this analysis, because this particular prior tau_0 is so large - but that in general one should aim to be conservative to avoid oversplitting.
  
  What I am actually interested in is knowing how clearly the 10 ingroup species are delimited. Species tree - gene tree coestimation in BEAST shows little support for any significant species relationships, so I'm expecting most taxa pairs will not be distinct under any reasonable threshold using BP&P either. I am using the BEAST species tree as a guide tree.
  
  So, if I set my tau_threshold as ~100,000 generations, which seems reasonable in my taxa, and use the same mutation rate as above, that gives a tau_T of 0.00065.
  
  I suppose my question then is:
  
  How should I quantify the posterior probability of each tau falling below my desired tau_threshold? From your previous answer, it seems it would be fine to e.g. check whether the 95% CI of each tau contains my desired tau_threshold (or do something a little more involved to get an exact P-value). Is this correct? Or is there a way to set my desired tau_threshold in the control file itself?
  
  And, more broadly, is this an appropriate use of the fixed-tree method?
  
  If that all sounds okay, I will also try the fixed-tree method on just the 10 poorly-resolving ingroup taxa, as I have a much more informative prior for the age of this node. However, I will run the control file without data first to check that the prior produces a split probability of <50%.
  
  I will definitely try the rjMCMC algorithms too, just wanted to get this method sorted first.
  
  thanks again
  Pip
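One way to assess a custom threshold after the fact, without recompiling, is to take the sampled tau values and count the proportion that fall below the threshold; that proportion estimates Pr(tau < tau_T | data) directly. The Python sketch below assumes the samples are available as a whitespace-delimited table with a header row in which tau columns have names starting with `tau`. That layout, the column names, and the toy numbers are assumptions made for illustration, not something confirmed in this thread, and burn-in samples should be discarded first.

```python
import io

def read_tau_columns(handle):
    """Collect columns whose header name starts with 'tau' from a sample table."""
    header = handle.readline().split()
    tau_idx = [i for i, name in enumerate(header) if name.startswith("tau")]
    columns = {header[i]: [] for i in tau_idx}
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        for i in tau_idx:
            columns[header[i]].append(float(fields[i]))
    return columns

def prob_below(samples, threshold):
    """Proportion of samples strictly below the threshold."""
    if not samples:
        raise ValueError("no samples to summarise")
    return sum(x < threshold for x in samples) / len(samples)

# Toy stand-in for a sample file; the values are invented.
toy = io.StringIO(
    "Gen\ttheta_1A\ttau_2AB\tlnL\n"
    "1\t0.0020\t0.0005\t-100.1\n"
    "2\t0.0021\t0.0007\t-100.3\n"
    "3\t0.0019\t0.0006\t-99.8\n"
    "4\t0.0020\t0.0009\t-100.0\n"
)
for name, samples in read_tau_columns(toy).items():
    print(name, "Pr(tau < 0.00065) =", prob_below(samples, 0.00065))
```

With a real run, `open("your_sample_file")` would replace the `io.StringIO` object.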
  
Anonymous - 2013-09-18

Hi:
I’m trying BP&P on a bacterial species, which is recognized as a ‘species complex’. I’m testing 10 essential genes that are not completely ‘unlinked’. So, to add confidence, I performed the analysis using different gamma priors for theta and for tau (with means from 0.0001 to 0.1). I also tried BP&P with just 3 genes, 2 genes (the most physically distant genes on the chromosome) or 1 gene. All results are the same: speciation probabilities of 1.0 for all nodes, describing 4 different species according to the guide tree. The phylogeny of this bacterial species shows 4 well-defined, monophyletic subgroups, each of which could become a single species. I wonder if the analyses described are robust enough to delimit the 4 species even though the genes are not ‘unlinked’. If not, what suggestions could you provide?
Jose

- Ziheng Yang - 2013-11-15
  
  Depending on the distance between the loci and the recombination rate, the fact that the loci are linked may not be that important.
  However, defining species in bacteria sounds tricky. Personally I think bpp results should be interpreted and integrated with other sources of evidence to declare species status. Do you have other characteristics, biochemical, physiological etc. to consider besides genetic sequence data?
  Ziheng
  
Anonymous - 2013-09-21

I am using BP&P to estimate modern and past population sizes in five species using 12 nuclear non-coding fragments. I have been running the analysis with unphased sequences and I obtain reasonable estimates, but I was wondering if I should have phased the alleles before analysis. I thought it should not matter, but I would like a more expert opinion. Thank you for providing such useful software.

- Ziheng Yang - 2013-11-15
  
  I am not entirely sure about this. bpp takes sequences as input and assumes that they are phased.
  Some people use a program like PHASE to "estimate" the phase for use with bpp. Some use ambiguity characters, which I think should create some bias.
  Perhaps you can try different things to see how different the results are.
  ziheng
  
Anonymous - 2013-10-07

Hi,

I'm running bpp2.2 to try and evaluate different species trees generated by *Beast and a BCA analysis (Bucky). I have 4 outgroups and 5 ingroups from a species complex so I'm setting 9 species for the program. I have two loci, one highly variable mitochondrial and one variable nuclear for all ~90 specimens.

My problem is that every guide tree always returns a posterior probability of 1.000 for the fully resolved tree (between the first and third generation it already gets to 1.000). I thought that maybe my two loci were not enough to "decide" between the two possible guide trees (which are quite similar), but then I tried all sorts of random guide trees, sometimes even placing some of my outgroup species within the ingroup ones, and the result is always 1.000 for the fully resolved one.

I'm using species delimitation=1, speciesmodelprior=1, locus rate=1 2, and heredity=1 4 4. I ran the program using both algorithms (0 and 1), tried different gamma parameters (like G(2, 2000) - (2, 1000) - (2, 10) - (1, 10) - (1, 1000), etc), almost every different starting tree, manual and automatic finetune, but nothing seems to change the result. Acceptance proportions are usually very similar between different runs and they seem to be all right (e.g. 0.51 0.27 0.32 0.38 0.31, or 0.49 0.24 0.37 0.39 0.26). I never changed the placement of individuals within the species, only the guide species tree.

The only thing that actually gave me different results was not specifying the "locus rate" and "heredity" but, as I'm using mitDNA and nuclear loci, it seems that this is the correct way of running it.

Am I missing something here?

Any help would be much appreciated.

Regards,

FD
- Ziheng Yang - 2013-11-15
  
  In simulations, we did see that bpp can recover distinct species with only one locus, if 5 or 10 sequences are used from each species. The power can be quite high. See figure 4 in Zhang et al. (2011 Syst Biol 60:747-761). The simulation is done under the ideal situation and the truth is known, so the results simply mean that the power can be very high, and one seldom needs many loci before the posterior probabilities reach ~100%.
  
  Real data are of course more complicated and one does not always know the truth.
  
  If you change a reasonable guide tree into an unreasonable one, for example, by placing the outgroup inside the ingroup, the posterior probs should go up rather than go down, so what you observe with that test is not surprising.
  
  To specify a heredity scalar file, use the format
  
  heredity = 2 heredity.txt
  
  I think the locusrate file is specified using a similar format.
  We have not used those two options together with species delimitation. I don't think they are broken, but they are not tested carefully. Please let me know if you notice anything strange.
  
  ziheng
  
Anonymous - 2013-11-08

Hello,
I have a dataset of 1 mitochondrial locus and two nuclear loci from 2 putative species with no shared alleles between them. I'm trying to use BPP to get a posterior on the split between these two. When I ran a file with the individuals assigned to the correct species the posterior was high, and I was excited. I then tried to move some of the individuals around between species, expecting the posterior to decrease, but it is still up around 0.9. I'm not sure if there is a problem in how I'm formatting the input files or in the parameters, or if it's something else that I'm completely not understanding. I've called each individual a "species" to try to determine the probability of the node of each putative species where the individuals coalesce. I've tried altering the fine tuning such that all of the acceptance rates are 0.3-0.7, setting the tau prior to (10, 20000), and changing the species model prior from 1 to 0, none of which seemed to make a difference. Thank you so much for any insight you might be able to give. I've attached the control file of the mixed up species, which I'm expecting low posteriors from. I also can't get it to read the heredity scalar file. Thank you very much.

bpp13sp10Mixed2.ctl.tmp

- Ziheng Yang - 2013-11-15
  
  I think to move the individuals across populations, the only change you need to make is the Imap file.
  
  Make sure the results are stable if you use different rjMCMC algorithms or start the chain from different species delimitation models.
  
  Assuming that the results are all correct, it is harder to decide whether the drop of posterior probabilities from 1 to 0.90 is large enough. This may depend on how many individuals you move around, how large and how informative the datasets are etc. It sounds hard to believe if each individual is claimed to be a distinct species. Perhaps a test using a smaller dataset, say, just three individuals from the same population, is easier to understand. Does bpp generate posterior of 100% for 3 species?
  
  To specify a heredity scalar file, use the format
  
  heredity = 2 heredity.txt
  
  The file should have as many numbers as the number of loci. Have a look at the estimates of theta with and without using this to confirm that the results are sensible. I have not used this option for a while.
  
  You could also analyse the mt locus and nuclear loci separately, for comparison.
  
  ziheng
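Since the heredity file needs exactly one number per locus, a small helper can catch a mismatch before the run starts. This is only a sketch in Python: the function name is invented, whitespace-separated values on one line and the positivity check are assumptions about the format, and the scalar values shown (0.25 for the mitochondrial locus, 1 for each nuclear locus) are placeholders rather than a recommendation from this thread.

```python
def format_heredity_scalars(scalars, n_loci):
    """Return the text for a heredity file: one scalar per locus, space-separated."""
    if len(scalars) != n_loci:
        raise ValueError(f"expected {n_loci} scalars, got {len(scalars)}")
    if any(s <= 0 for s in scalars):
        # Assumed constraint: a scalar multiplies theta, so it should be positive.
        raise ValueError("heredity scalars must be positive")
    return " ".join(f"{s:g}" for s in scalars) + "\n"

# Placeholder values for 1 mitochondrial locus followed by 2 nuclear loci.
text = format_heredity_scalars([0.25, 1.0, 1.0], n_loci=3)
print(text, end="")
```

The returned text can then be written to `heredity.txt` with an ordinary `open(..., "w")` call.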
  
Anonymous - 2013-11-25

Dear Ziheng & Bruce,

I was hoping one of you could give us a brief explanation about the ESS values that BPP produces versus the ESS values logged if the mcmc file is imported into Tracer, and what we should be focusing on specifically. Thanks.

-Chris

Anonymous - 2014-09-17

Hello,

Would it be reasonable to select theta and tau priors based on results from a *BEAST analysis (using the divergence times and DMT (theta) values)? For instance if the root of the species tree has an age of 4 Myr and the DMT (theta) has a value of 3. Can I use the following priors: tau = G(20, 5) and theta = G(15, 5)?

Thanks,
Santiago

Anonymous - 2014-09-25

Hi,

I am new to using BP&P and need some help. I am using a phylogeographic dataset for delimiting species, with 7 loci and more than 100 sequences per locus per putative species. The first concern is about sample sizes. One of the "true" species (the outgroup in the phylogenetic analysis) is represented by only 1 sequence in 2 genes. Could this cause trouble when delimiting species?

The second question is about the mcmc output. I am testing the support for 5 different groups as suggested by a STARBEAST analysis. I have tried the 2 algorithms and both converge on the same result. Model 3 (1100) is the best model with 100% support. The mcmc output has a "nan" for every lnL calculated. Am I doing something wrong?

Attached are the screen of the console while running BP&P and the control file.

Thanks

Sergio

bpp.ctl

console.txt

Anonymous - 2015-01-30

I have tried running species delimitation and species tree analysis at the same time using BPP3, which, as the manual says, should be set up with speciesdelimitation = 1 and speciestree = 1. However, whenever I set speciesdelimitation = 1, it gives me an error that says "Error: RJfinetune <= 0?.".

May I ask what this problem could be?

Thank you so much,

Xin

Anonymous - 2015-08-04

I have the same problem as Xin when trying to run bpp3.1 for species delimitation with a fixed species tree. I get the error "Error: RJfinetune <= 0?". Any suggestions on how to fix this?

Best wishes
Svante

- Ziheng Yang - 2015-11-13
  
  The syntax is as follows:
  speciesdelimitation = 1 0 2 * speciesdelimitation algorithm0 and finetune(e)
  
  speciesdelimitation = 1 1 2 1 * speciesdelimitation algorithm1 finetune (a m)
  best,
  ziheng
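Going by the two example lines above, the setting takes an algorithm number followed by one finetune value for algorithm 0 or two for algorithm 1, so a bare `speciesdelimitation = 1` supplies no finetune values at all. The Python sketch below checks a control-file line against that reading. It is a hypothetical helper, not part of bpp, and it assumes that `*` starts a comment and that a first value of 0 switches delimitation off.

```python
def check_speciesdelimitation(line):
    """Return a list of problems found in a 'speciesdelimitation = ...' line."""
    setting = line.split("*", 1)[0]  # assumed: '*' starts a trailing comment
    key, _, value = setting.partition("=")
    if key.strip() != "speciesdelimitation":
        return ["not a speciesdelimitation line"]
    fields = value.split()
    if not fields:
        return ["no value given"]
    if fields[0] == "0":
        return []  # assumed: 0 means delimitation is switched off
    if fields[0] != "1":
        return [f"unexpected first value {fields[0]!r}"]
    if len(fields) < 2:
        return ["missing algorithm number and finetune values"]
    expected = {"0": 1, "1": 2}  # finetune count per algorithm, from the examples
    if fields[1] not in expected:
        return [f"unknown algorithm {fields[1]!r}"]
    finetune = fields[2:]
    problems = []
    if len(finetune) != expected[fields[1]]:
        problems.append(
            f"algorithm {fields[1]} expects {expected[fields[1]]} finetune value(s), "
            f"got {len(finetune)}"
        )
    for item in finetune:
        try:
            if float(item) <= 0:
                problems.append(f"finetune value {item} is not positive")
        except ValueError:
            problems.append(f"finetune value {item!r} is not a number")
    return problems

for example in ("speciesdelimitation = 1 0 2 * algorithm0",
                "speciesdelimitation = 1 1 2 1",
                "speciesdelimitation = 1"):
    print(example, "->", check_speciesdelimitation(example) or "ok")
```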
  
Anonymous - 2015-12-15

Hello, kind of a basic question, but I don't see the answer so here goes:

I'm having trouble opening the FigTree.tre files created by the A00 analysis (bpp v3.2) in FigTree. FigTree (v1.4.2 for Mac) tells me "Error reading tree file: missing closing ')' in tree." I tried running your frogA00 example as well, and that tree also doesn't open.

I tried manually editing the file as well to no avail, though I admit my knowledge of the nexus format is not great, so I could be putting the closing ) in the wrong place. Maybe it's an issue with FigTree, but it opens other trees fine so maybe it's the new version of this program...? Just thought you'd want to know. I've attached the text of the tree below, in case that's helpful for diagnosis.

FigTree_A00_lowerpriors.tre

- Rannala - 2015-12-16
  
  That file does not work; the problem is caused by the theta values and the # sign. To make it work, you need to delete the theta values using a regular expression search and replace.
  I think it was readable by an earlier version of FigTree, but the newer versions are perhaps more strict.
  Best,
  Ziheng
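The search and replace described above could be done in Python along these lines. This is a sketch under an assumption: that each theta value appears in the tree string as a `#` followed by a number (for example `A #0.0021: 0.0010`). The example tree is invented, so check the pattern against your own file before relying on it. Because the pattern requires a number after the `#`, a `#NEXUS` header line is left untouched.

```python
import re

# '#' followed by a plain or scientific-notation number, plus any leading spaces.
THETA = re.compile(r"\s*#\s*[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")

def strip_theta(tree_text):
    """Remove '#<number>' theta annotations from a tree string."""
    return THETA.sub("", tree_text)

# Invented example tree; the values are placeholders.
example = "((A #0.0021: 0.0010, B #0.0034: 0.0010) #0.0050: 0.0020, C #0.0012: 0.0030) #0.0040;"
cleaned = strip_theta(example)
print(cleaned)
```

To process a whole file, read it with `open(path).read()`, pass the text through `strip_theta`, and write the result to a new file so the original is kept.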
  
