Questions about BP&P?

BP&P Q&A
Rannala
2012-10-17
2017-04-07
1 2 3 > >> (Page 1 of 3)
  • Rannala

    Rannala - 2012-10-17

    Users are encouraged to post questions here regarding the BP&P
    software. When reporting apparent bugs or other unexpected program
    behavior please state the version of the program that you are using as
    well as any parameter settings.

     
  • Anonymous

    Anonymous - 2013-01-05

    I wonder whether BP&P could be used to delimit bacterial species, assuming that the populations have had no horizontal gene transfer and no (or low levels of) recombination. What special care should we take when using BP&P with bacterial populations?
    I appreciate your answer and comments,

    Jose
    j.castillo@proinpa.org

     
    • Rannala

      Rannala - 2013-01-05

      The current method assumes that the different sequence loci are freely recombining (unlinked). One issue with bacterial species is the limited recombination/horizontal transfer which creates dependence among sequences/genes/chromosomes across the genome. One strategy would be to use a single sequence locus but that would likely have low power.

       
  • Anonymous

    Anonymous - 2013-02-12

    I am performing a species delimitation analysis with mtDNA and two phased nDNA loci. When I run the program I get an error stating that I have more sequences at locus two (26) than allowed by the control file. This locus has 13 individuals with two haplotypes each. I was wondering what the maximum number of sequences per locus may be? Thanks.

    Chris

     
    • Rannala

      Rannala - 2013-02-13

      This sounds like a problem with the control file format. Can you post the content of your control file?

       
  • Anonymous

    Anonymous - 2013-02-13

    I think I figured it out. It had to do with how I was numbering my individuals in the data file. For example, I was assigning the same sequence ID to different phased alleles, so in the Imap and control file there were not enough sequences per species. I specified new sequence IDs for the second allele for each nuclear gene but included these in the same 'species' as the first allele in the Imap file. I hope this is correct.

    Chris

     
  • Anonymous

    Anonymous - 2013-02-19

    Hello,

    I have performed a species delimitation analysis for ten closely related and recently diverged species and I have two questions about the output.

    First, although the ratios of theta to tau are the same for both algorithms, the absolute values of theta and tau differ between the two rjMCMC algorithms, with the mean values from algorithm 1 being 2-4 times greater than those from algorithm 0, although in some cases the 95% credible intervals overlap. Should I be worried about this? I have done eight replicate runs of each algorithm and have found this behavior to be consistent across runs.

    Second, I am interested in measuring the divergence between species and subclades in units of Ne generations. I have a known substitution rate, and can convert theta to Ne and tau to time. However, I'm not sure which is the appropriate value of theta to use. Should I calculate the geometric mean of the mean theta values across the entire species tree?

    Thank you for your help,
    Ron

     
    • Ziheng Yang

      Ziheng Yang - 2013-06-06

      First, Algorithms 0 and 1 should produce identical results, so the difference is a concern. Do you see similar differences when you turn off speciesdelimitation and use a fixed tree?

      I am not sure about your second question. The model assumes that each population has its own Ne, so you need to decide which one to use. Also, since tau/theta = (time*mu)/(4 Ne mu) = time/(4 Ne), you don't have to know the mutation rate, because mu cancels.

      ziheng
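
The cancellation described in the reply above can be checked numerically. The following is a minimal Python sketch, assuming theta = 4 Ne mu and tau = time * mu as stated in the reply; the function names and the numeric values are hypothetical and are not BPP output.

```python
def divergence_ratio(tau, theta):
    """Return tau/theta, i.e. divergence time in units of 4*Ne generations.

    Assumes theta = 4*Ne*mu and tau = T*mu, so mu cancels in the ratio.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    return tau / theta


def absolute_estimates(tau, theta, mu):
    """Convert theta and tau to Ne and T (in generations) for a known mu.

    mu is the mutation rate per site per generation (an assumed input).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    ne = theta / (4.0 * mu)
    t_generations = tau / mu
    return ne, t_generations


# Hypothetical posterior means, for illustration only.
tau, theta, mu = 0.0005, 0.002, 1e-8

ratio = divergence_ratio(tau, theta)           # T / (4 Ne)
ne, t_gen = absolute_estimates(tau, theta, mu)

print("T in units of 4Ne generations:", ratio)
print("T in units of Ne generations: ", 4 * ratio)
print("same ratio via Ne and T:      ", t_gen / (4 * ne))
```

The last two printed ratios agree whatever value of mu is supplied, which is the point made in the reply. Which population's theta to put in the denominator remains a choice for the analyst.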

       
  • Anonymous

    Anonymous - 2013-03-21

    I am trying to use MCcoal to simulate data and then analyze the simulated data with BPP.
    I was wondering whether the heredity scalars work in both programs. I just tried to specify a file with these scalars, but MCcoal reported a problem. A file with locus rates was apparently read without problems.
    thanks for your help
    sincerely,
    Arley Camargo

     
    • Ziheng Yang

      Ziheng Yang - 2013-06-06

      i am still learning how to use sourceforge. sorry this is such a late reply.
      this question is perhaps answered already.
      anyway, bpp can deal with heredity scalars, but MCcoal can't. I guess the way to go may be for you to write simple (perl) scripts to generate the MCcoal control file, simulate the alignments for the different scalars separately, and then merge the alignments into one data file. That way you know how many loci should be generated for each scalar. The scalar is used to multiply the thetas, but the taus should remain unchanged.
      ziheng
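
The scaling rule in the reply above (multiply the thetas by the heredity scalar, leave the taus alone) can be sketched as follows. This is a rough Python illustration rather than a Perl script, and it only computes the parameter values for each simulation batch; it does not write an MCcoal control file. The population names, parameter values, and scalars are hypothetical.

```python
def scaled_parameters(thetas, taus, heredity_scalar):
    """Return (thetas * scalar, unchanged taus) for one simulation batch."""
    if heredity_scalar <= 0:
        raise ValueError("heredity scalar must be positive")
    scaled_thetas = {name: value * heredity_scalar
                     for name, value in thetas.items()}
    return scaled_thetas, dict(taus)


# Hypothetical base parameters for a two-species tree (A, B) with root AB.
thetas = {"A": 0.004, "B": 0.002, "AB": 0.003}
taus = {"AB": 0.001}

# Hypothetical design: number of loci to simulate for each heredity scalar.
loci_per_scalar = {1.0: 10, 0.75: 3, 0.25: 2}

batches = []
for scalar, n_loci in loci_per_scalar.items():
    batch_thetas, batch_taus = scaled_parameters(thetas, taus, scalar)
    batches.append((scalar, n_loci, batch_thetas, batch_taus))
    print(f"scalar={scalar}: {n_loci} loci, thetas={batch_thetas}, taus={batch_taus}")
```

Each batch would then be simulated separately and the resulting alignments merged into one data file, keeping track of which loci belong to which scalar.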

       
  • Anonymous

    Anonymous - 2013-05-22

    I am trying to run BPP2.2 to delimit 5 possible fungal species within a species complex. I have run the program with each of the possible starting trees, but the following generations always run using the 1111 tree, so the branches are not collapsing and I am not getting posterior species model probabilities. I have 70 sequences for each of 3 loci. I suspect my .ctl file (attached) has an error.

    Thanks for your assistance,

    John

     
    • Anonymous

      Anonymous - 2013-06-05

      I am getting the exact same thing. Have you received an answer to your question, John? I wrote my control file identically to other control files, so I don't think that is the issue. I've also started on all five starting trees, varied theta and tau combinations (x3) and used both algorithm settings (0 and 1) with the same result every time: first generation it goes to the fully resolved species tree with a probability of 1.000 and stays there through all 50,000 generations...

      Thanks,
      KPW

       
    • Rannala

      Rannala - 2013-06-05

      This is not necessarily any indication of a problem. It is possible that the completely delimited model has probability 1. In that case the program will not visit other delimitations. If you are worried you could try artificially splitting one of your populations into two groups, create a guide tree with the groups as sister species and see whether that node is collapsed in the posterior distribution of models.
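
One way to set up the test suggested above is to split the individuals of one population into two groups at random. The sketch below assumes the individual-to-population assignments are held in a Python dictionary; the individual IDs, population names, and the `_a`/`_b` suffixes are hypothetical, and the guide tree and control file would still need to be edited by hand to treat the two groups as sister species.

```python
import random


def split_population(imap, population, seed=0):
    """Return a copy of imap with `population` split randomly into two groups.

    imap maps individual ID -> population name. Members of `population`
    are reassigned to `population + "_a"` or `population + "_b"`.
    """
    members = sorted(ind for ind, pop in imap.items() if pop == population)
    if len(members) < 2:
        raise ValueError("need at least two individuals to split")
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(members)
    half = len(members) // 2
    new_imap = dict(imap)       # leave the original mapping untouched
    for ind in members[:half]:
        new_imap[ind] = population + "_a"
    for ind in members[half:]:
        new_imap[ind] = population + "_b"
    return new_imap


# Hypothetical assignments, for illustration only.
imap = {"ind1": "A", "ind2": "A", "ind3": "A", "ind4": "A",
        "ind5": "B", "ind6": "B"}

for individual, pop in sorted(split_population(imap, "A", seed=1).items()):
    print(individual, pop)
```

If the node joining the two artificial groups is collapsed in the posterior distribution of models, that suggests the program is not simply favoring every split it is offered.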

       
    • Ziheng Yang

      Ziheng Yang - 2013-06-06

      The control file looks o.k. Run the program on the command line and
      observe its behavior, and if it behaves in the same way as it does on
      the example datasets, it should be fine. Version 2.2 includes quite a
      few data examples. For example, the two datasets that we analyzed in
      our recent Genetics paper are in the package. You can try to
      duplicate our results in the paper, and then prepare your files in the
      same format.

      Most likely the analysis supports the fully resolved tree, so model
      1111 has posterior prob ~100%. The results are probably correct
      especially if you get the same results with different starting trees.
      You have many sequences at each locus, so the dataset is quite large.
      You can set nloci to 1 or 2 (instead of nloci = 3) to see whether the
      prob becomes smaller with fewer loci. Also check whether the priors are reasonable
      and change them to see whether they have an impact. (Look at the
      explanations of those priors in the document.)

      thetaprior = 2 2000    # gamma(a, b) for theta
        tauprior = 2 20000 1  # gamma(a, b) for root tau & Dirichlet(a) for other tau's
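
As a quick check on whether priors like the two quoted above are reasonable, the mean and standard deviation of a gamma(a, b) distribution are a/b and sqrt(a)/b, taking b as a rate parameter. The Python sketch below applies this to those two settings. The function name is hypothetical, and the third number on the tauprior line is left out because it is the Dirichlet parameter, not part of the gamma.

```python
import math


def gamma_prior_summary(a, b):
    """Return (mean, standard deviation) of a gamma(a, b) prior, b = rate."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return a / b, math.sqrt(a) / b


theta_mean, theta_sd = gamma_prior_summary(2, 2000)    # thetaprior = 2 2000
tau_mean, tau_sd = gamma_prior_summary(2, 20000)       # tauprior = 2 20000 (root tau)

print(f"theta prior: mean={theta_mean:.6f}, sd={theta_sd:.6f}")
print(f"root tau prior: mean={tau_mean:.6f}, sd={tau_sd:.6f}")
```

Comparing these prior means with rough expectations for the dataset (for example, average pairwise sequence divergence within a population for theta) gives a sense of whether the prior is in a sensible range.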
      

      I just saw the post below by KPW and Bruce's reply. Yes, it seems
      that there may not be any numerical problem. It is just that the
      method is favoring the fully resolved model, with posterior ~100%.

      I am interested in the question whether the method (correctly
      implemented, without any computational problems) oversplits. We don't
      know much about this. The method seems often to favour many species
      (or even the fully resolved tree) in empirical datasets.
      Nevertheless, in simulations, it does not oversplit. You know your
      species, so if you believe the method is splitting two populations
      that should be one species into two species, many people would be
      interested to know. It is possible that the simulations (there are
      only 2 of these, see below) missed some important features of the
      real process and the simulation results are not that relevant, but in
      that case, we need to know what the important features are.

      best,
      ziheng

      Camargo, A., M. Morando, L. J. Avila, and J. W. Sites. 2012. Species delimitation with ABC and other coalescent-based methods: a test of accuracy with simulations and an empirical example with lizards of the Liolaemus darwinii complex (Squamata: Liolaemidae). Evolution 66:2834-2849.

      Rannala, B., and Z. Yang. 2013. Improved reversible jump algorithms for Bayesian species delimitation. Genetics 194:245-253.

      Zhang, C., D.-X. Zhang, T. Zhu, and Z. Yang. 2011. Evaluation of a Bayesian coalescent method of species delimitation. Syst. Biol. 60:747-761.

       
      • Anonymous

        Anonymous - 2013-07-19

        Good day,

        I have now rerun my dataset numerous times to check for oversplitting. In each series of BPP runs I used both algorithms (0 and 1) and 3-4 starting trees.

        In the first series, I created a "false" node by dividing one clade from my coalescent species tree into two clades. This node always had low posterior probability, so oversplitting did not occur.

        In the second series, I split the same clade by choosing a small subclade within it that had a posterior probability of 0.64. At the end of these runs, the posterior probability of this node was always 1.00, so as I understand it, oversplitting did occur. However, an error was reported at the end of each run: error in scanfile ().

        In the third series, I split a different clade from my coalescent species tree by choosing a small subclade within it that had a posterior probability of 0.93. In this case, the BPP runs returned posterior probabilities of about 0.98 each time. This, if I am correct, would not be considered a case of oversplitting. These third-series runs also produced the same error message: error in scanfile ().

        Each of the clades I split had 1.00 support in my coalescent species tree. I am very interested in your evaluation of these conditions and results.

        Thank you,

        John

         
  • Anonymous

    Anonymous - 2013-06-02

    Hi
    I am running BP&P for species delimitation with 16 species in my guide tree and 130 sequences for 6 loci. The analysis starts, sets the parameters, and calculates the likelihood, but it does not seem to proceed to print the percentage progress indicator, acceptance proportions, and posteriors for the parameters.
    Have I set something incorrectly?
    Thanks!

     
    • Ziheng Yang

      Ziheng Yang - 2013-06-06

      I am not sure what the problem may be. Can you copy the last few lines of the screen output here?
      ziheng

       
  • Anonymous

    Anonymous - 2013-06-12

    Hey guys.
    I am using BPP with three loci for two populations. I have checked all my files several times, and the program always shows me the same error.

    • Bad option 'A' in first line of seqfile -

    What does that mean?
    I have already run it on Windows and Mac, and the error is the same.
    Thanks in advance

    Note: I do not know if it matters, but my first locus is phased...

     
  • Ziheng Yang

    Ziheng Yang - 2013-06-19

    I don't know, but the problem is that the program is having trouble reading the sequence data file. Did you look at the example files in the package? You have to use the same format.
    ziheng yang

     
  • Anonymous

    Anonymous - 2013-07-15

    I can't seem to get the acceptance proportion of my GBtj finetune parameter into the 0.3-0.4 range, no matter what I change the corresponding finetune parameter to in my ctl file. It stays around 0.64 no matter what. I must be missing something. Any advice?

     
    • Ziheng Yang

      Ziheng Yang - 2013-07-21

      This is not a problem, and it has been noted before. Basically, the GBtj move changes one coalescent time tj in a gene tree. It is a very small move and does not change the likelihood and prior much, so the acceptance proportion is high.
      ziheng

       
  • Anonymous

    Anonymous - 2013-08-02

    Regarding my post from 5-22-13

    I have now had the opportunity to run BPP multiple times on my dataset to test for over splitting. I have run 3 tests. I have run both algorithm 0 and 1 and used 3 starting trees for each test.

    I have 5 clades with nodes supported at posterior probabilities of 1 from my coalescent run in MrBayes. These 5 clades are supported at 100% in my BPP runs (1111 on the guide tree). I then took Dr. Rannala's suggestion and ran a test by creating an artificial clade to see whether it would be supported in the posterior distribution. I created an artificial split of one of my well-supported clades. This node was not supported by BPP (11110). Next, I split the clade using a small sub-branch within it that had a posterior probability of 0.93; that node was supported at 100 percent (11111) in BPP. I then split the clade a third time using a small sub-branch with a posterior probability of 0.64; that node was also supported at 100 percent (11111) in BPP. The program seems to be running correctly. I am still trying to determine the species delimitation of this group and am wondering whether these tests indicate oversplitting, or whether I should be satisfied with the original result of my 5-clade run.

    Thank you for your advice,

    John

     
    • Rannala

      Rannala - 2013-09-14

      With only 3 loci, using the data to create additional splits of subclades with high posterior probability in the guide tree could lead to over-splitting (we haven't really explored this issue with such a small number of loci). However, this effect would presumably disappear (posterior probabilities would decrease) if additional loci were then added.

       
    • Anonymous

      Anonymous - 2013-10-13

      If I understand correctly, you are using the reconstructed gene trees, perhaps with posterior probabilities from MrBayes, to define clades/populations, so there is a selection bias, and you would expect bpp to tend to suggest those populations as distinct species. It is like multiple comparisons.
      If you make splits at random, it sounds like bpp does not over-split.

      It is a good question how one should evaluate whether bpp over-splits.
      Ziheng

       