generating data from a stochastic model ...

  • Tim Handley

    Tim Handley - 2011-04-26

    I'm interested in using JAGS to generate data from a stochastic model, and then
    fit that generated data with MCMC.

    The motivation for this comes from a particular data set with known
    measurement error. To understand the important effects within the data, and
    accurately estimate credible intervals, I'd like to use the actual
    measurements and the known errors to stochastically generate a bunch of
    "data", and then fit that "data" using MCMC in JAGS.

    Back in 2009, there was some discussion of this issue over here: http://jackm
    . I've
    kind-of followed that thought process, and I have some ideas, but I don't
    currently know how to do this. I would greatly appreciate any advice which
    anyone might have.

    Things which I have considered include:

    1) The JAGS data block. I created a data block, and then created a data model
    within this block which would generate "data" from the actual measurements and
    associated measurement errors. However, as clearly described in the JAGS
    manual, each node in the data block was forward sampled exactly once. What I
    had needed was for each node to be forward sampled once per iteration, such
    that the ultimate posterior distributions would reflect the full range of
    variation possible in the "data." As this is not how a data block functions,
    it seems that a data block is not a solution to this problem.

    2) cut() or dsum(): After some additional reading, it sounds like the BUGS
    cut() function would do the trick. However, I much prefer to work within JAGS.
    According to Martyn, this can be done within JAGS as an observable
    function, although it may not be advisable. More specifically, dsum() may do
    what I want. According to the manual, dsum() requires that the parameters
    passed to the function be "unobserved stochastic nodes", but I could perhaps
    work with that.

    Thus, a solution may be the following:

    for(each data point i)
      fakedata[i] ~ stochastic function of measurements and known errors
      fakedata_fixed[i] ~ dsum(fakedata[i])

    The first relation generates the "data" from the measurements and the known
    uncertainties, and the second uses dsum() as the JAGS analog to the BUGS
    cut(), ensuring that no information propagates upwards to the "data"
    generating nodes.

    As I noted earlier, I would greatly appreciate any advice or feedback. Thanks,

  • Martyn Plummer

    Martyn Plummer - 2011-04-28

    If you want to generate and fit the data in the same model, then in theory you
    have two choices:

    1) Use a loop in which the data are generated first then analyzed. Pool the
    simulations from the analysis phase. As you know you can generate the data
    within JAGS using a data block. This would be easy to set up using the rjags
    interface to R. The disadvantage is that you must go through the model
    compilation and adaptation at each iteration.

    2) Try to do everything within one model. This will involve cuts. There is no
    other way to do it. Cuts are not properly implemented in JAGS. Although there
    is an experimental cut module in the source it is not part of the distribution
    because the naive cut algorithm, as implemented in OpenBUGS, is wrong: it will
    not converge to a well-defined distribution.

    So in practice you will have to go for option 1 now.
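
    Option 1 can be sketched as a loop, shown here in Python with only the
    standard library so that it runs without JAGS. This is an illustration under
    stated assumptions, not the rjags setup itself: the measurements and error
    SDs are hypothetical, and a closed-form normal posterior for a common mean
    (known sigma, flat prior) stands in for the MCMC fit. With JAGS, the
    fit_mean step would instead compile, adapt, and sample the model on each
    replicate.

```python
import random
import statistics


def generate_replicate(measurements, errors, rng):
    """Draw one synthetic data set: point i ~ Normal(measurement[i], errors[i])."""
    return [rng.gauss(m, s) for m, s in zip(measurements, errors)]


def fit_mean(data, sigma, n_draws, rng):
    """Stand-in for the MCMC fit (an assumption, not JAGS).

    Model: data[i] ~ Normal(mu, sigma), sigma known, flat prior on mu.
    The posterior of mu is then Normal(mean(data), sigma / sqrt(n)),
    so it can be sampled directly.
    """
    n = len(data)
    post_mean = statistics.fmean(data)
    post_sd = sigma / n ** 0.5
    return [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]


def pooled_posterior(measurements, errors, sigma, n_replicates, n_draws, seed=1):
    """Generate data, fit it, and pool the posterior draws over all replicates."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_replicates):
        fake = generate_replicate(measurements, errors, rng)  # generation phase
        pooled.extend(fit_mean(fake, sigma, n_draws, rng))    # analysis phase
    return pooled


# Hypothetical measurements and their known error standard deviations.
measurements = [9.8, 10.4, 10.1, 9.6, 10.3]
errors = [0.5, 0.5, 0.8, 0.8, 0.5]

pooled = pooled_posterior(measurements, errors, sigma=1.0,
                          n_replicates=200, n_draws=100)
print("pooled draws:", len(pooled))
print("pooled mean: ", statistics.fmean(pooled))
print("pooled sd:   ", statistics.stdev(pooled))
```

    Under these assumptions the pooled draws are more spread out than the
    posterior from any single generated data set (sd sigma / sqrt(n)), because
    they also carry the variation introduced by the known measurement errors.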

  • Tim Handley

    Tim Handley - 2011-04-28

    OK. I can do #1, it just takes longer.

    As a humble additional request, when you have time, could you expand the
    observable functions section of the manual? I'm afraid I still don't
    understand how dinterval() and dsum() work, nor do I understand the
    constraints on their use.

    Again, thanks for sharing your work, and for providing kind support on these

