a couple questions and possible contributions

2012-04-08
2013-01-31
  • Tom Lippincott
    2012-04-08

    Hi Martyn (and JAGS users), I have a few technical questions, along with a
    couple ways I was thinking of contributing.

    I've been using JAGS on and off for about a year for model prototyping, but
    have reached the point where I'm introducing nonparametric models
    (specifically, hierarchical Dirichlet processes). When implementing these by
    hand, there's a lot of bookkeeping needed to keep things efficient, such as
    lookup tables so we don't have to repack the global statistics every time a
    new component is observed. I've seen a couple of posts from people about DPs,
    but no actual model code. It would be great if someone could post a working
    model. But if that amounts to using the model language to handle what should
    be internal bookkeeping, I imagine it could greatly benefit from a dedicated
    module. If so, I would be interested in working on it.
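    To make the bookkeeping point concrete, here is a minimal Python sketch of a
    Chinese restaurant process (the predictive form of a single, non-hierarchical
    DP) whose state is a lookup table from component id to member count. The
    class name CRPState and its methods are hypothetical, and the hierarchical
    layer (a Chinese restaurant franchise) is not shown. The point is only that
    ids are never renumbered, so opening or emptying a component touches one
    table entry instead of repacking everything.

```python
import random
from collections import Counter


class CRPState:
    """Hypothetical bookkeeping for a Chinese restaurant process.

    Component ids are never renumbered: opening, growing, or emptying a
    component touches only that component's count, so nothing has to be
    repacked when a new component appears.
    """

    def __init__(self, alpha, seed=None):
        self.alpha = alpha          # DP concentration parameter
        self.counts = Counter()     # lookup table: component id -> member count
        self.n = 0                  # total members, updated incrementally
        self.next_id = 0            # id handed to the next new component
        self.rng = random.Random(seed)

    def sample_assignment(self):
        """Existing component k has weight counts[k]; a new one has weight alpha.
        Returns a component id, or None meaning "open a new component"."""
        u = self.rng.uniform(0.0, self.n + self.alpha)
        for k, c in self.counts.items():
            u -= c
            if u < 0:
                return k
        return None

    def add(self, k=None):
        """Add one member to component k, or to a fresh component if k is None."""
        if k is None:
            k = self.next_id
            self.next_id += 1
        self.counts[k] += 1
        self.n += 1
        return k

    def remove(self, k):
        """Remove one member from k; an emptied component is simply dropped."""
        self.counts[k] -= 1
        self.n -= 1
        if self.counts[k] == 0:
            del self.counts[k]
```

    For example, `s.add(s.sample_assignment())` seats one new observation. A
    Gibbs sweep would call `remove` before resampling each observation's
    component.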

    Regarding data, I'm generally dealing with situations like the following: a
    global DP is used to draw a distribution, which is then used as the base
    distribution for sub-DPs (one per group). Each instance in a group draws a
    class with its group's sub-DP as prior. This class is then used to generate
    the instance, which is several categorically-distributed values. In other
    words, the data might look like this for two groups, the first with two
    instances, the second with one:

    [
      [
        [1,2,3],
        [2,1],
      ],
      [
        [3, 4, 1, 2],
      ],
    ]
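    The generative process just described can be simulated in a few lines of
    Python, which is handy for producing test data of the same nested shape. This
    is a sketch under assumptions not in the post: the DPs are truncated to K
    components (a finite approximation), each class has one categorical
    distribution shared by all positions, instance lengths are drawn uniformly,
    and the names gamma, alpha0, eta and generate_hdp_data are hypothetical.

```python
import random


def stick_breaking(concentration, K, rng):
    """Truncated GEM(concentration) weights: K sticks, the last takes the remainder."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def dirichlet(params, rng):
    """Dirichlet draw via normalised Gamma draws; the floor avoids a zero shape."""
    draws = [rng.gammavariate(max(a, 1e-8), 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Returns a 1-based index, matching the 1-based values in the data."""
    return rng.choices(range(1, len(probs) + 1), weights=probs)[0]


def generate_hdp_data(group_sizes, max_len, V, K=20, gamma=1.0, alpha0=5.0,
                      eta=0.5, seed=None):
    rng = random.Random(seed)
    beta = stick_breaking(gamma, K, rng)                 # global DP weights
    phi = [dirichlet([eta] * V, rng) for _ in range(K)]  # per-class value distributions
    data = []
    for n_j in group_sizes:
        # Truncated group-level DP(alpha0, beta) is Dirichlet(alpha0 * beta).
        pi_j = dirichlet([alpha0 * b for b in beta], rng)
        group = []
        for _ in range(n_j):
            z = categorical(pi_j, rng)        # class of this instance
            length = rng.randint(1, max_len)  # assumption: uniform lengths
            group.append([categorical(phi[z - 1], rng) for _ in range(length)])
        data.append(group)
    return data
```

    Calling `generate_hdp_data([2, 1], max_len=4, V=4)` returns a list of
    groups, each a list of instances, each a list of 1-based category values,
    the same nesting as the example above.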
    

    Now, since there's a fairly small upper limit on group and instance sizes, my
    ideal approach would be to pass a single array with dimensions
    (GROUP_COUNT, MAX_GROUP_SIZE, MAX_INSTANCE_SIZE), with unused cells marked as
    NaN or similar. The less ideal approach I've tried is passing this along with
    index arrays that the model uses to iterate only over the occupied cells,
    e.g.:

    for(i in 1 : length(GROUPS)){
      for(j in 1 : GROUP_LENGTHS[i]){
        ...
      }
    }
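    Building the padded array and the loop bounds from the nested data can be
    sketched in Python as below. GROUP_LENGTHS is the name used in the model
    snippet above; the per-instance length array (instance_lengths, for an inner
    loop bound) and the function name pad_ragged are hypothetical. Unused cells
    are NaN here; how they must be encoded for JAGS (e.g. as NA) depends on the
    interface used to pass the data.

```python
import math


def pad_ragged(groups):
    """Pad groups -> instances -> values into a
    GROUP_COUNT x MAX_GROUP_SIZE x MAX_INSTANCE_SIZE nested list (NaN = unused),
    plus the length arrays the model's loops would use as bounds."""
    group_count = len(groups)
    max_group = max(len(g) for g in groups)
    max_inst = max(len(inst) for g in groups for inst in g)
    cube = [[[math.nan] * max_inst for _ in range(max_group)]
            for _ in range(group_count)]
    group_lengths = [len(g) for g in groups]
    # 0 marks a slot with no instance in that group.
    instance_lengths = [[0] * max_group for _ in range(group_count)]
    for i, g in enumerate(groups):
        for j, inst in enumerate(g):
            instance_lengths[i][j] = len(inst)
            for k, value in enumerate(inst):
                cube[i][j][k] = value
    return cube, group_lengths, instance_lengths
```

    For the two-group example this gives a 2 x 2 x 4 array, of which only 9 of
    the 16 cells hold data.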
    

    But I've found that JAGS still creates nodes for every cell of the full
    GROUP_COUNT x MAX_GROUP_SIZE x MAX_INSTANCE_SIZE array. Is there a way to do
    what I'm asking with the current code, e.g. with sparse matrix classes? If
    not, it seems fairly straightforward to implement at the data-reading level
    (by simply not creating nodes for the unused cells), and I'd be glad to work
    on this as well.
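    One workaround that needs no sparse classes is to flatten the data into
    "long" index vectors, so the model only ever declares nodes for observed
    values. This is a sketch, not something from the thread: the function and
    vector names are hypothetical, indices are 1-based, and the JAGS loops in
    the comment are an untested illustration.

```python
def flatten_ragged(groups):
    """Flatten groups -> instances -> values into 1-based index vectors.

    instance_group[m] : group of instance m
    value_instance[n] : instance that value n belongs to
    values[n]         : the observed category

    A model could then loop only over real data (untested sketch):
        for (m in 1:M) { z[m] ~ dcat(pi[instance_group[m], ]) }
        for (n in 1:N) { y[n] ~ dcat(phi[z[value_instance[n]], ]) }
    """
    instance_group, value_instance, values = [], [], []
    for g, group in enumerate(groups, start=1):
        for inst in group:
            instance_group.append(g)
            m = len(instance_group)  # 1-based id of this instance
            for v in inst:
                value_instance.append(m)
                values.append(v)
    return instance_group, value_instance, values
```

    With this layout the example needs 3 class nodes and 9 value nodes, rather
    than nodes for every cell of the padded array.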

    Finally, something more controversial: while R of course has some strong
    arguments in its favor, I personally write almost everything in Python. I just
    checked out the latest JAGS code, added a SWIG interface file and an SCons
    build config, and found that most of it builds seamlessly into a full Python
    interface. I don't know if this has already been considered, but once the
    kinks are worked out it would have the nice advantage of auto-generating
    native libraries, with fine-grained access to and control of all the JAGS
    classes, for arbitrary scripting languages (including R). I mention this
    mainly in case anyone else is interested: I plan to make a repo that tracks
    the main branch so I can try out this idea for my own use, and if it goes
    well I'll let the list know.

    Thanks for your help,

    -Tom

     
    • Lauri Lyly
      2013-01-31

      Hey Tom,

      I also have experience with SWIG and would be interested in a Python wrapper for JAGS. I'm not too fond of R, so what's the status of that Python wrapper project?

      I would be willing to contribute if it helps.

      -Lauri
       
  • Tom Lippincott
    2012-04-08

    Addendum: I did just notice that in 3.0.0, the changelog refers to mixture
    nodes being able to share a MixMap, which I imagine is related to an efficient
    DP implementation.

     
  • Jack Tanner
    2012-04-08

    This sounds really exciting, because I was just about to look into Dirichlet
    processes for my own models.

    For a sparse JAGS data structure (that may or may not be useful for DPs), see
    here:

    http://sourceforge.net/projects/mcmc-jags/forums/forum/610037/topic/5022737

     
    Last edit: Martyn Plummer 2013-02-04