Tom Lippincott
2012-04-08
Hi Martyn (and JAGS users), I have a few technical questions, along with a
couple of ways I was thinking of contributing.
I've been using JAGS on and off for about a year for model prototyping, but
have reached the point where I'm introducing non-parametrics (specifically,
hierarchical Dirichlet Processes). When implementing these by hand, there's a
lot of bookkeeping to keep things efficient, such as lookup tables so we don't
have to repack the global statistics every time a new component is observed.
I've seen a couple of posts from people about DPs, but no actual model code. It
would be great if someone could post a working model. But if it amounts to
using the model language to handle what should be internal bookkeeping, I
imagine it could greatly benefit from a dedicated module. If so, I would be
interested in working on it.
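To make the "bookkeeping" concrete, here is a minimal sketch of the kind of lookup table a hand-written DP sampler tends to keep. It assumes components are identified by arbitrary labels and mapped to dense slots; freed slots are reused, so a newly observed component never forces the existing statistics to be repacked. `ComponentTable` and its methods are hypothetical names for illustration, not anything in JAGS.

```python
class ComponentTable:
    """Hypothetical helper: map sparse component labels to dense slots.

    New components reuse freed slots or are appended at the end, so the
    slots of existing components never move (no repacking needed).
    """

    def __init__(self):
        self.slot_of = {}   # component label -> dense slot index
        self.label_of = []  # dense slot index -> label (None if free)
        self.counts = []    # dense slot index -> number of assigned items
        self.free = []      # stack of freed slot indices

    def add(self, label):
        """Record one more item assigned to `label`; return its slot."""
        slot = self.slot_of.get(label)
        if slot is None:
            if self.free:
                slot = self.free.pop()       # reuse an emptied slot
                self.label_of[slot] = label
                self.counts[slot] = 0
            else:
                slot = len(self.counts)      # append a brand-new slot
                self.label_of.append(label)
                self.counts.append(0)
            self.slot_of[label] = slot
        self.counts[slot] += 1
        return slot

    def remove(self, label):
        """Unassign one item from `label`; free its slot once it is empty."""
        slot = self.slot_of[label]
        self.counts[slot] -= 1
        if self.counts[slot] == 0:
            del self.slot_of[label]
            self.label_of[slot] = None
            self.free.append(slot)

    def active(self):
        """Return {label: count} for components that currently hold items."""
        return {lab: self.counts[s] for lab, s in self.slot_of.items()}
```

For example, adding items to components "a", "b", "a", then removing the only "b" item, frees slot 1, and a subsequent new component "c" lands in that same slot rather than triggering a reshuffle.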
Regarding data: generally, I'm dealing with situations like the following: a
global DP is used to draw a distribution, which is then used as input to sub-
DPs (one per group). Each instance in a group draws a class with its sub-DP as
prior. This class is then used to generate the instance, which is several
categorically distributed values. In other words, the data might look like
this for two groups, the first with two instances, the second with one:
[ [ [1,2,3], [2,1], ], [ [3, 4, 1, 2], ], ]
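The generative story above can be sketched with a finite truncation: the global DP is cut off at K atoms via stick-breaking, and, given that discrete base measure, each group's DP draw reduces to a Dirichlet(alpha * beta) over the same K classes. The truncation level K, vocabulary size V, concentrations gamma/alpha, class prior eta, and all function names are assumptions for illustration; the output has the same nested-list shape as the example data.

```python
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalising independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def stick_breaking(gamma, K, rng):
    """Truncated GEM(gamma) weights; the last stick takes the leftover mass."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        b = rng.betavariate(1.0, gamma)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    weights.append(remaining)
    return weights


def sample_hdp_data(group_sizes, instance_len_range, K=10, V=4,
                    gamma=1.0, alpha=1.0, eta=0.5, seed=0):
    """Sample groups -> instances -> categorical values (values are 1..V)."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, K, rng)                 # global DP weights
    phi = [dirichlet([eta] * V, rng) for _ in range(K)]  # per-class value dists
    data = []
    for n_instances in group_sizes:
        pi = dirichlet([alpha * b for b in beta], rng)   # this group's sub-DP
        group = []
        for _ in range(n_instances):
            z = rng.choices(range(K), weights=pi)[0]     # instance's class
            length = rng.randint(*instance_len_range)
            group.append(rng.choices(range(1, V + 1), weights=phi[z], k=length))
        data.append(group)
    return data


example = sample_hdp_data(group_sizes=[2, 1], instance_len_range=(2, 4))
```

Truncating at a fixed K is only an approximation to the DP; it is shown here because it keeps every array a fixed size, which is also what a JAGS model would need.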
Now, since there's a fairly small upper limit on group and instance sizes, my
ideal approach would be to pass a matrix with dimensions
(GROUP_COUNT, MAX_GROUP_SIZE, MAX_INSTANCE_SIZE), with unused cells indicated
by NaN or some such. The less-ideal approach I've tried is passing this along
with index matrices that the model uses to iterate only over the appropriate
cells, e.g.:
for (i in 1:length(GROUPS)) {
  for (j in 1:GROUP_LENGTHS[i]) {
    ...
  }
}
But I've found that JAGS still creates nodes for the full NxNxN matrix. Is
there a way to do what I'm asking in the current code, e.g. with sparse matrix
classes? If not, it seems fairly straightforward to implement at the data-
reading level (by simply not creating nodes for the extraneous cells), and I'd
be glad to work on this as well.
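The padded layout and its index arrays can be built from the nested-list data in a few lines. This is a sketch assuming NaN as the fill value; `pad_ragged` is a hypothetical name, and exporting the result to JAGS is not shown. The returned lengths are counts, so they work directly as upper bounds for JAGS's 1-based loops.

```python
import math


def pad_ragged(data, fill=math.nan):
    """Pad a groups -> instances -> values nested list into a dense 3-D array.

    Returns (array, group_lengths, instance_lengths). `array` has shape
    (GROUP_COUNT, MAX_GROUP_SIZE, MAX_INSTANCE_SIZE) and unused cells hold
    `fill`; instance_lengths[i][j] is 0 for instance slots that don't exist.
    """
    group_count = len(data)
    max_group = max(len(group) for group in data)
    max_inst = max(len(inst) for group in data for inst in group)
    array = [[[fill] * max_inst for _ in range(max_group)]
             for _ in range(group_count)]
    group_lengths = [len(group) for group in data]
    instance_lengths = [[0] * max_group for _ in range(group_count)]
    for i, group in enumerate(data):
        for j, inst in enumerate(group):
            instance_lengths[i][j] = len(inst)
            for k, value in enumerate(inst):
                array[i][j][k] = value
    return array, group_lengths, instance_lengths


data = [[[1, 2, 3], [2, 1]], [[3, 4, 1, 2]]]
array, group_lengths, instance_lengths = pad_ragged(data)
# group_lengths == [2, 1]; instance_lengths == [[3, 2], [4, 0]]
```

For the example data the padded array has 2 x 2 x 4 = 16 cells but only 9 observed values, which illustrates how much of the full array is filler even at these small sizes.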
Finally, something more controversial: while R of course has some strong
arguments in its favor, I personally write almost everything in Python. I just
checked out the latest JAGS code, tossed in a SWIG interface file and an SCons
build config, and found that most of it builds seamlessly into a full Python
interface. I don't know if this has already been considered, but it would have
the nice advantage (once kinks are worked out) of auto-generating native
libraries, with fine-grained access and control of all the JAGS classes, for
arbitrary scripting languages (including R). I mention this mainly in case
anyone else is interested: I think I'll make a repo that tracks the main
branch, where I can try out this idea for my own use, but if it goes well I'll
let the list know.
Thanks for your help,
-Tom
Lauri Lyly
2013-01-31
Hey Tom,
I also have experience with SWIG and would be interested in a Python wrapper for JAGS. I'm not too fond of R, so what's the status of that Python wrapper project?
I would be willing to contribute if it helps.
Tom Lippincott
2012-04-08
Addendum: I did just notice that in 3.0.0, the changelog refers to mixture
nodes being able to share a MixMap, which I imagine is related to an efficient
DP implementation.
Jack Tanner
2012-04-08
This sounds really exciting, because I was just about to look into Dirichlet
Processes for my own models.
For a sparse JAGS data structure (which may or may not be useful for DPs), see
here:
http://sourceforge.net/projects/mcmc-jags/forums/forum/610037/topic/5022737
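As a point of comparison with the padded layout discussed earlier in the thread, one common alternative for ragged data is a flattened ("long") layout: every observed value is stored once, alongside a 1-based index saying which instance it belongs to, plus a per-instance group index. A model loop over the values then only touches cells that actually exist. This is a sketch under that assumption, not the structure from the linked thread, and `flatten_ragged` is a hypothetical name.

```python
def flatten_ragged(data):
    """Flatten groups -> instances -> values into parallel 1-based index lists.

    Returns (values, group_of_instance, instance_of_value): every observed
    value appears exactly once, and instances are numbered consecutively
    across groups starting at 1 (matching JAGS's 1-based indexing).
    """
    values, group_of_instance, instance_of_value = [], [], []
    for g, group in enumerate(data, start=1):
        for inst in group:
            group_of_instance.append(g)
            inst_id = len(group_of_instance)  # 1-based id of this instance
            for v in inst:
                values.append(v)
                instance_of_value.append(inst_id)
    return values, group_of_instance, instance_of_value


values, group_of_instance, instance_of_value = flatten_ragged(
    [[[1, 2, 3], [2, 1]], [[3, 4, 1, 2]]])
# values            == [1, 2, 3, 2, 1, 3, 4, 1, 2]
# group_of_instance == [1, 1, 2]
# instance_of_value == [1, 1, 1, 2, 2, 3, 3, 3, 3]
```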