Apparently I need help with my model

Fabio, 2013-09-18 (last post 2013-10-10)

  • Fabio — 2013-09-18

    Hi all!

    I need to design a tool to help make sense of web analytics data in a kind of A/B testing scenario, but it's not a straight A/B test of one piece of content vs. another. Instead it compares one process for generating a page against another; there are many pages on which users can convert or not convert, and some pages may have inherently higher or lower conversion rates. Also, while there may be a lot of data overall, at the page level many pages have only a few visits, often under just condition A or just condition B.

    So I researched and found that I should be using a hierarchical Bayesian model. I've tried to implement the model in both Python and JAGS, but at first it was failing outright and now I'm not sure it's right. Can you guys look at my model (the hyperpriors are marked XXXXXX; see below for why) and let me know what I'm doing wrong?

    Model (Try.bug)

    var alpha_A, beta_A, alpha_B, beta_B, p_A[N], p_B[N], delta[N], n_A[N], n_B[N], c_A[N], c_B[N];
    model {
        for (i in 1:N) {
            # Conversions out of visits, per page and condition
            c_A[i] ~ dbin(p_A[i],n_A[i])
            c_B[i] ~ dbin(p_B[i],n_B[i])
            # Page-level conversion rates share a condition-level beta
            p_A[i] ~ dbeta(alpha_A,beta_A)
            p_B[i] ~ dbeta(alpha_B,beta_B)
            delta[i] <- p_A[i]-p_B[i]   # quantity of interest
        }
        # Hyperpriors (placeholder, discussed below)
        alpha_A ~ XXXXXX
        alpha_B ~ XXXXXX
        beta_A ~ XXXXXX
        beta_B ~ XXXXXX
    }
    

    Data (Try.r)

    "N" <- 60
    "c_A" <- structure(c(0,6,0,3,0,8,0,4,0,6,1,5,0,5,0,7,0,3,0,7,0,4,0,5,0,4,0,5,0,4,0,2,0,4,0,5,0,8,2,7,0,6,0,3,0,3,0,8,0,4,0,4,2,6,0,7,0,3,0,1))
    "n_A" <- structure(c(0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3))
    "c_B" <- structure(c(5,0,2,2,2,0,2,0,2,0,0,0,5,0,4,0,3,1,2,0,2,0,2,0,0,0,3,0,6,0,4,1,5,0,2,0,6,0,1,0,2,0,4,0,4,1,1,0,3,0,5,0,0,0,5,0,2,0,7,1))
    "n_B" <- structure(c(9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3,9,0,9,0,9,0,3,0,9,0,9,0,9,3))
    

    These sample data were generated with p_A ~ Beta(30,20) and p_B ~ Beta(20,30), zero correlation between p_A and p_B, and visit counts n following the repeating patterns [0,9,0,9,0,9,3] and [9,0,9,0,9,0,3], roughly as in the sketch below.
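
    In R terms the generation looked something like this (an after-the-fact sketch, so it won't reproduce the exact vectors above):

    # Reconstruction of the data-generating process described above
    set.seed(42)   # arbitrary; the original seed is lost
    N   <- 60
    n_A <- rep(c(0, 9, 0, 9, 0, 9, 3), length.out = N)   # A-side visits
    n_B <- rep(c(9, 0, 9, 0, 9, 0, 3), length.out = N)   # B-side visits
    p_A <- rbeta(N, 30, 20)   # true rates clustered around 0.6
    p_B <- rbeta(N, 20, 30)   # true rates clustered around 0.4, independent of p_A
    c_A <- rbinom(N, size = n_A, prob = p_A)
    c_B <- rbinom(N, size = n_B, prob = p_B)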

    Control (Try.cmd)

    model in Try.bug
    data in Try.r
    compile, nchains(2)
    initialize
    update 400
    monitor set p_A, thin(3)
    monitor set p_B, thin(3)
    monitor set delta, thin(3)
    update 1000
    coda *, stem(Try)
    

    Originally my model used dgamma(1,20) for the priors on alpha and beta (these are the spots I've marked XXXXXX), but I was always getting "Error in node beta_B, Slicer stuck at value with infinite density" or "Error in node beta_A, Slicer stuck at value with infinite density". Interestingly, I never got it for either of the alphas.

    I wasn't sure where the infinite densities might be coming from, but I decided to try removing the peak near zero for the alphas/betas by changing XXXXXX to dgamma(2,20). Unfortunately I was still getting the same error. By sheer chance, after running it two dozen times, one run finished, and on inspecting the output I realized I was seeing a lot of p values near 0 or 1. I was NOT expecting that, since the underlying distributions have p_A closely clustered around 0.6 and p_B around 0.4.

    To try to force the sampler away from the alpha or beta values near 0 that correspond to those 0-or-1 p values, I changed XXXXXX to dpar(0.5,1), and that got the sampler to finish on every try. First output here

    Even though it's now completing, I suspect there's something wrong with my model since I had to force the priors away from certain (incorrect) values to get the model to complete.

    One hunch is that it has something to do with all the 0 measurements. But since n is 0 for those, c = 0 has probability 1 under a Binomial(0 trials, p), whatever p is, so those observations contribute nothing to the likelihood and shouldn't be affecting the posterior of p (much less alpha or beta).
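
    A quick sanity check in R bears this out:

    # A zero-trial binomial puts all its mass on c = 0 regardless of p,
    # so an n = 0 row is a constant factor in the likelihood.
    dbinom(0, size = 0, prob = c(0.01, 0.5, 0.99))
    # [1] 1 1 1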

    Anyway, can someone help me understand what's going on?

    Last edit: Fabio 2013-09-18
  • Martyn Plummer — 2013-09-19

    You're seeing a lot of p values near 0 and 1 because that's what your prior distribution implies.

    Try simulating from the prior to see what implications your prior distribution has for p_A[i] or p_B[i]. You can do this either by not supplying the outcome variables c_A and c_B in the data, or by commenting out the lines that define them in the model file.

    You will see that the dgamma(2, 20) prior distribution on alpha_A and beta_A generates a prior for p_A that is heavily concentrated near 0 and 1. The dpar(0.5, 1) prior also gives a bimodal prior for p_A, although the phenomenon is less extreme.

    The WinBUGS Litters Example is very similar to your problem. In the litters model, the BUGS authors put a dgamma(1, 0.001) prior on the shape parameters. They note:

    "This provides a log-concave prior which gently penalizes larger values."

    It also gives a prior uniform distribution for p_A[i], which is nice.
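
    You can see both effects with a quick forward simulation in plain R (the equivalent of dropping c_A and c_B from the JAGS run; note that rgamma, like JAGS's dgamma, is parameterized by shape and rate):

    M <- 10000
    # dgamma(2, 20): alpha and beta have mean 0.1, so Beta(alpha, beta) is U-shaped
    alpha <- rgamma(M, shape = 2, rate = 20)
    beta  <- rgamma(M, shape = 2, rate = 20)
    hist(rbeta(M, alpha, beta), breaks = 50)   # mass piles up near 0 and 1
    # dgamma(1, 0.001): alpha and beta are typically large, and alpha/(alpha+beta)
    # is exactly Beta(1,1), so the implied prior on p is close to uniform
    alpha <- rgamma(M, shape = 1, rate = 0.001)
    beta  <- rgamma(M, shape = 1, rate = 0.001)
    hist(rbeta(M, alpha, beta), breaks = 50)   # roughly flat on (0,1)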

  • Fabio — 2013-09-20

    <facepalm/> So I was assuming the parameters to dgamma were shape and scale, but they're actually shape and rate. Of course, had I read the manual more closely and compared the dgamma PDF definition against the PDF I was visualizing in another application, I would have realized that! With a little more practice I'll be able to use JAGS itself (rather than another application) to visualize my prior distributions.
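
    To make the mix-up concrete in R (whose rgamma has the same shape/rate default):

    mean(rgamma(1e5, shape = 1, rate = 20))    # ~0.05: what dgamma(1,20) actually says
    mean(rgamma(1e5, shape = 1, scale = 20))   # ~20:   what I thought I was saying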

    Thanks also for suggesting a better prior, which is actually close to what I thought I was using!

    Last edit: Fabio 2013-09-20
  • Fabio — 2013-10-09

    Apparently I still need help with my model.

    It seems to work fine when there's enough data on both the A and B sides. But when there is very little data on one side (say, no data for A, for the sake of argument) and your p_B's are centered around a value other than 0.5, the model understandably reports a distribution for delta that is not centered around 0. Of course, this is because the model assigns p_A its prior spread over 0-1 regardless of what data there is for B, and that prior never gets updated. For A/B testing this is not just unhelpful but misleading: if you only input data for A, you want the resulting distribution to tell you "I don't know", not "B is better" just because A's observed rate is below 0.5.
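
    A deliberately flattened illustration of this (one page, hierarchy stripped out, made-up counts):

    # No data at all for A, 70/100 conversions for B: the posterior still
    # "prefers" B, when the honest answer is "no idea about A".
    M   <- 10000
    p_A <- rbeta(M, 1, 1)             # flat prior, zero observations
    p_B <- rbeta(M, 1 + 70, 1 + 30)   # flat prior + 70 successes in 100 trials
    mean(p_A < p_B)                   # ~0.7, i.e. "B is probably better"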

    There's a certain symmetry that somehow needs to be exploited, I just don't know how... I tried changing the model to impose a prior symmetry between A and B, giving P(p_A > p_B) a Beta(1.5,1.5) prior and updating it as values from both sides were observed, but the model became basically useless, even with plentiful and balanced data. I'd love to spare you from reading my shameful attempt at a model, but I guess I should include it :-/

    var A_better, tau, alpha_0, beta_0, Ai_better[N], p_0[N], loglift[N], p_A[N], p_B[N], n_A[N], n_B[N], c_A[N], c_B[N], delta[N];
    model {
        A_better ~ dbeta(1.5,1.5)   # prior probability that A is the better arm
        tau ~ dgamma(3,1)           # precision of the log-odds lift
        for (i in 1:N) {
            Ai_better[i] ~ dbern(A_better)
            p_0[i] ~ dbeta(alpha_0,beta_0)   # baseline rate for page i
            loglift[i] ~ dnorm(0, tau)       # size of the effect on the log-odds scale
            # Shift the baseline log-odds up for the better arm, down for the
            # other (same as the raw odds formula, written with logit/ilogit)
            p_A[i] <- ilogit(logit(p_0[i]) + loglift[i] * (2*Ai_better[i]-1))
            p_B[i] <- ilogit(logit(p_0[i]) + loglift[i] * (1-2*Ai_better[i]))
            c_A[i] ~ dbin(p_A[i],n_A[i])
            c_B[i] ~ dbin(p_B[i],n_B[i])
            delta[i] <- p_A[i]-p_B[i]   # variable of interest
        }
        scale <- 20
        alpha_0 ~ dgamma(2,1/scale)
        beta_0 ~ dgamma(2,1/scale)
    }
    
  • Martyn Plummer — 2013-10-10

    You want to give up on the beta-binomial and use a logistic model with a hierarchical prior. Something like this:

    model {
        for (i in 1:N) {
            c_A[i] ~ dbin(p_A[i],n_A[i])
            c_B[i] ~ dbin(p_B[i],n_B[i])
            logit(p_A[i]) <- mu + theta[i] - eps[i]
            logit(p_B[i]) <- mu + theta[i] + eps[i]
            theta[i] ~ dnorm(0, tau.theta)
            eps[i] ~ dnorm(0, tau.eps)
        }
        mu ~ dlogistic(0, 1)
    
        # Vague prior on the variance of theta[i]
        tau.theta <- 1/(sigma.theta*sigma.theta)
        sigma.theta ~ dnorm(0, 10)
    
        # Strength of the prior on tau.eps determines how close you expect
        # p_A[i] and p_B[i] to be a priori.
    
        tau.eps ~ dgamma(5, 0.05)
    }
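
    If you drive JAGS from R, a minimal rjags harness would look something like this (a sketch: the model file name is an assumption, and Try.r is sourced for the data):

    library(rjags)

    source("Try.r")   # defines N, c_A, n_A, c_B, n_B

    # Assumes the model above has been saved as "Try2.bug".
    # Add delta[i] <- p_A[i] - p_B[i] to the model if you want to
    # monitor the difference directly.
    jm <- jags.model("Try2.bug",
                     data = list(N = N, c_A = c_A, n_A = n_A,
                                 c_B = c_B, n_B = n_B),
                     n.chains = 2)
    update(jm, 1000)   # burn-in
    post <- coda.samples(jm, c("p_A", "p_B", "mu"), n.iter = 5000, thin = 3)
    summary(post)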
    
  • Fabio — 2013-10-10

    Thank you, I'll give that a try with my datasets and read up on this type of model to understand it better.

    Also, is there a way to donate to your project?