Part 2: Markov chain Monte Carlo (MCMC)

Back to Part 1

We now aim to obtain the posterior distribution of d and k by MCMC sampling. The draft MCMC algorithm is:

  1. Set initial states for d and k.
  2. Propose a new state d^* (from an appropriate proposal density).
  3. Accept the proposal with probability
    \alpha=\min\Big(1, \frac{p(d^*)p(k)p(D|d^*)}{p(d)p(k)p(D|d)}\Big).
    If the proposal is accepted set d=d^*, otherwise d=d.
  4. Repeat 2-3 for k.
  5. Save current values of d and k.
  6. Go to step 2.

To avoid numerical problems (i.e. underflows or overflows) it is best to work with the logarithm of the unnormalised posterior. This is defined as the sum of the log-prior and the log-likelihood. The R code for the log-unnormalised posterior is

# function returning the logarithm of the unscaled posterior:
# f(d) * f(k) * f(D|d,k)
# by default we set the priors as: 
# f(d) = Gamma(d | 2, 20) and f(k) = Gamma(k | 2, .1)
ulnPf <- function(d, k, n=948, ns=84, nv=6, a.d=2, b.d=20, 
        a.k=2, b.k=.1) {
  # The normalizing constant in the prior densities can 
  # be ignored:
  lnpriord <- (a.d - 1)*log(d) - b.d * d
  lnpriork <- (a.k - 1)*log(k) - b.k * k
  
  # log-Likelihood (K80 model):
  expd1 <- exp(-4*d/(k+2));
  expd2 <- exp(-2*d*(k+1)/(k+2))
  p0 <- .25 + .25 * expd1 + .5 * expd2
  p1 <- .25 + .25 * expd1 - .5 * expd2
  p2 <- .25 - .25 * expd1
  lnL <- ((n - ns - nv) * log(p0/4) + 
         ns * log(p1/4) + nv * log(p2/4))
  
  # Return unnormalised posterior:
  return (lnpriord + lnpriork + lnL)
}

Note that the joint prior on d and k are two independent gamma densities. Recall the gamma density is

\mathrm{Gamma}(x|a,b) = \frac{1}{\Gamma(a)}x^{a-1}e^{-bx},

where \Gamma is the complete gamma function. \Gamma(a) is a constant factor (i.e. does not depend on d or k), and so it will cancel out in the calculation of the posterior ratio in step 3 of the MCMC, and thus can be ignored in the calculation of the log-prior in ulnPf. Ignoring calculation of the constant speeds up the code. The calculation of the log-likelihood in ulnPf here involves two calls to the exp function and is slightly faster than the code in Part 1, which has two exp calls (For the K80 log-likelihood equation please see Part 1).

We will now write the R code implementing the MCMC algorithm above. We will use d and k to be the current values (d and k) and dprop and kprop to be the proposed values (d^* and k^*), with corresponding logarithms of the unnormalized posterior to be ulnP and ulnPprop. Note that if lnalpha > 0, we know \alpha = 1 so that the proposal is accepted. The code calculates the probability of acceptance \alpha only when lnalpha < 0. The corresponding R function for the MCMC algorithm is:

mcmcf <- function(init.d, init.k, N, w.d, w.k) {
  # init.d and init.k are the initial states
  # N is the number of MCMC iterations.
  # w.d is the width of the sliding-window proposal for d.
  # w.k is the width of the sliding-window proposal for k.
  
  # We keep the visited states (d, k) in sample.d and sample.k 
  # for easy plotting. In practical MCMC applications, these 
  # are usually written into a file.
  sample.d <- sample.k <- numeric(N+1)

  # STEP 1: Initialise the MCMC chain
  d <- init.d;  sample.d[1] <- init.d
  k <- init.k;  sample.k[1] <- init.k
  ulnP <- ulnPf(d, k)
  acc.d <- 0;  acc.k <- 0 # number of acceptances
  
  for (i in 1:N) {
    # STEP 2: Propose new state d* 
    # we use a uniform sliding window of width w with reflection
    # to propose new values d* and k* 
    # propose d* and accept/reject the proposal
    dprop <- d + runif(1, -w.d/2, w.d/2)
    # reflect if dprop is negative
    if (dprop < 0) dprop <- -dprop
    
    ulnPprop <- ulnPf(dprop, k)
    lnalpha <- ulnPprop - ulnP

    # STEP 3: Accept or reject the proposal
    # if ru < alpha accept proposed d*:
    if (lnalpha > 0 || runif(1) < exp(lnalpha)){
      d <- dprop;  ulnP <- ulnPprop; 
      acc.d  <- acc.d + 1
    }
    # else reject and stay where we are (so that nothing needs 
    # to be done).
    
    # STEP 4: Repeat steps 2-3 for k
    # propose k* and accept/reject the proposal
    kprop <- k + runif(1, -w.k/2, w.k/2)
    # reflect if kprop is negative
    if (kprop < 0) kprop <- -kprop
    
    ulnPprop <- ulnPf(d, kprop)
    lnalpha <- ulnPprop - ulnP
    # if ru < alpha accept proposed k*:
    if (lnalpha > 0 || runif(1) < exp(lnalpha)){
      k <- kprop;  ulnP <- ulnPprop
      acc.k  <- acc.k + 1
    }
    # else reject and stay where we are.
    
    # STEP 5: Save chain state
    sample.d[i+1] <- d;  sample.k[i+1] <- k
  }
  
  # print out the proportion of times
  # the proposals were accepted
  print("Acceptance proportions (d, k):")
  print(c(acc.d/N, acc.k/N))
  
  # return vector of d and k visited during MCMC
  
  return (list(d=sample.d, k=sample.k))
}

Note that as we use ulnP to store the log posterior for the current state, we make one call to the ulnPf function (to calculate the log unnormalized posterior for the newly proposed state) in each of the two proposals. This reduces the amount of computation by a half compared with the alternative strategy of calculating the log unnormalized posterior for both the current and proposed states making two calls to ulnPf. Note also that in the syntax “if(a || b)” on lines 32 and 48, the expression “a || b” evaluates to TRUE as soon as a is TRUE in which case b is not evaluated, saving one call to the exp function. Note that the algorithm is stochastic, so repetitions of it will give different results. Function system.time in R is useful to measure the amount of time an R expression takes to be evaluated. For example, running the MCMC algorithm above with 10,000 iterations takes about 0.3 seconds in a 2.2 GHz MacBook Air.

# Test run-time:
system.time(mcmcf(0.2, 20, 1e4, .12, 180)) # about 0.3s
# Run again and save MCMC output:
dk.mcmc <- mcmcf(0.2, 20, 1e4, .12, 180)

We now plot the ‘traces’ of the parameters. A trace plot is a plot of parameter state vs. MCMC iteration. Trace plots allow us to easily visualise the MCMC chain. They can also show us if there are convergence or mixing problems with the chain (we’ll see this in Part 3):

par(mfrow=c(1,3))

# trace plot of d
plot(dk.mcmc$d, ty='l', xlab="Iteration", ylab="d", 
     main="Trace of d")

# trace plot of k
plot(dk.mcmc$k, ty='l', xlab="Iteration", ylab="k", 
     main="Trace of k")

# We can also plot the joint sample of d and k
# (points sampled from posterior surface)
plot(dk.mcmc$d, dk.mcmc$k, pch='.', xlab="d", ylab="k", 
     main="Joint of d and k")

Trace plots for distance, d and kappa, k, and the plot for joint sampled from posterior surface.

Figure2

The trace plots show that the chain has mixed well (i.e., it has explored the parameter space well). The joint sample of d and k matches the plot of the posterior surface from Part 1. In Part 3 we will see examples of chains that have poor mixing. Chains with poor mixing may be the product of poorly chosen proposal densities of bad proposal step sizes. For example, trying running the MCMC chain again after modifying the step sizes (the widths) of the proposal densities, w.d and w.k.

Part 3: Efficiency of the MCMC chain

3 thoughts on “Part 2: Markov chain Monte Carlo (MCMC)

  1. Pingback: Part 1: Introduction | The DNA in Us

  2. Pingback: Part 3: Efficiency of the MCMC chain | The DNA in Us

  3. Pingback: Tutorial: Bayesian MCMC phylogenetics using R | The DNA in Us

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s