# Part 3: Efficiency of the MCMC chain

Back to Part 2

Values sampled in an MCMC chain are autocorrelated because new states are either the previous state or a modification of it. The efficiency of an MCMC chain is closely related to the autocorrelation. Intuitively, if the autocorrelation is high, the chain will be inefficient, i.e. we will need to run the chain for a long time to obtain a good approximation to the posterior distribution.

The efficiency of a chain is defined as

$\mathrm{eff} = 1 / (1 + 2(r_1 + r_2 + r_3 + ...))$,

where $r_i$ is the correlation coefficient for lag $i$. We will now calculate the efficiency of the MCMC. To calculate the efficiency it is appropriate to obtain a long sample (say one million). R’s autocorrelation function, acf, can be used to calculate the correlation coefficients and plot them in the so-called autocorrelation plot.

# run a very long chain (1e6 generations take about
# 40s in my MacBook Air) to calculate efficiency
dk.mcmc2 <- mcmcf(0.2, 20, 1e6, .12, 180)

# R's acf function (for AutoCorrelation Function)
par(mfrow=c(1,2))
acf(dk.mcmc2$d) acf(dk.mcmc2$k)

# Define efficiency function
eff <- function(acf) 1 / (1 + 2 * sum(acf$acf[-1])) # the efficiencies are roughly 22% for d and # 20% for k: # [1] 0.2255753 # mcmcf(0.2, 20, 1e7, .12, 180) eff(acf(dk.mcmc2$d))
# [1] 0.2015054 # mcmcf(0.2, 20, 1e7, .12, 180)
eff(acf(dk.mcmc2$k)) The chain has an efficiency of roughly 20%-22% for $k$ and $d$. This means that a sample of size N from the MCMC is as good as an independent sample of size $N \times 0.2$ (in terms of the variance of the estimate). The plots for the autocorrelation are shown below: Notice that the autocorrelation drops to zero after about 10-15 iterations for both $d$ and $k$. That is, a sample observation obtained at iteration $i$, is roughly uncorrelated to an observation obtained at iteration $i + 15$ (i.e. for a lag of 15). Also notice that the autocorrelation is very high for small lags. For example, for a lag of 2, the autocorrelation is roughly 60%. This is because neighbouring sates in the chain are highly similar. It is customary to ‘thin’ the chain, that is, save states only every other iteration or so. This is done to remove the highly correlated neighbouring states, which can be done without loosing too much information, and thus save hard drive space in the computer. However the estimates with the smallest variance (error) are obtained when all of the samples in the chain are used. A concept closely related to efficiency is the effective-sample size (ESS) of the chain. It is defined as $\mathrm{ESS} = N * \mathrm{eff}$, where $N$ is the total number of samples taken from the chain. Thus for the chain above, the ESS’s are 1e6 * 0.23 # ~230K 1e6 * 0.20 # ~200K or about 200,000s. This means that an independent sample of size 200K has about the same variance (error) for the parameter estimates (in this case the mean of $d$ and $k$) than our autocorrelated MCMC sample of size 1 million. An MCMC chain with efficiency around 20% is a very efficient chain. We will see examples of inefficient chains next. ## Inefficient chains: Inefficient chains are those that do not ‘mix’ well. That is, parameter states are highly autocorrelated even when they are many iterations apart, and thus efficiency and ESS tend to be low. Inefficient chains usually occur due to poor proposal densities, or complex posteriors with highly correlated posteriors in over-parameterised models. To illustrate inefficient chains, we will run our MCMC again by using a proposal density with a too large step size for $d$, and another with a too small step size for $k$. # The window width for the d proposal density is too large, # while it is too small for k dk.mcmc3 <- mcmcf(0.2, 20, 1e4, 3, 5) par(mfrow=c(1,2)) # because proposal width for d is too large, # chain gets stuck at same values of d: plot(dk.mcmc3$d, ty='l', main="Trace of d", cex.main=2.0,
cex.lab=1.5, cex.axis=1.5, ylab="d")

# whereas proposal width for k is too small,
# so chain moves slowly:
plot(dk.mcmc3$k, ty='l', main="Trace of k", cex.main=2.0, cex.lab=1.5, cex.axis=1.5, ylab="k") The trace plots for the inefficient chain are shown below. In the case of $d$, the proposal density is too wide, newly proposed values of $d$ are so far from the current state that they may fall outside the area of high probability density in the posterior, and thus most proposals are rejected. This is evident in the trace plot, which has many horizontal sections, where the chain is stuck in the same value of $d$ for many iterations. In the case of $k$ the proposal density is too narrow, and so the chain moves slowly by making baby-sized steps. Inefficient chains are characterised by acceptance proportions that are either too low or too high. For $d$ the acceptance proportion (check your R output) was about ~3% (i.e. over 97% of proposals were rejected), while for $k$ the acceptance proportion was over 90% (i.e. the vast majority of proposals were accepted). As a rule of thumb, for bell-shaped posterior densities, one should aim for acceptance proportions close to 30%, which lead to optimal mixing, large efficiency and large ESS. Try tweaking the values of w.d and w.k and running the MCMC again until you achieve acceptance proportions close to 30% for both parameters. Then check the trace plots. Finding the optimal step sizes is known as ‘fine-tuning’ the chain. Many MCMC programs use automatic fine-tuning algorithms and the user does not normally need to worry about this. We will now calculate eff for the inefficient chains. Because the chain has so much autocorrelation, it is best to do a very long run (using 1e7 iterations is best, but this takes over 10min in my computer). dk.mcmc4 <- mcmcf(0.2, 20, 1e6, 3, 5) # Efficiencies are roughly 1.5% for d, and 0.35% for k: # [1] 0.01530385 # mcmcf(0.2, 20, 1e7, 3, 5) eff(acf(dk.mcmc4$d, lag.max=2e3))

# [1] 0.003493112 # mcmcf(0.2, 20, 1e7, 3, 5)
eff(acf(dk.mcmc4$k, lag.max=2e3)) The efficiencies are very slow, about 1.5% and 0.3% for $d$ and $k$ respectively. This means ESS’s around 15K and 3K for a chain of size 1 million. This is very inefficient. The autocorrelation plots are shown below. Notice that the correlations only drop to zero for about 200 iterations in the case of $d$ and over 1,000 iterations in the case of $k$. Finally, we plot the traces for efficient (part 2) and inefficient chains. par(mfrow=c(2,2)) plot(dk.mcmc$d, ty='l', las=1, ylim=c(.05,.2),
main="Trace of d, efficient chain", xlab='',
ylab="Distance, d", cex.main=2.0, cex.lab=1.5)
plot(dk.mcmc3$d, ty='l', las=1, ylim=c(.05,.2), main="Trace of d, inefficient chain", xlab='', ylab='', cex.main=2.0, cex.lab=1.5) plot(dk.mcmc$k, ty='l', las=1, ylim=c(0,100),
main="Trace of k, efficient chain",
xlab='', ylab="ts/tv ratio, k",
cex.main=2.0, cex.lab=1.5)
plot(dk.mcmc3\$k, ty='l', las=1, ylim=c(0,100),
main="Trace of k, inefficient chain",
xlab='', ylab='', cex.main=2.0, cex.lab=1.5)

In this part we explored efficiency and effective-sample size, and their relation to the acceptance proportion. These concepts are important when we assess whether the chain is mixing well or not, if we know that the chain has converged to the stationary distribution and is sampling from the high-probability region of the posterior. In the next section, we look at diagnostics of MCMC or how to tell that the chain has converged to the posterior.

Part 4: Checking for convergence