We now illustrate the concept of burn-in. We will run a chain with a high starting value, and another with a low starting value.

```r
dk.mcmc.l <- mcmcf(0.01, 20, 1e4, .12, 180)
dk.mcmc.h <- mcmcf(0.4, 20, 1e4, .12, 180)

plot(dk.mcmc.l$d, xlim = c(1,200), ylim = c(0,0.4), ty = "l")
lines(dk.mcmc.h$d, col="red")

# We use the low chain to calculate the mean
# and sd of d. We could have used the high chain
# as well.
mean.d <- mean(dk.mcmc.l$d)
sd.d <- sd(dk.mcmc.l$d)
abline(h = mean.d + 2 * c(-sd.d, sd.d), lty = 2)
```

The horizontal dashed lines indicate approximately the 95% CI. Notice how the chains move from either the high or low starting values towards the stationary phase (the area within the dashed lines). The area before the chain reaches stationarity is the burn-in.
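In practice, the burn-in samples are simply discarded before computing summaries. The snippet below illustrates this with a simulated toy chain (an AR(1) series drifting from a high starting value towards a stationary mean of 0.105; the numbers are illustrative and not output from `mcmcf`):

```r
set.seed(1)
# Toy chain: starts at 0.4 and decays towards a stationary mean of 0.105
# (illustrative AR(1) series, not a real MCMC sample)
n <- 1e4
d <- numeric(n); d[1] <- 0.4
for (i in 2:n) d[i] <- 0.105 + 0.95 * (d[i-1] - 0.105) + rnorm(1, 0, 0.01)

mean(d)            # biased upwards by the burn-in
mean(d[-(1:500)])  # discard the first 500 samples as burn-in
```

The mean computed after discarding the first 500 samples is closer to the stationary mean, because the early samples taken during the descent from the starting value no longer contaminate the estimate.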

The simplest way to check for convergence is to run the MCMC chain several times (at least twice) with over-dispersed starting values. Then one can compare the outputs of the various chains and check how similar they are. If the chains have converged to the target posterior distribution, then they will have similar (almost identical) histograms, posterior means, 95% credibility intervals, etc.
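The sketch below illustrates this check with a minimal Metropolis sampler for a standard normal target (a simple stand-in for `mcmcf`, which samples a different posterior): two chains are started from over-dispersed values and their summaries compared.

```r
set.seed(7)
# Minimal Metropolis sampler for a N(0,1) target (illustrative only)
metrop <- function(init, n, w) {
  x <- numeric(n); x[1] <- init
  for (i in 2:n) {
    xp <- x[i-1] + runif(1, -w/2, w/2)  # sliding-window proposal
    # accept with probability min(1, target(xp) / target(x))
    if (log(runif(1)) < dnorm(xp, log=TRUE) - dnorm(x[i-1], log=TRUE))
      x[i] <- xp
    else
      x[i] <- x[i-1]
  }
  x
}

ch1 <- metrop(-10, 1e4, 3)  # over-dispersed starting values
ch2 <- metrop( 10, 1e4, 3)

c(mean(ch1), mean(ch2))     # posterior means: should be close to each other
quantile(ch1, c(.025, .975))
quantile(ch2, c(.025, .975))
```

If the two chains have converged, their means and quantiles agree (up to Monte Carlo error) despite the very different starting values.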

We now run the efficient and inefficient chains again, but with different starting values (i.e. different `d.init` and `k.init`):

```r
# Efficient chain (good proposal step sizes)
dk.mcmc.b <- mcmcf(0.05, 5, 1e4, .12, 180)

# Inefficient chain (bad proposal step sizes)
dk.mcmc3.b <- mcmcf(0.05, 5, 1e4, 3, 5)
```

And we can now plot and compare the histograms:

```r
par(mfrow=c(1,2))
bks <- seq(from=0, to=150, by=1)

hist(dk.mcmc.b$k, prob=TRUE, breaks=bks, border=NA, col=rgb(0, 0, 1, .5),
     las=1, xlab="kappa", xlim=c(0,100), ylim=c(0,.055))
hist(dk.mcmc$k, prob=TRUE, breaks=bks, border=NA,
     col=rgb(.5, .5, .5, .5), add=TRUE)

hist(dk.mcmc3.b$k, prob=TRUE, breaks=bks, border=NA, col=rgb(0, 0, 1, .5),
     las=1, xlab="kappa", xlim=c(0,100), ylim=c(0,.055))
hist(dk.mcmc3$k, prob=TRUE, breaks=bks, border=NA,
     col=rgb(.5, .5, .5, .5), add=TRUE)
```

Efficient chains converge quickly, and their histograms overlap well. Inefficient chains, on the other hand, need to be run for longer to converge; here they were not run long enough, so their histograms do not overlap as well. Your histograms may look slightly different due to the stochastic nature of the MCMC algorithm.

We can also calculate the posterior means:

```r
# posterior means (similar for efficient chains):
mean(dk.mcmc$d); mean(dk.mcmc.b$d)
mean(dk.mcmc$k); mean(dk.mcmc.b$k)

# posterior means (not so similar for the inefficient chains):
mean(dk.mcmc3$d); mean(dk.mcmc3.b$d)
mean(dk.mcmc3$k); mean(dk.mcmc3.b$k)
```

The variance of the estimated posterior mean is the sample variance of the MCMC sample (ignoring the auto-correlation) divided by the product of the efficiency and the sample size. The standard error of the estimated posterior mean is the square root of this variance:

```r
# efficient chain, standard error of the means
sqrt(1/1e4 * var(dk.mcmc$d) / 0.23)  # roughly 2.5e-4
sqrt(1/1e4 * var(dk.mcmc$k) / 0.20)  # roughly 0.2

# inefficient chain, standard error of the means
sqrt(1/1e4 * var(dk.mcmc3$d) / 0.015)  # roughly 9.7e-4
sqrt(1/1e4 * var(dk.mcmc3$k) / 0.003)  # roughly 1.6
```

Thus, for the efficient chain on `d`, I got an estimated posterior mean of 0.105, which is within ±2 × 2.5e-4 of the true posterior mean. Note that for the inefficient chains, the standard errors are about four times (for `d`) and eight times (for `k`) those of the efficient chains.
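When the efficiency is not known in advance, it can be estimated from the sample autocorrelations as eff = 1 / (1 + 2Σρ). The sketch below does this for a simulated AR(1) series standing in for an MCMC sample (the truncation of the sum at lag 100 is an arbitrary choice for illustration; more careful truncation rules exist):

```r
set.seed(3)
# Simulated autocorrelated sample (AR(1) with coefficient 0.9), standing
# in for an MCMC sample of a parameter
x <- as.numeric(arima.sim(list(ar = 0.9), n = 1e4))

# Efficiency estimated from the autocorrelations up to lag 100
rho <- acf(x, lag.max = 100, plot = FALSE)$acf[-1]  # drop lag 0
eff <- 1 / (1 + 2 * sum(rho))

# Standard error of the posterior mean: sample variance divided by
# (efficiency * sample size), then take the square root
se <- sqrt(var(x) / (eff * length(x)))
eff; se
```

For an AR(1) process with coefficient 0.9 the theoretical efficiency is (1 − 0.9)/(1 + 0.9) ≈ 0.053, so the estimate should come out close to that value.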

We can also plot densities (smoothed histograms) for the efficient and inefficient chains:

```r
par(mfrow=c(1,2)); adj <- 1.5

# Efficient chains:
plot(density(dk.mcmc.b$k, adj=adj), col="blue", las=1,
     xlim=c(0, 100), ylim=c(0, .05), xaxs="i", yaxs="i")
lines(density(dk.mcmc$k, adj=adj), col="black")

# Inefficient chains:
plot(density(dk.mcmc3.b$k, adj=adj), col="blue", las=1,
     xlim=c(0, 100), ylim=c(0, .05), xaxs="i", yaxs="i")
lines(density(dk.mcmc3$k, adj=adj), col="black")
```

The resulting R plots are shown below.

A useful measure of convergence is the potential scale reduction statistic of Gelman and Rubin (also known as the Gelman and Rubin convergence diagnostic). The statistic compares the within-chain variance and the among-chain variance for the posterior mean of a parameter. When the value of the statistic is close to 1, convergence has been achieved. Note that the diagnostic may fail (i.e. give a misleading reassurance of convergence) for strongly multi-modal posteriors. To avoid this it is important to run more than two chains with over-dispersed starting values. The R package `coda` for MCMC diagnostics has a function, `gelman.diag`, to calculate the convergence diagnostic. If you have the package installed, you may want to go ahead and calculate the diagnostic for the chains above.
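The basic form of the statistic is easy to sketch in base R (note that `coda`'s `gelman.diag` includes additional corrections, so its values will differ slightly from this simplified version):

```r
# Simplified potential scale reduction statistic for m chains of length n,
# stored as the columns of a matrix
rhat <- function(chains) {
  n <- nrow(chains); m <- ncol(chains)
  W <- mean(apply(chains, 2, var))   # within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  V <- (n - 1) / n * W + B / n       # pooled variance estimate
  sqrt(V / W)
}

set.seed(5)
# Two "converged" chains sampling the same distribution: value near 1
r1 <- rhat(cbind(rnorm(1e4), rnorm(1e4)))
# Two chains stuck in different regions: value clearly above 1
r2 <- rhat(cbind(rnorm(1e4), rnorm(1e4, mean = 1)))
r1; r2
```

With `coda` installed, the corresponding call for two chains would be along the lines of `gelman.diag(mcmc.list(mcmc(chain1), mcmc(chain2)))`.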

Chapter 7 in Yang (2014, Molecular Evolution: A Statistical Approach) provides a good introduction to Bayesian computation including Bayesian MCMC, proposal densities, efficiency, convergence diagnostics, etc.

We hope you enjoyed the tutorial. If you have questions or comments, or if you spot any errors, please post a message below or send us an email!

Fabricia, Mario and Ziheng.