Among-species comparisons can include phylogenetic information to account for non-independence arising from shared evolutionary history. Often, phylogenetic topologies and branch lengths are not known exactly, but are estimated with uncertainty. This uncertainty can be accounted for using methods recently described in a neat paper called Bayesian models for comparative analysis integrating phylogenetic uncertainty by de Villemereuil et al. Here, I’ll demonstrate the method by estimating the body mass of the Siberut macaque (Macaca siberu).

(Not a siberut macaque)

Building the phylogeny

First, I downloaded a nexus file with DNA sequences from A molecular phylogeny of living primates by Polina Perelman et al., available here on TreeBASE. I culled the nexus file to include only the 14 species in the genus Macaca, and saved it as macaques.nex.

I used MrBayes to estimate the macaque phylogeny with the following file (analysis.nex):

begin mrbayes;
   set autoclose=yes nowarn=yes;
   execute macaques.nex;
   lset nst=6 rates=invgamma;
   mcmc nruns=1 ngen=120000 samplefreq=120 file=macaques.nex1;
   mcmc file=macaques.nex2;

Calling MrBayes from within R:

system("mb 'analysis.nex' > log.txt &")

Here is the unrooted consensus tree:

Body mass data

Macaque weight data are available as part of a (much) larger dataset on the body mass of late quaternary mammals, by Smith and colleagues. I extracted the log body mass data for the 13 available macaque species. No body mass data were available for the Siberut macaque (M. siberu).

Body mass model

We will account for phylogenetic non-independence by considering average species weights to be multivariate normally distributed around a within-genus mean, with a covariance matrix $\Sigma$ that reflects phylogenetic distance (see de Villemereuil et al.). The off-diagonal elements of this matrix are scaled by Pagel’s $\lambda$, which reflects the degree of phylogenetic signal in the data.

With the help of the ape and MASS packages, covariance and precision matrices can be constructed for the trees comprising the posterior phylogeny.


# load the phylogenies
trees1 <-"macaques.nex1.t")
trees2 <-"macaques.nex2.t")
K <- length(trees1)*2       # number of trees
N <- 14                     # number of species

cov.mats <- array(dim=c(K, N, N))
inv.mat <- array(dim=c(K, N, N))
for (i in 1:(K/2)){
  cov.mats[i, , ] <- vcv(trees1[[i]], corr=T)
  inv.mat[i,,] <- ginv(cov.mats[i,,])
  cov.mats[(K/2+i), , ] <- vcv(trees2[[i]], corr=T)
  inv.mat[(K/2+i), , ] <- ginv(cov.mats[(K/2+i), , ])

Then we can fit the model with OpenBUGS, estimating the missing body mass of M. siberu, accounting for uncertainty in the phylogeny’s topology and branch lengths.

# load body mass data
d <- read.csv("macaques.csv")

# specify model
    alpha ~ dnorm(3, .01)
    for (i in 1:nspec){
      mu[i] <- alpha
    weight[1:nspec] ~ dmnorm(mu[], TAU[, ])
    sigma ~ dunif(0, 10)
    tau <- 1 / (sigma * sigma)
    lambda ~ dunif(0, 1)

    for (k in 1:ntree){
      p[k] <- 1/ntree
    K ~ dcat(p[])

    for (i in 1:nspec){
      for (j in 1:nspec){
        LAMBDA[i, j] <- 1 + (lambda - 1) * (1 - equals(i, j))
        TAU[i, j] <- tau * LAMBDA[i, j] * inv.mat[K, i, j]
bugsd <- list(weight = d$lmass, nspec=nrow(d),
           ntree = K, inv.mat=inv.mat)

# run model
out <- bugs(bugsd, inits = NULL,
   = c("sigma", "alpha", "K", "weight",
                                   "TAU", "lambda"),
            n.iter=3000, model.file="phylomod.txt", DIC=F, n.thin=5,
coda.out <- as.mcmc.bugs(out)
ggd <- ggs(coda.out)

ggs_density(ggd, "weight") + theme_bw() +
  ggtitle(expression(paste("Predicted body size of ", italic("Macaca siberu")))) +
  theme(axis.title.y = element_blank()) +
  xlab("Log(body mass in kg)")

Here is our phylogenetically informed estimate of the body size for M. siberu. Pagel’s $\lambda$ indicates a weak but non-zero phylogenetic signal with mean = 0.417 and 95% BCI = (0.0156, 0.949). It should go without saying this is a toy example, and it may be better to go out and weigh some actual Siberut macaques (at a minimum, this would be a good excuse for a vacation).

References & further reading

  • Blomberg et al.): Independent contrasts and PGLS regression estimators are equivalent. Systematic Biology 2012.

  • Pagel: Inferring the historical patterns of biological evolution. Nature 1999.

  • Perelman et al.: A molecular phylogeny of living primates. PLoS Genetics 2011.

  • Smith et al.: Body mass of late quaternary mammals. Ecology 2003.

  • de Villemereuil et al.: Bayesian models for comparative analysis integrating phylogenetic uncertainty. BMC Evolutionary Biology 2012