Ecology in silico

Visualizing Bivariate Shrinkage

Inspired by this post about visualizing shrinkage on Coppelia, and this thread about visualizing mixed models on Stack Exchange, I started thinking about how to visualize shrinkage in more than one dimension. One might end up in this situation with a varying slope, varying intercept hierarchical (mixed effects) model, a model with two varying intercepts, etc. Then I found a great bivariate shrinkage plot in Doug Bates’ lme4 book, Figure 3.8.

Here is a snippet to reproduce a similar bivariate shrinkage plot in ggplot2, adding a color-coded probability density surface and contours for the estimated multivariate normal distribution of the random effects, using the same sleepstudy data that Bates used.

## 2d shrinkage for random slope, random intercept model
library(lme4)
library(ggplot2)
library(grid)
library(mvtnorm)
library(reshape2)
library(ellipse)

# fit models
m0 <- lm(Reaction ~ Days * Subject, data=sleepstudy)
m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# extract fixed effect estimates
intercepts <- !grepl("Days", names(coef(m0)))
m0_intercepts <- coef(m0)[intercepts]
m0_intercepts[-1] <- m0_intercepts[-1] + m0_intercepts[1]
slopes <- !intercepts
m0_slopes <- coef(m0)[slopes]
m0_slopes[-1] <- m0_slopes[-1] + m0_slopes[1]

# extract random effect estimates
m1_intercepts <- coef(m1)$Subject[, 1]
m1_slopes <- coef(m1)$Subject[, 2]

d <- data.frame(interceptF = m0_intercepts,
                slopeF = m0_slopes,
                interceptR = m1_intercepts,
                slopeR = m1_slopes
                )

# 95% bivariate normal ellipse for random effects
df_ell <- data.frame(ellipse(VarCorr(m1)$Subject, centre=fixef(m1)))
names(df_ell) <- c("intercept", "slope")

# bivariate normal density surface
lo <- 200 # length.out of x and y grid
xvals <- seq(from = min(df_ell$intercept) - .1 * abs(min(df_ell$intercept)),
             to = max(df_ell$intercept) + .05 * abs(max(df_ell$intercept)),
             length.out = lo)
yvals <- seq(from = min(df_ell$slope) - .4 * abs(min(df_ell$slope)),
             to = max(df_ell$slope) + .1 * abs(max(df_ell$slope)),
             length.out = lo)

z <- matrix(0, lo, lo)
for (i in 1:lo) {
  z[, i] <- dmvnorm(cbind(xvals, yvals[i]),
                    mean = fixef(m1),
                    sigma = VarCorr(m1)$Subject)
}

mv_ranef <- melt(z)
names(mv_ranef) <- c("x", "y", "z")
mv_ranef$x <- xvals[mv_ranef$x]
mv_ranef$y <- yvals[mv_ranef$y]


p <- ggplot(d) +
  geom_raster(aes(x=x, y=y, fill=z), data=mv_ranef) +
  scale_fill_gradient(low="white", high="red") +
  guides(fill=FALSE) +
  geom_path(data=df_ell, aes(x=intercept, y=slope), size=.5) +
  geom_contour(aes(x=x, y=y, z=z), data=mv_ranef, size=.1, color="black") +
  geom_segment(aes(x=interceptF, y=slopeF,
                   xend=interceptR, yend=slopeR),
               arrow = arrow(length = unit(0.3, "cm")),
               alpha=.7) +
  xlab("Estimated intercepts") +
  ylab("Estimated slopes") +
  ggtitle("Bivariate shrinkage plot") +
  theme_bw() +
  theme(legend.position = "none",
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
p

Notes on Shrinkage & Prediction in Hierarchical Models

Ecologists increasingly use mixed effects models, in which some intercepts or slopes are fixed and others are random (or varying). Confusion often arises over whether and when to use fixed vs. random intercepts and slopes, which is understandable given the multiple definitions these terms have accumulated.

In an attempt to help clarify the utility of varying intercept models (and more generally, hierarchical modeling), specifically in terms of shrinkage and prediction, here is a GitHub repo with materials and a slideshow from our department’s graduate QDT (quantitative (th)ink tank) group.

For fun, I’ve included a simple example demonstrating the value of shrinkage when trying to rank NBA players by their free throw shooting ability, a situation with wildly varying amounts of information (free throw attempts) on each player. The example admittedly is not ecological, and sensitive readers may replace free throw attempts with prey capture attempts for topical consistency. Many if not most ecological datasets suffer from similar issues, with varying amounts of information from different sites, species, individuals, etc., so even without considering predation dynamics of NBA players, the example’s relevance should be immediate.
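The mechanics of that ranking example can be sketched with a minimal empirical-Bayes calculation. This is not the code from the repo, and it is in Python rather than R for illustration; the attempt counts, the prior strength, and all variable names here are made up. Raw percentages reward tiny samples, while partial pooling shrinks poorly informed estimates toward the league-wide mean.

```python
import numpy as np

# Hypothetical free-throw records: attempts and makes for four players.
# The 4-for-4 shooter tops the raw ranking despite almost no information.
attempts = np.array([4, 50, 600, 2000])
makes = np.array([4, 45, 520, 1800])

raw = makes / attempts

# Empirical-Bayes shrinkage under a Beta(a, b) prior whose mean matches
# the pooled league percentage. The prior "strength" a + b is an assumed
# tuning constant here; a real analysis would estimate it from the data.
league_mean = makes.sum() / attempts.sum()
strength = 100.0
a, b = league_mean * strength, (1 - league_mean) * strength

# Posterior mean of each player's ability: data pulled toward the prior,
# with the pull weakening as attempts accumulate.
shrunk = (makes + a) / (attempts + a + b)

print(raw)
print(shrunk)
```

With these made-up numbers, the low-attempt player's estimate moves far toward the league mean while the 2000-attempt player's barely moves, so the shrunken ranking is led by the high-volume shooters rather than the lucky small sample. That reordering is the point of partial pooling when the amount of information per unit varies wildly.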

Spoiler alert: Mark Price absolutely dominated at the free throw line in the early nineties.

Dynamic Occupancy Models in Stan

Occupancy modeling is possible in Stan, as shown here, despite its lack of support for integer parameters. Many Bayesian applications of occupancy modeling sample the true occupancy states (0 or 1) directly, but this can be avoided by marginalizing the true occupancy state out of the likelihood. The Stan manual (pg. 96) gives an example of this kind of marginalization for a discrete change-point model.
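Concretely, the marginalization works site by site: a detection history with at least one detection implies occupancy, while an all-zero history could come from either an occupied-but-undetected site or an unoccupied one, so those two branches are summed. Here is a single site-year sketch of that sum in Python (the post's models are in Stan; the function name and vectorization are mine, for illustration):

```python
import numpy as np

def log_lik_site_year(y, psi, p):
    """Marginal log-likelihood of one site-year detection history y
    (a 0/1 array over repeat surveys), given occupancy probability psi
    and detection probability p, with the latent state summed out."""
    y = np.asarray(y)
    # log Pr(y | occupied): independent Bernoulli detections
    log_detect = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    if y.sum() > 0:
        # At least one detection proves occupancy; only one branch remains.
        return np.log(psi) + log_detect
    # All-zero history: occupied but never detected, or truly unoccupied.
    return np.logaddexp(np.log(psi) + log_detect, np.log(1 - psi))
```

This is the same if/else that appears in the likelihood block of the Stan model below, just written for a single site-year.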

Here’s a Stan implementation of a dynamic (multi-year) occupancy model of the sort described by MacKenzie et al. (2003).

First, the model statement:

dyn_occ.stan
data {
   int<lower=0> nsite;
   int<lower=0> nyear;
   int<lower=0> nrep;
   int<lower=0,upper=1> Y[nsite,nyear,nrep];
}
parameters {
   real<lower=0,upper=1> p;
   real<lower=0,upper=1> gamma;
   real<lower=0,upper=1> phi;
   real<lower=0, upper=1> psi1;
}
transformed parameters {
   matrix[nsite, nyear] psi;
   for (r in 1:nsite){
     for (t in 1:nyear){
       if (t < 2){
          psi[r, t] <- psi1;
       } else {
          psi[r, t] <- psi[r, t-1] * phi + (1 - psi[r, t-1]) * gamma;
       }
     }
   }
}
model {
   // priors
  psi1 ~ uniform(0,1);
  gamma ~ uniform(0,1);
  phi ~ uniform(0,1);
  p ~ uniform(0,1);

   // likelihood
  for (r in 1:nsite){
    for (t in 1:nyear){
      if (sum(Y[r, t]) > 0){
        increment_log_prob(log(psi[r, t]) + bernoulli_log(Y[r, t], p));
      } else {
        increment_log_prob(log_sum_exp(log(psi[r, t]) + bernoulli_log(Y[r, t],p),
                                      log(1-psi[r, t])));
      }
    }
  }
}

This model can be made faster by storing values for log(psi) and log(1 - psi), as done in Bob Carpenter’s single season example.
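For reference, the full marginalized likelihood, with log(psi) and log(1 - psi) computed once up front as that note suggests, can be sketched in Python/NumPy. This mirrors the structure of the Stan model above (occupancy probabilities from the colonization/persistence recursion, then the two-branch marginalization per site-year); the function name is mine.

```python
import numpy as np

def marginal_log_lik(Y, psi1, phi, gamma, p):
    """Marginal log-likelihood of the detection array Y[site, year, rep],
    precomputing log(psi) and log(1 - psi) once per site-year instead of
    recomputing them inside the likelihood loop."""
    nsite, nyear, nrep = Y.shape
    # occupancy probabilities via the colonization/persistence recursion
    psi = np.empty((nsite, nyear))
    psi[:, 0] = psi1
    for t in range(1, nyear):
        psi[:, t] = psi[:, t - 1] * phi + (1 - psi[:, t - 1]) * gamma
    log_psi, log1m_psi = np.log(psi), np.log1p(-psi)
    # log Pr(history | occupied), vectorized over all site-years
    log_detect = (Y * np.log(p) + (1 - Y) * np.log(1 - p)).sum(axis=2)
    detected = Y.sum(axis=2) > 0
    ll_occ = log_psi + log_detect
    # all-zero histories also admit the unoccupied branch
    ll = np.where(detected, ll_occ, np.logaddexp(ll_occ, log1m_psi))
    return ll.sum()
```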

Fitting the model (in parallel):

library(rstan)
library(parallel)

simulate_data <- function(){
  nsite <- 100;
  nrep <- 2;     # repeat surveys
  nyear <- 10;  # nyears
  p <- 0.8;
  gamma <- .2
  phi <- .8
  psi1 <- .8
  psi <- array(dim=c(nsite, nyear))
  psi[, 1] <- psi1
  for (t in 2:nyear){
    psi[, t] <- psi[, t - 1] * phi + (1 - psi[, t-1]) * gamma
  }

  Z <- array(dim=c(nsite, nyear))
  for (t in 1:nyear){
    Z[, t] <- rbinom(nsite, 1, psi[, t])
  }

  Y <- array(dim=c(nsite, nyear, nrep))
  for (r in 1:nsite){
    for (t in 1:nyear){
      Y[r, t, ] <- rbinom(nrep, 1, Z[r, t] * p)
    }
  }
  return(list(nsite=nsite, nrep=nrep, nyear=nyear,
              p=p, gamma=gamma, phi=phi,
              psi1=psi1, Y=Y))
}

## parallel run
d <- simulate_data()
# initialize model
mod <- stan("dyn_occ.stan", data = d, chains = 0)

estimate_params <- function(model, data, iter=2000){
  sflist <- mclapply(1:4, mc.cores = 4,
             function(i) stan(fit = model, data = data,
                              chains = 1, chain_id = i,
                              iter=iter,
                              refresh = -1))
  fit <- sflist2stanfit(sflist)
  return(fit)
}

fit <- estimate_params(model=mod, data=d)

Does it work? Let’s whip up 1000 simulated datasets and their corresponding estimates of the colonization and persistence probabilities.

one_run <- function(initialized_mod, params){
  require(modeest)
  require(reshape2)
  source("HDI.R")
  d <- simulate_data()
  fit <- estimate_params(model=initialized_mod, data=d)
  # store HDI for certain params
  post <- extract(fit, params)
  vals <- array(dim=c(length(params), 3))
  rownames(vals) <- params
  colnames(vals) <- c("LCI", "mode", "UCI")
  mode <- rep(NA, length(params))
  lci <- rep(NA, length(params))
  uci <- rep(NA, length(params))
  for (i in 1:length(params)){
    # calculate posterior mode
    mode[i] <- mlv(post[[i]], method="mfv")$M
    # calculate HDI
    interv <- HDI(post[[i]])
    lci[i] <- interv[1]
    uci[i] <- interv[2]
  }
  vals <- data.frame(parameter = params, mode=mode,
                     lci=lci, uci=uci)
  return(vals)
}

iterations <- 1000
check <- one_run(mod, c("gamma", "phi"))
check$iteration <- rep(1, nrow(check))
for (i in 2:iterations){
  new_d <- one_run(mod, c("gamma", "phi"))
  new_d$iteration <- rep(i, nrow(new_d))
  check <- rbind(check, new_d)
}

true.vals <- data.frame(parameter = c("gamma", "phi"),
                        value = c(d$gamma, d$phi))
post.vals <- data.frame(parameter = c("gamma", "phi"),
                        value = c(mean(check$mode[check$parameter == "gamma"]),
                                  mean(check$mode[check$parameter == "phi"])))

ggplot(check) +
  geom_point(aes(x=mode, y=iteration)) +
  facet_wrap(~parameter) +
  geom_segment(aes(x=lci, xend=uci,
                   y=iteration, yend=iteration)) +
  theme_bw() +
  geom_vline(aes(xintercept=value), true.vals,
             color="blue", linetype="dashed") +
  geom_vline(aes(xintercept=value), post.vals,
             color="red", linetype="dashed") +
  xlab("value")

Here are the results for the probability of colonization $\gamma$ and the probability of persistence $\phi$. The blue dashed line shows the true value, and the red dashed line shows the mean of all 1000 posterior modes. The black segments represent the HDI for each iteration, and the black points represent the posterior modes. This example uses a uniform prior on both parameters, probably an overrepresentation of prior ignorance in most real systems.

Based on some initial exploration, this approach seems much much (much?) faster than explicitly modeling the latent occurrence states in JAGS, with better chain mixing and considerably less autocorrelation. Extension to multi-species models should be straightforward too.

References

MacKenzie DI, Nichols JD, Hines JE, Knutson MG, Franklin AB. 2003. Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology 84(8): 2200-2207.