Ecology in silico

Why I Think Twice Before Editing Plots in PowerPoint, Illustrator, Inkscape, Etc.

Thanks to a nice post by Meghan Duffy on the Dynamic Ecology blog (How do you make figures?), we have some empirical evidence that many figures made in R by ecologists are secondarily edited in other programs, including MS PowerPoint, Adobe Illustrator, Inkscape, and Photoshop. This may not be advisable* for two reasons: reproducibility and bonus learning.

Reproducibility

R is nice because results are relatively easy to reproduce. It’s free, and your code serves as a written record of what was done. When figures are edited outside of R, they can be much more difficult to reproduce. Independent of whether I am striving to maximize the reproducibility of my work for others, it behooves me to save time for my future self, ensuring that we (I?) can quickly update my own figures throughout the process of paper writing, submission, rewriting, resubmission, and so on.

I had to learn this the hard way. The following figure was the culprit: I initially created a rough version in R, spent about 30 minutes editing it in Inkscape, and ended up with a “final” version for submission.

It turned out that I had to remake the figure three times during the revision process (each time for the better). Eventually I realized that it was far more efficient to build the entire plot in R than to post-process it outside of R.

In retrospect, two things are clear:

  1. My energy allocation strategy was not conducive to the revision process. I wasted time trying to make my “final” version look good in Inkscape, when I could have invested time to figure out how to make the figure as I wanted it in R. The payoff from this time investment will be a function of how much manipulation is done outside R, how hard it is to get the desired result in R, and how many times a figure will be re-made.
  2. I probably could have found a better way to display the data. Another post perhaps.

Bonus learning

Forcing myself to remake the figure exactly as I wanted it using only R had an unintended side effect: I learned more about base graphics in R. Now, when faced with similar situations, I can make similar plots much faster, because I know more graphical parameters and plotting functions. In contrast, point-and-click programs are inherently slow because I’m manually manipulating elements, usually with a mouse, and my mouse isn’t getting any faster.
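
For instance, here is a minimal sketch (with made-up data) of the kinds of tweaks I once would have done by hand in Inkscape, scripted instead with base graphics parameters:

## tweaks one might otherwise do in Inkscape, scripted with base graphics
## (made-up data for illustration)
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

par(mar = c(4.5, 4.5, 1, 1),  # tighter margins than the default
    bty = "l",                # open, L-shaped plot frame
    las = 1)                  # horizontal axis tick labels
plot(x, y, pch = 19, col = rgb(0, 0, 0, 0.5),  # semi-transparent points
     xlab = "Predictor", ylab = "Response")
abline(lm(y ~ x), lwd = 2)
mtext("a)", side = 3, adj = 0, line = 0.2)  # panel label, no mouse required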

*different strokes for different folks, of course

(Figure from Does life history mediate changing disease risk when communities disassemble?)

Visualizing Bivariate Shrinkage

Inspired by this post about visualizing shrinkage on Coppelia, and this thread about visualizing mixed models on Stack Exchange, I started thinking about how to visualize shrinkage in more than one dimension. You might find yourself in this situation with a varying slope, varying intercept hierarchical (mixed effects) model, a model with two varying intercepts, and so on. Then I found a great bivariate shrinkage plot in Doug Bates’ lme4 book, Figure 3.8.

Here is a snippet that produces a similar bivariate shrinkage plot in ggplot2, adding a color-coded probability density surface and contours for the estimated multivariate normal distribution of the random effects, using the same sleep study data that Bates used.

## 2d shrinkage for random slope, random intercept model
library(lme4)      # lmer() and the sleepstudy data
library(ggplot2)
library(grid)      # arrow() and unit()
library(mvtnorm)   # dmvnorm() for the density surface
library(reshape2)  # melt()
library(ellipse)   # ellipse() for the 95% contour of the random effects

# fit models: m0 estimates a separate line per subject (no pooling);
# m1 partially pools the subjects' intercepts and slopes
m0 <- lm(Reaction ~ Days * Subject, data=sleepstudy)
m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# extract per-subject estimates from the no-pooling model; coefficients
# after the first are treatment contrasts, so add the baseline to get
# each subject's absolute intercept and slope
intercepts <- !grepl("Days", names(coef(m0)))
m0_intercepts <- coef(m0)[intercepts]
m0_intercepts[-1] <- m0_intercepts[-1] + m0_intercepts[1]
slopes <- !intercepts
m0_slopes <- coef(m0)[slopes]
m0_slopes[-1] <- m0_slopes[-1] + m0_slopes[1]

# extract partially pooled per-subject estimates from the mixed model
m1_intercepts <- coef(m1)$Subject[, 1]
m1_slopes <- coef(m1)$Subject[, 2]

d <- data.frame(interceptF = m0_intercepts,
                slopeF = m0_slopes,
                interceptR = m1_intercepts,
                slopeR = m1_slopes
                )

# 95% bivariate normal ellipse for random effects
df_ell <- data.frame(ellipse(VarCorr(m1)$Subject, centre=fixef(m1)))
names(df_ell) <- c("intercept", "slope")

# bivariate normal density surface
lo <- 200 # length.out of x and y grid
xvals <- seq(from = min(df_ell$intercept) - .1 * abs(min(df_ell$intercept)),
             to = max(df_ell$intercept) + .05 * abs(max(df_ell$intercept)),
             length.out = lo)
yvals <- seq(from = min(df_ell$slope) - .4 * abs(min(df_ell$slope)),
             to = max(df_ell$slope) + .1 * abs(max(df_ell$slope)),
             length.out = lo)

z <- matrix(0, lo, lo)
for (i in 1:lo) {
  z[, i] <- dmvnorm(cbind(xvals, yvals[i]),
                    mean = fixef(m1),
                    sigma = VarCorr(m1)$Subject)
}

mv_ranef <- melt(z)
names(mv_ranef) <- c("x", "y", "z")
mv_ranef$x <- xvals[mv_ranef$x]
mv_ranef$y <- yvals[mv_ranef$y]


p <- ggplot(d) +
  geom_raster(aes(x=x, y=y, fill=z), data=mv_ranef) +
  scale_fill_gradient(low="white", high="red") +
  guides(fill=FALSE) +
  geom_path(data=df_ell, aes(x=intercept, y=slope), size=.5) +
  geom_contour(aes(x=x, y=y, z=z), data=mv_ranef, size=.1, color="black") +
  # each arrow runs from a subject's no-pooling estimate (lm) to its
  # partially pooled estimate (lmer), showing the direction of shrinkage
  geom_segment(aes(x=interceptF, y=slopeF,
                   xend=interceptR, yend=slopeR),
               arrow = arrow(length = unit(0.3, "cm")),
               alpha=.7) +
  xlab("Estimated intercepts") +
  ylab("Estimated slopes") +
  ggtitle("Bivariate shrinkage plot") +
  theme_bw() +  # apply the complete theme first so the tweaks below survive
  theme(legend.position="none",
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
p

Notes on Shrinkage & Prediction in Hierarchical Models

Ecologists increasingly use mixed effects models, in which some intercepts or slopes are fixed and others are random (or varying). Confusion about whether and when to use fixed vs. random intercepts and slopes is common, and understandably so, given the multiple competing definitions of those terms.
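
To make the distinction concrete, here is a minimal sketch using the same sleepstudy data from lme4 as the plot above (this is an illustration, not code from the materials linked below):

# fixed intercepts: each subject's intercept is estimated independently
library(lme4)
f_fixed <- lm(Reaction ~ Days + Subject, data = sleepstudy)

# varying intercepts: subject intercepts are modeled as draws from a shared
# normal distribution, partially pooling them toward the overall mean
f_varying <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# the varying-intercept estimates sit between each subject's own mean and
# the grand mean; that pull toward the middle is shrinkage
head(coef(f_varying)$Subject)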

In an attempt to help clarify the utility of varying intercept models (and more generally, hierarchical modeling), specifically in terms of shrinkage and prediction, here is a GitHub repo with materials and a slideshow from our department’s graduate QDT (quantitative (th)ink tank) group.

For fun, I’ve included a simple example demonstrating the value of shrinkage when ranking NBA players by free throw shooting ability: a situation with wildly varying amounts of information (free throw attempts) on each player. The example admittedly is not ecological, and sensitive readers may replace free throw attempts with prey capture attempts for topical consistency. Many if not most ecological datasets suffer from similar issues, with varying amounts of information from different sites, species, or individuals, so even without considering the predation dynamics of NBA players, the example’s relevance should be immediate.
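
For readers who want the flavor of it without opening the repo, here is a hedged sketch of the idea on simulated data (the repo example uses real NBA free throw records; everything below is made up):

## sketch: shrinkage when ranking with unequal information (simulated data)
library(lme4)
set.seed(2)

n_players <- 100
skill <- rnorm(n_players, mean = qlogis(0.75), sd = 0.4)  # true logit ability
attempts <- rnbinom(n_players, mu = 200, size = 0.5) + 1  # wildly varying
made <- rbinom(n_players, attempts, plogis(skill))
d <- data.frame(player = factor(1:n_players), made, attempts)

# raw ranking: lucky players with few attempts float to the top
d$raw <- d$made / d$attempts

# partial pooling: low-information players are shrunk toward the average
fit <- glmer(cbind(made, attempts - made) ~ 1 + (1 | player),
             data = d, family = binomial)
d$shrunk <- plogis(coef(fit)$player[, 1])

head(d[order(-d$raw), ])     # often dominated by small samples
head(d[order(-d$shrunk), ])  # favors sustained performance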

Spoiler alert: Mark Price absolutely dominated at the free throw line in the early nineties.