10  Basic workflow with brms

The brms package provides a flexible framework for specifying multilevel regression models, using Stan as the back end. It is typically used for models within the generalized linear mixed model (GLMM) specification, but can accommodate nonlinear models such as Emax. This chapter uses the brms package to develop and evaluate Bayesian Emax regression models. Models for continuous and binary response data are discussed, and in the next chapter these are extended to discuss covariate modeling.

Show the code
library(tidyverse)
library(brms)
library(posterior)
library(tidybayes)
library(here)

theme_set(theme_bw(base_size = 12))
Show the code
if (require(cmdstanr)) {
  # prefer cmdstanr and cache binaries
  options(
    brms.backend = "cmdstanr", 
    cmdstanr_write_stan_file_dir = here("_brms-cache")
  )
  dir.create(here("_brms-cache"), FALSE)
} else {
  rstan::rstan_options(auto_write = TRUE)
}
Loading required package: cmdstanr
This is cmdstanr version 0.9.0
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /home/danielle/.cmdstan/cmdstan-2.36.0
- CmdStan version: 2.36.0

10.1 Hyperbolic Emax models

This section shows how to build a standard Emax model for continuous response data using brms. To build the model a simulated data set is used:

Show the code
d_example_emax_3cov <- read_csv(here("data", "d_example_emax_3cov.csv"))
d_example_emax_3cov
# A tibble: 300 × 6
    dose exposure cov_a cov_b cov_c response
              
 1   100    4151.  5.71  2.33  7.83     14.2
 2   100    8067.  4.92  4.66  6.74     14.0
 3   100    4878.  4.88  4.21  4.68     13.3
 4   100    9713.  8.42  6.56  1.29     16.4
 5   100   11491.  4.37  3.96  3.55     15.1
 6   100    2452.  8.69  7.60  3.64     13.6
 7   100    5652.  6.61  3.95  5.13     14.1
 8   100    9939.  5.35  7.77  8.29     15.0
 9   100    5817.  5.61  2.24  9.60     13.3
10   100    5176.  6.06  1.79  8.74     14.1
# ℹ 290 more rows

In this chapter only the exposure and response columns are used. A simple exploratory visualization of the exposure-response relationship is shown below:

Show the code
d_example_emax_3cov |> 
  ggplot(aes(exposure, response)) + 
  geom_point() + 
  geom_smooth(formula = y ~ x, method = "loess")

The model considered in this section is a hyperbolic Emax model, in which the Hill coefficient is fixed to unity (i.e. gamma = 1). The model construction takes place in stages. First, use brmsformula() to describe the exposure-response relationship, setting nl = TRUE to ensure that brms interprets the input as a non-linear model:

Show the code
hyperbolic_model <- brmsformula(
  response ~ e0 + emax * exposure / (ec50 + exposure),
  e0 ~ 1,
  emax ~ 1,
  ec50 ~ 1,
  nl = TRUE
) 

In this specification, the first formula indicates that the exposure-response relationship is an Emax function. The later formulas indicate that e0, emax, and ec50 are model parameters.

In the second stage, assumptions must also be specified for the distribution of measurement errors. For simplicity, this example assumes errors are normally distributed. Use the brmsfamily() function to specify this:

Show the code
gaussian_measurement <- brmsfamily(
  family = "gaussian", 
  link = "identity"
)

In the third stage, parameter priors for e0, emax, and ec50 must also be specified. In brms the default is to place an improper flat prior on regression parameters. For this example a weakly-informative prior is used. The prior() function is used for this, using the nlpar argument to specify the name of a non-linear parameter, and using lb and ub to impose lower and upper bounds if required:

Show the code
hyperbolic_model_prior <- c(
  prior(normal(0, 1.5), nlpar = "e0"),
  prior(normal(0, 1.5), nlpar = "emax"),
  prior(normal(2000, 500), nlpar = "ec50", lb = 0)
)

These three components provide the complete specification of the model. They are passed to brm() along with the data to estimate model parameters:

Show the code
hyperbolic_model_fit <- brm(
  formula = hyperbolic_model, 
  family = gaussian_measurement, 
  data = d_example_emax_3cov, 
  prior = hyperbolic_model_prior
) 

When this code is executed a Stan model is compiled and run, and detailed information on the sampling is printed during the run.

After the sampling is complete the user can inspect the brms model object to obtain a summary of the model, the sampling, and the parameter estimates:

Show the code
hyperbolic_model_fit
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: response ~ e0 + emax * exposure/(ec50 + exposure) 
         e0 ~ 1
         emax ~ 1
         ec50 ~ 1
   Data: d_example_emax_3cov (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
e0_Intercept       6.52      0.64     5.22     7.72 1.00     1104     1404
emax_Intercept    10.54      0.66     9.23    11.83 1.00     1198     1602
ec50_Intercept  2969.18    345.23  2327.76  3677.26 1.00     1500     1799

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.29      0.05     1.19     1.40 1.00     2137     2001

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The data can be visualized in many different ways. A simple example is shown below, using epred_draws() from tidybayes package to extract model predictions as a function of exposure, and median_qi() to calculate a 95% interval around the model predictions:

Show the code
hyperbolic_model_fit |> 
  epred_draws(newdata = tibble(exposure = seq(0, 50000, 1000))) |> 
  median_qi() |> 
  ggplot(mapping = aes(exposure, .epred)) + 
  geom_path() + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 0.3) +
  geom_point(data = d_example_emax_3cov, mapping = aes(y = response))

10.2 Sigmoidal Emax models

It is often necessary to consider sigmoidal Emax models, in which the Hill coefficient gamma is estimated from data. To do so within in the brms framework, the first step is to incorporate the gamma parameter in the model specification:

Show the code
sigmoidal_model <- brmsformula(
  response ~ e0 + emax * exposure^gamma / (ec50^gamma + exposure^gamma),
  e0 ~ 1,
  emax ~ 1,
  ec50 ~ 1,
  gamma ~ 1,
  nl = TRUE
) 

Next, because gamma is now a model parameter, a prior for it must be specified. The prior specification may now look like this:

Show the code
sigmoidal_model_prior <- c(
  prior(normal(0, 1.5), nlpar = "e0"),
  prior(normal(0, 1.5), nlpar = "emax"),
  prior(normal(2000, 500), nlpar = "ec50", lb = 0),
  prior(lognormal(0, 0.25), nlpar = "gamma", lb = 0)
)

No changes to the measurement model are required: like the hyperbolic Emax model, it is typical to fit the sigmoidal Emax model to continuous responses by assuming measurement errors are described by independent normal variates.

To fit the model, call brm():

Show the code
sigmoidal_model_fit <- brm(
  formula = sigmoidal_model, 
  family = gaussian_measurement, 
  data = d_example_emax_3cov, 
  prior = sigmoidal_model_prior
) 

Once the sampling is complete, printing the model object displays estimated model parameters, 95% credible intervals for those parameters, and diagnostic information about the sampling:

Show the code
sigmoidal_model_fit
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: response ~ e0 + emax * exposure^gamma/(ec50^gamma + exposure^gamma) 
         e0 ~ 1
         emax ~ 1
         ec50 ~ 1
         gamma ~ 1
   Data: d_example_emax_3cov (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
e0_Intercept        6.70      0.68     5.25     7.97 1.00     1238     1544
emax_Intercept     10.18      0.83     8.61    11.87 1.00     1242     1566
ec50_Intercept   3035.06    349.95  2380.67  3766.63 1.00     1408     1968
gamma_Intercept     1.09      0.12     0.86     1.35 1.00     1804     2289

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.29      0.06     1.19     1.41 1.00     2363     2431

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

In this instance it is clear from inspection that a sigmoidal model is unnecessary: the posterior mean for gamma is 1.09 with 95% credible interval from 0.86 to 1.35. A hyperbolic model is the more natural choice here. If explicit model comparison is required, cross-validation methods such as LOO-CV can be used to compare the performance of different brms models estimated from the same data. This is discussed in the next chapter.

10.3 Binary Emax models

Now consider the case where the response is binary. Again, a simulated data set is used, identical to the previous example except that the response variable is now 0 or 1 for each subject:

Show the code
d_example_emax_bin_3cov <- read_csv(here("data", "d_example_emax_bin_3cov.csv"))
d_example_emax_bin_3cov
# A tibble: 300 × 6
    dose exposure cov_a cov_b cov_c response
              
 1   100    4151.  5.71  2.33  7.83        0
 2   100    8067.  4.92  4.66  6.74        1
 3   100    4878.  4.88  4.21  4.68        1
 4   100    9713.  8.42  6.56  1.29        1
 5   100   11491.  4.37  3.96  3.55        1
 6   100    2452.  8.69  7.60  3.64        0
 7   100    5652.  6.61  3.95  5.13        1
 8   100    9939.  5.35  7.77  8.29        1
 9   100    5817.  5.61  2.24  9.60        0
10   100    5176.  6.06  1.79  8.74        1
# ℹ 290 more rows

The exposure-response relationship is illustrated by plotting the difference in exposure between responders and non-responders:

Show the code
d_example_emax_bin_3cov |> 
  mutate(response = factor(response)) |> 
  ggplot(aes(response, exposure)) + 
  geom_violin(draw_quantiles = .5)

To adapt the brms model to be appropriate for binary responses, the measurement model is adjusted. As in logistic regression, binary responses are assumed to be Bernoulli distributed, with a logit link function:

Show the code
bernoulli_measurement <- brmsfamily(
  family = "bernoulli", 
  link = "logit"
)

This is the only respect in which the binary model differs from its continuous counterpart. The model formula and prior specification is the same as for the original model at the start of the chapter.

Note that as the modeling is perfomed on logit scale, normal(0, 1.5) priors are considered as a good starting point for e0 and emax. There is a good discussion of these priors on the Stan website.

Show the code
binary_model <- brmsformula(
  response ~ e0 + emax * exposure / (ec50 + exposure),
  e0 ~ 1,
  emax ~ 1,
  ec50 ~ 1,
  nl = TRUE
) 

binary_model_prior <- c(
  prior(normal(0, 1.5), nlpar = "e0"),
  prior(normal(0, 1.5), nlpar = "emax"),
  prior(normal(2000, 500), nlpar = "ec50", lb = 0)
)

To estimate parameters, call brm() for the binary data set using the bernoulli_measurement family:

Show the code
binary_base_fit <- brm(
  formula = binary_model, 
  family = bernoulli_measurement, 
  data = d_example_emax_bin_3cov, 
  prior = binary_model_prior
) 

Again, inspect the model fit object to see the results:

Show the code
binary_base_fit
 Family: bernoulli 
  Links: mu = logit 
Formula: response ~ e0 + emax * exposure/(ec50 + exposure) 
         e0 ~ 1
         emax ~ 1
         ec50 ~ 1
   Data: d_example_emax_bin_3cov (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
e0_Intercept      -1.37      0.69    -2.74    -0.07 1.00     1501     1475
emax_Intercept     3.48      0.88     1.79     5.20 1.00     1563     1591
ec50_Intercept  2451.84    436.14  1618.73  3326.50 1.00     1807     1556

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The predictions of the fitted model are visualized below:

Show the code
binary_base_fit |> 
  epred_draws(newdata = tibble(exposure = seq(0, 50000, 1000))) |> 
  median_qi() |> 
  ggplot(mapping = aes(exposure, .epred)) + 
  geom_path() + 
  geom_ribbon(
    mapping = aes(ymin = .lower, ymax = .upper), 
    alpha = 0.3
  ) +
  geom_jitter(
    data = d_example_emax_bin_3cov,
    mapping = aes(y = response), 
    width = 0, 
    height = .05
  )