Show the code
library(tidyverse)
library(brms)
library(posterior)
library(tidybayes)
library(here)
theme_set(theme_bw(base_size = 12))
The brms package provides a flexible framework for specifying multilevel regression models, using Stan as the back end. It is typically used for models within the generalized linear mixed model (GLMM) specification, but can accommodate nonlinear models such as Emax. This chapter uses the brms package to develop and evaluate Bayesian Emax regression models. Models for continuous and binary response data are discussed, and in the next chapter these are extended to discuss covariate modeling.
library(tidyverse)
library(brms)
library(posterior)
library(tidybayes)
library(here)
theme_set(theme_bw(base_size = 12))
if (require(cmdstanr)) {
# prefer cmdstanr and cache binaries
options(
brms.backend = "cmdstanr",
cmdstanr_write_stan_file_dir = here("_brms-cache")
)dir.create(here("_brms-cache"), FALSE)
else {
} ::rstan_options(auto_write = TRUE)
rstan }
Loading required package: cmdstanr
This is cmdstanr version 0.9.0
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /home/danielle/.cmdstan/cmdstan-2.36.0
- CmdStan version: 2.36.0
This section shows how to build a standard Emax model for continuous response data using brms. To build the model a simulated data set is used:
<- read_csv(here("data", "d_example_emax_3cov.csv"))
d_example_emax_3cov d_example_emax_3cov
# A tibble: 300 × 6
dose exposure cov_a cov_b cov_c response
1 100 4151. 5.71 2.33 7.83 14.2
2 100 8067. 4.92 4.66 6.74 14.0
3 100 4878. 4.88 4.21 4.68 13.3
4 100 9713. 8.42 6.56 1.29 16.4
5 100 11491. 4.37 3.96 3.55 15.1
6 100 2452. 8.69 7.60 3.64 13.6
7 100 5652. 6.61 3.95 5.13 14.1
8 100 9939. 5.35 7.77 8.29 15.0
9 100 5817. 5.61 2.24 9.60 13.3
10 100 5176. 6.06 1.79 8.74 14.1
# ℹ 290 more rows
In this chapter only the exposure
and response
columns are used. A simple exploratory visualization of the exposure-response relationship is shown below:
|>
d_example_emax_3cov ggplot(aes(exposure, response)) +
geom_point() +
geom_smooth(formula = y ~ x, method = "loess")
The model considered in this section is a hyperbolic Emax model, in which the Hill coefficient is fixed to unity (i.e. gamma = 1
). The model construction takes place in stages. First, use brmsformula()
to describe the exposure-response relationship, setting nl = TRUE
to ensure that brms interprets the input as a non-linear model:
<- brmsformula(
hyperbolic_model ~ e0 + emax * exposure / (ec50 + exposure),
response ~ 1,
e0 ~ 1,
emax ~ 1,
ec50 nl = TRUE
)
In this specification, the first formula indicates that the exposure-response relationship is an Emax function. The later formulas indicate that e0
, emax
, and ec50
are model parameters.
In the second stage, assumptions must also be specified for the distribution of measurement errors. For simplicity, this example assumes errors are normally distributed. Use the brmsfamily()
function to specify this:
<- brmsfamily(
gaussian_measurement family = "gaussian",
link = "identity"
)
In the third stage, parameter priors for e0
, emax
, and ec50
must also be specified. In brms the default is to place an improper flat prior on regression parameters. For this example a weakly-informative prior is used. The prior()
function is used for this, using the nlpar
argument to specify the name of a non-linear parameter, and using lb
and ub
to impose lower and upper bounds if required:
<- c(
hyperbolic_model_prior prior(normal(0, 1.5), nlpar = "e0"),
prior(normal(0, 1.5), nlpar = "emax"),
prior(normal(2000, 500), nlpar = "ec50", lb = 0)
)
These three components provide the complete specification of the model. They are passed to brm()
along with the data to estimate model parameters:
<- brm(
hyperbolic_model_fit formula = hyperbolic_model,
family = gaussian_measurement,
data = d_example_emax_3cov,
prior = hyperbolic_model_prior
)
When this code is executed a Stan model is compiled and run, and detailed information on the sampling is printed during the run.
After the sampling is complete the user can inspect the brms model object to obtain a summary of the model, the sampling, and the parameter estimates:
hyperbolic_model_fit
Family: gaussian
Links: mu = identity; sigma = identity
Formula: response ~ e0 + emax * exposure/(ec50 + exposure)
e0 ~ 1
emax ~ 1
ec50 ~ 1
Data: d_example_emax_3cov (Number of observations: 300)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
e0_Intercept 6.52 0.64 5.22 7.72 1.00 1104 1404
emax_Intercept 10.54 0.66 9.23 11.83 1.00 1198 1602
ec50_Intercept 2969.18 345.23 2327.76 3677.26 1.00 1500 1799
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.29 0.05 1.19 1.40 1.00 2137 2001
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The data can be visualized in many different ways. A simple example is shown below, using epred_draws()
from tidybayes
package to extract model predictions as a function of exposure, and median_qi()
to calculate a 95% interval around the model predictions:
|>
hyperbolic_model_fit epred_draws(newdata = tibble(exposure = seq(0, 50000, 1000))) |>
median_qi() |>
ggplot(mapping = aes(exposure, .epred)) +
geom_path() +
geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 0.3) +
geom_point(data = d_example_emax_3cov, mapping = aes(y = response))
It is often necessary to consider sigmoidal Emax models, in which the Hill coefficient gamma
is estimated from data. To do so within in the brms framework, the first step is to incorporate the gamma
parameter in the model specification:
<- brmsformula(
sigmoidal_model ~ e0 + emax * exposure^gamma / (ec50^gamma + exposure^gamma),
response ~ 1,
e0 ~ 1,
emax ~ 1,
ec50 ~ 1,
gamma nl = TRUE
)
Next, because gamma
is now a model parameter, a prior for it must be specified. The prior specification may now look like this:
<- c(
sigmoidal_model_prior prior(normal(0, 1.5), nlpar = "e0"),
prior(normal(0, 1.5), nlpar = "emax"),
prior(normal(2000, 500), nlpar = "ec50", lb = 0),
prior(lognormal(0, 0.25), nlpar = "gamma", lb = 0)
)
No changes to the measurement model are required: like the hyperbolic Emax model, it is typical to fit the sigmoidal Emax model to continuous responses by assuming measurement errors are described by independent normal variates.
To fit the model, call brm()
:
<- brm(
sigmoidal_model_fit formula = sigmoidal_model,
family = gaussian_measurement,
data = d_example_emax_3cov,
prior = sigmoidal_model_prior
)
Once the sampling is complete, printing the model object displays estimated model parameters, 95% credible intervals for those parameters, and diagnostic information about the sampling:
sigmoidal_model_fit
Family: gaussian
Links: mu = identity; sigma = identity
Formula: response ~ e0 + emax * exposure^gamma/(ec50^gamma + exposure^gamma)
e0 ~ 1
emax ~ 1
ec50 ~ 1
gamma ~ 1
Data: d_example_emax_3cov (Number of observations: 300)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
e0_Intercept 6.70 0.68 5.25 7.97 1.00 1238 1544
emax_Intercept 10.18 0.83 8.61 11.87 1.00 1242 1566
ec50_Intercept 3035.06 349.95 2380.67 3766.63 1.00 1408 1968
gamma_Intercept 1.09 0.12 0.86 1.35 1.00 1804 2289
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.29 0.06 1.19 1.41 1.00 2363 2431
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
In this instance it is clear from inspection that a sigmoidal model is unnecessary: the posterior mean for gamma
is 1.09 with 95% credible interval from 0.86 to 1.35. A hyperbolic model is the more natural choice here. If explicit model comparison is required, cross-validation methods such as LOO-CV can be used to compare the performance of different brms models estimated from the same data. This is discussed in the next chapter.
Now consider the case where the response is binary. Again, a simulated data set is used, identical to the previous example except that the response
variable is now 0 or 1 for each subject:
<- read_csv(here("data", "d_example_emax_bin_3cov.csv"))
d_example_emax_bin_3cov d_example_emax_bin_3cov
# A tibble: 300 × 6
dose exposure cov_a cov_b cov_c response
1 100 4151. 5.71 2.33 7.83 0
2 100 8067. 4.92 4.66 6.74 1
3 100 4878. 4.88 4.21 4.68 1
4 100 9713. 8.42 6.56 1.29 1
5 100 11491. 4.37 3.96 3.55 1
6 100 2452. 8.69 7.60 3.64 0
7 100 5652. 6.61 3.95 5.13 1
8 100 9939. 5.35 7.77 8.29 1
9 100 5817. 5.61 2.24 9.60 0
10 100 5176. 6.06 1.79 8.74 1
# ℹ 290 more rows
The exposure-response relationship is illustrated by plotting the difference in exposure between responders and non-responders:
|>
d_example_emax_bin_3cov mutate(response = factor(response)) |>
ggplot(aes(response, exposure)) +
geom_violin(draw_quantiles = .5)
To adapt the brms model to be appropriate for binary responses, the measurement model is adjusted. As in logistic regression, binary responses are assumed to be Bernoulli distributed, with a logit link function:
<- brmsfamily(
bernoulli_measurement family = "bernoulli",
link = "logit"
)
This is the only respect in which the binary model differs from its continuous counterpart. The model formula and prior specification is the same as for the original model at the start of the chapter.
Note that as the modeling is perfomed on logit scale, normal(0, 1.5) priors are considered as a good starting point for e0 and emax. There is a good discussion of these priors on the Stan website.
<- brmsformula(
binary_model ~ e0 + emax * exposure / (ec50 + exposure),
response ~ 1,
e0 ~ 1,
emax ~ 1,
ec50 nl = TRUE
)
<- c(
binary_model_prior prior(normal(0, 1.5), nlpar = "e0"),
prior(normal(0, 1.5), nlpar = "emax"),
prior(normal(2000, 500), nlpar = "ec50", lb = 0)
)
To estimate parameters, call brm()
for the binary data set using the bernoulli_measurement
family:
<- brm(
binary_base_fit formula = binary_model,
family = bernoulli_measurement,
data = d_example_emax_bin_3cov,
prior = binary_model_prior
)
Again, inspect the model fit object to see the results:
binary_base_fit
Family: bernoulli
Links: mu = logit
Formula: response ~ e0 + emax * exposure/(ec50 + exposure)
e0 ~ 1
emax ~ 1
ec50 ~ 1
Data: d_example_emax_bin_3cov (Number of observations: 300)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
e0_Intercept -1.37 0.69 -2.74 -0.07 1.00 1501 1475
emax_Intercept 3.48 0.88 1.79 5.20 1.00 1563 1591
ec50_Intercept 2451.84 436.14 1618.73 3326.50 1.00 1807 1556
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The predictions of the fitted model are visualized below:
|>
binary_base_fit epred_draws(newdata = tibble(exposure = seq(0, 50000, 1000))) |>
median_qi() |>
ggplot(mapping = aes(exposure, .epred)) +
geom_path() +
geom_ribbon(
mapping = aes(ymin = .lower, ymax = .upper),
alpha = 0.3
+
) geom_jitter(
data = d_example_emax_bin_3cov,
mapping = aes(y = response),
width = 0,
height = .05
)