Statistical Specifications
statistical-specification.Rmd
IMPORTANT
Please note that this document is currently a work-in-progress and does not contain complete information for this package yet.
Survival Model Specification
This package can only be used to fit proportional hazards models of the form:
\[ \log(h_i(t \mid \theta, \psi_i)) = \log(h_0(t \mid \theta)) + X_i \beta + \sum_j G_j(t \mid \psi_i) \]
Where:
- \(h_0(.)\) is a parametric baseline hazard function
- \(t\) is the event time
- \(\theta\) is a vector of parameters that parameterise the baseline hazard
- \(\psi_i\) is an arbitrary vector of parameters for subject \(i\) specified by the longitudinal model
- \(G_j(.)\) is a link function that maps \(\psi_i\) to a contribution to the log-hazard function where \(j\) simply indexes the given link function
- \(X_i\) is the subjects covariate design matrix
- \(\beta\) is the corresponding coefficients to scale the design matrix covariates contribution to the log-hazard function
The following sections outline the available distributions to users which can be selected for the baseline hazard \(h_0(.)\). Please note that some of these distributions do not have the proportional-hazards property meaning that the resulting survival model corresponding to the hazard \(h_i()\) will not be of the same parametric family as the baseline distribution with the hazard \(h_0(.)\).
Exponential Distribution
\[ h(t \mid \lambda) = \lambda \]
Where: - \(\lambda > 0\) is the rate parameter
Weibull Distribution (Proportional Hazard Parameterisation)
\[ h(t \mid \lambda, \gamma) = \lambda \gamma t^{\gamma - 1 }; \]
Where: - \(\lambda > 0\) is the rate parameter - \(\gamma > 0\) is the shape parameter Note that with \(\gamma = 1\) we obtain the exponential distribution as a special case. ## Log-Logistic Distribution
\[ h(t \mid a, b) = \frac {(b/a)(t/a)^{(b-1)}} {1 + (t/a)^b} \]
Where:
- \(a > 0\) is the scale parameter
- \(b > 0\) is the shape parameter
Longitudinal Model Specification
Random-Slope Model
\[ y_{ij} = \mu_{l(i)} + s_i t_{ij} + \epsilon_{ij} \]
where:
- \(y_{ij}\) is the tumour size for subject \(i\) at timepoint \(j\)
- \(\mu_{l(i)}\) is the intercept for subject \(i\)
- \(s_i\) is the random slope for subject \(i\) with \(s_i \sim N(\mu_{sk(i)}, \sigma_s)\)
- \(\mu_{sk(i)}\) is the mean for the random slope within treatment arm \(k(i)\)
- \(\sigma_s\) is the variance term for the slopes
- \(\epsilon_{ij}\) is the error term with \(\epsilon_{ij} \sim N(0, \sigma)\)
- \(k(i)\) is the treatment arm index for subject \(i\)
- \(l(i)\) is the study index for subject \(i\)
Derivative of the SLD Trajectory Link / Growth Parameter Link
\[ G(t_{ij} \mid \mu_0, s_i) = s_i \]
Accessible via linkDSLD()
&
linkGrowth()
Identity Link
\[ \begin{align*} G(t_{ij} \mid \mu_0, s_i) = \mu_0 + s_i t_{ij} \end{align*} \]
Accessible via linkIdentity()
Stein-Fojo Model
\[\begin{align*} y_{ij} &\sim \mathcal{N}(SLD_{ij},\ SLD_{ij}^2 \sigma^2) \\ \\ SLD_{ij} &= \begin{cases} b_i[e^{-s_it_{ij}} + e^{g_i t_{ij}} - 1] & \text{if } t_{ij}\geq 0 \\ b_i e^{g_i t_{ij}} & \text{if } t_{ij}\lt 0 \end{cases}\\\\ b_i &\sim \text{LogNormal}(\mu_{bl(i)}, \omega_b) \\ s_i &\sim \text{LogNormal}(\mu_{sk(i)}, \omega_s) \\ g_i &\sim \text{LogNormal}(\mu_{gk(i)}, \omega_g) \\ \end{align*} \]
Where:
- \(i\) is the subject index
- \(j\) is the visit index
- \(y_{ij}\) is the observed tumour measurements
- \(SLD_{ij}\) is the expected sum of longest diameter for subject \(i\) at time point \(j\)
- \(t_{ij}\) is the time since first treatment for subject \(i\) at visit \(j\)
- \(b_i\) is the subject baseline SLD measurement
- \(s_i\) is the subject kinetics shrinkage parameter
- \(g_i\) is the subject kinetics tumour growth parameter
- \(\phi_i\) is the subject proportion of cells affected by the treatment
- \(k(i)\) is the treatment arm index for subject \(i\)
- \(l(i)\) is the study index for subject \(i\)
- \(\mu_{\theta k(i)}\) is the population mean for parameter \(\theta\) in group \(k(i)\)
- \(\omega_{\theta}\) is the population variance for parameter \(\theta\).
If using the non-centred parameterisation then the following alternative formulation is used: \[ \begin{align*} b_i &= exp(\mu_{bl(i)} + \omega_b * \eta_{b i}) \\ s_i &= exp(\mu_{sk(i)} + \omega_s * \eta_{s i}) \\ g_i &= exp(\mu_{gk(i)} + \omega_g * \eta_{g i}) \\ \\ \eta_{b i} &\sim N(0, 1)\\ \eta_{s i} &\sim N(0, 1) \\ \eta_{g i} &\sim N(0, 1) \\ \end{align*} \]
Where: * \(\eta_{\theta i}\) is a random effects offset on parameter \(\theta\) for subject \(i\)
Derivative of the SLD Trajectory Link
\[ G(t_{ij} \mid b_i, s_i, g_i) = \begin{cases} b_i(g_i e^{g_i t_{ij}} -s_ie^{-s_it_{ij}} ) & \text{if } t_{ij}\geq 0 \\ b_i g_i e^{g_i t_{ij}} & \text{if } t_{ij}\lt 0 \end{cases} \]
Accessible via linkDSLD()
Time to Growth Link
\[ G(t_{ij} \mid b_i, s_i, g_i) = \max \left( \frac{ \text{log}(s_i) - \text{log}(g_i) }{ s_i + g_i }, 0 \right) \]
Accessible via linkTTG()
Identity Link
\[ \begin{align*} G(t_{ij} \mid b_i, s_i, g_i) &= SLD_{ij} \end{align*} \]
Accessible via linkIdentity()
Growth Parameter Link
\[ \begin{align*} G(t_{ij} \mid b_i, s_i, g_i) &= log(g_i) \end{align*} \]
Accessible via linkGrowth()
Population Quantities
Note that when generating population quantities for the \(b\), \(s\) and \(g\) parameters the median of the distribution is used. This is because this has the same interpretation as using a non-centred parameterisation and setting the “random effects” term to be 0. For the \(\phi\) parameter the mean of the distribution is used.
Generalized Stein-Fojo (GSF) Model
\[ \begin{align*} y_{ij} &\sim \mathcal{N}(SLD_{ij},\ SLD_{ij}^2 \sigma^2) \\ \\ SLD_{ij} &= \begin{cases} b_i[\phi_i e^{-s_it_{ij}} + (1-\phi_i)e^{g_i t_{ij}}] & \text{if } t_{ij}\geq 0 \\ b_i e^{g_i t_{ij}} & \text{if } t_{ij}\lt 0 \end{cases}\\ \\ b_i &\sim \text{LogNormal}(\mu_{bl(i)}, \omega_b) \\ s_i &\sim \text{LogNormal}(\mu_{sk(i)}, \omega_s) \\ g_i &\sim \text{LogNormal}(\mu_{gk(i)}, \omega_g) \\ \phi_i &\sim \text{LogitNormal}(\mu_{\phi k(i)}, \omega_\phi) \end{align*} \]
Where:
- \(i\) is the subject index
- \(j\) is the visit index
- \(y_{ij}\) is the observed tumour measurements
- \(SLD_{ij}\) is the expected sum of longest diameter for subject \(i\) at time point \(j\)
- \(t_{ij}\) is the time since first treatment for subject \(i\) at visit \(j\)
- \(b_i\) is the subject baseline SLD measurement
- \(s_i\) is the subject kinetics shrinkage parameter
- \(g_i\) is the subject kinetics tumour growth parameter
- \(\phi_i\) is the subject proportion of cells affected by the treatment
- \(k(i)\) is the treatment arm index for subject \(i\)
- \(l(i)\) is the study index for subject \(i\)
- \(\mu_{\theta k(i)}\) is the population mean for parameter \(\theta\) in group \(k(i)\)
- \(\omega_{\theta}\) is the population variance for parameter \(\theta\).
If using the non-centred parameterisation then the following alternative formulation is used: \[ \begin{align*} b_i &= exp(\mu_{bl(i)} + \omega_b * \eta_{b i}) \\ s_i &= exp(\mu_{sk(i)} + \omega_s * \eta_{s i}) \\ g_i &= exp(\mu_{gk(i)} + \omega_g * \eta_{g i}) \\ \phi_i &= \text{logistic}(\mu_{gk(i)} + \omega_\phi * \eta_{\phi i}) \\ \\ \eta_{b i} &\sim N(0, 1)\\ \eta_{s i} &\sim N(0, 1) \\ \eta_{g i} &\sim N(0, 1) \\ \eta_{\phi i} &\sim N(0, 1) \\ \end{align*} \]
Where: * \(\eta_{\theta i}\) is a random effects offset on parameter \(\theta\) for subject \(i\)
Derivative of the SLD Trajectory Link
\[ G(t_{ij} \mid b_i, s_i, g_i, \phi_i) = \begin{cases} b_i(-s_i\phi_ie^{-s_it_{ij}} + (1 - \phi_i)g_i e^{g_i t_{ij}}) & \text{if } t_{ij}\geq 0 \\ b_i g_i e^{g_i t_{ij}} & \text{if } t_{ij}\lt 0 \end{cases} \]
Accessible via linkDSLD()
Time to Growth Link
\[ G(t_{ij} \mid b_i, s_i, g_i, \phi_i) = \max \left( \frac{ \log(s_i) + \log(\phi_i) - \log(g_i) - \log(1-\phi_i) }{ s_i + g_i }, 0 \right) \]
Accessible via linkTTG()
Identity Link
\[ \begin{align*} G(t_{ij} \mid b_i, s_i, g_i, \phi_i) &= SLD_{ij} \end{align*} \]
Accessible via linkIdentity()
Growth Parameter Link
\[ \begin{align*} G(t_{ij} \mid b_i, s_i, g_i, \phi_i) &= log(g_i) \end{align*} \]
Accessible via linkGrowth()
Claret-Bruno Model
\[ \begin{align*} y_{ij} &\sim \mathcal{N}(SLD_{ij},\ SLD_{ij}^2 \sigma^2) \\ \\ SLD_{ij} &= \begin{cases} b_i e^{g_i t_{ij}} & \text{if } t_{ij} < 0, \\ b_i \cdot \exp\left(g_i t_{ij} - \frac{p_i}{c_i} \left(1 - e^{-c_i t_{ij} }\right)\right) & \text{if } t_{ij} \geq 0. \end{cases}\\ \\ b_i &\sim \text{LogNormal}(\mu_{bl(i)}, \omega_b) \\ g_i &\sim \text{LogNormal}(\mu_{gk(i)}, \omega_g) \\ c_i &\sim \text{LogNormal}(\mu_{ck(i)}, \omega_c) \\ p_i &\sim \text{LogNormal}(\mu_{pk(i)}, \omega_p) \\ \end{align*} \]
Where:
- \(i\) is the subject index
- \(j\) is the visit index
- \(y_{ij}\) is the observed tumour measurements
- \(SLD_{ij}\) is the expected sum of longest diameter for subject \(i\) at time point \(j\)
- \(t_{ij}\) is the time since first treatment for subject \(i\) at visit \(j\)
- \(b_i\) is the subject baseline SLD value.
- \(g_i\) is the subject tumour growth rate.
- \(c_i\) is the subject treatment resistance rate.
- \(p_i\) is the subject treatment growth inhibition response.
- \(k(i)\) is the treatment arm index for subject \(i\)
- \(l(i)\) is the study index for subject \(i\)
- \(\mu_{\theta k(i)}\) is the population mean for parameter \(\theta\) in group \(k(i)\)
- \(\omega_{\theta}\) is the population variance for parameter \(\theta\).
Derivative of the SLD Trajectory Link
\[ G(t_{ij} \mid b_i, g_i, c_i, p_i) = \begin{cases} b_i g_i e^{g_i t_{ij}} & \text{if } t_{ij}\lt 0 \\ \left( g_i - p_i e^{-c_i t_{ij}} \right) SLD_{ij} & \text{if } t_{ij}\geq 0 \end{cases} \]
Accessible via linkDSLD()
Time to Growth Link
\[ G(t_{ij} \mid b_i, g_i, c_i, p_i) = \max \left( \frac{ln(\frac{p_i}{g_i})}{c_i}, 0 \right) \]
Accessible via linkTTG()
Identity Link
\[ \begin{align*} G(t_{ij} \mid b_i, g_i, c_i, p_i) &= SLD_{ij} \end{align*} \]
Accessible via linkIdentity()
Growth Parameter Link
\[ \begin{align*} G(t_{ij} \mid b_i, g_i, c_i, p_i) &= log(g_i) \end{align*} \]
Accessible via linkGrowth()
Post-Processing
Brier Score
The Brier Score is used to measure a models predictive performance.
In the case of Survival Models it is the weighted squared difference
between whether a subject died within a given time interval and the
models estimated probability of them dying within said interval. Within
jmpost
the following formula (as described in Blanche et al. (2015)) has been implemented:
\[ \hat{BS}(t) = \frac{1}{n}\sum_{i=1}^{n} \hat{W}_i(t) \Big(D_i(t) - \pi_i(t) \Big)^2 \]
\[ \hat{W}_i(t) = \frac{\mathbb{1}_{(T_i \gt t)}}{\hat{G}(t)} + \frac{\mathbb{1}_{(T_i \le t)} \cdot \Delta_i}{\hat{G}(T_i)} \]
Where:
- \(T_i\) is the observed event/censor time for subject \(i\)
- \(\Delta_i\) is the event indicator which is 1 if \(T_i\) is an event and 0 if \(T_i\) is censored
- \(\mathbb{1}_{(.)}\) is the indicator which is 1 if \({(.)}\) is true else 0
- \(D_i(t)\) is the event indicator function for subject \(i\) e.g. \(D_i(t) = \mathbb{1}_{(T_i \lt t,\ \delta_i = 1)}\)
- \(\pi_i(t)\) is a model predicted probability of subject \(i\) dying before time \(t\)
- \(\hat{G}(u)\) is the Kaplan-Meier estimator of survival function of the censoring time at \(u\)
Note that by default \(\hat{G}(T_i)\) is estimated by \(\hat{G}(T_i-)\) and that in the case of
ties event times are always considered to have occurred before censored
times (this is in contrast to survival::survfit
which
regards censored times as coming before event times when estimating the
censoring distribution). Both of these default options can be changed if
required.