Extending jmpost

IMPORTANT

Please note that this document is currently a work-in-progress and does not contain complete information for this package yet.

Custom Survival Distributions

Survival distributions in jmpost are specified via their contribution to the log-hazard function. Note that at present all distributions are implemented as proportional hazards models. This means that for any distribution that does not have the proportional hazards property that the distribution of any specific individual subject will not be of the same family. That is to say the distribution family e.g. “Log-Logistic” is just defining the baseline survival distribution only.

Survival distributions are implemented as S4 classes that inherit from SurvivalModel. The two main components of the SurvivalModel object are the stan and parameters slots. The parameters slot is a ParametersList() object which is used to specify the prior distributions of each parameter used by the survival distribution as well as for the design matrix coefficients (please see the “Prior Specification” section below).

The stan slot is a StanModule() object which needs to define the following:

In the parameters block each of the distributions parameters must be formally declared.
In the transformed parameters block there must define a vector called pars_os which contains one element for each parameter used by the survival distribution.
In the functions block there must define a function called log_h0 which returns the log-hazard contribution. This function must have the following signature:

matrix log_h0(matrix time, vector pars_os)

Where: - The return matrix must be of the same dimensionality as the time argument matrix where each element is the log-hazard contribution for the corresponding timepoint in the time matrix. - The time matrix contains one row per unique subject and one column per timepoint for that subject - The pars_os vector is as defined from (2) above and contains the parameters for that particular distribution.

For example the following is roughly equivalent to how the Weibull distribution is implemented:

SurvivalWeibullPH <- function(
    lambda = prior_gamma(2, 0.5),
    gamma = prior_gamma(2, 0.5),
    beta = prior_normal(0, 2)
) {
    stan <- StanModule("
        functions {
            matrix log_h0(matrix time, vector pars_os) {
                matrix[rows(time), cols(time)] result;
                result = log(pars_os[1]) + log(pars_os[2]) + (pars_os[2] - 1) * log(time);
                return result;
            }
        }

        parameters {
            real<lower=0> sm_weibull_ph_lambda;
            real<lower=0> sm_weibull_ph_gamma;
        }

        transformed parameters {
            vector[2] pars_os = [sm_weibull_ph_lambda, sm_weibull_ph_gamma]';
        }
    ")
    SurvivalModel(
        stan = stan,
        parameters = ParameterList(
            Parameter(name = "sm_weibull_ph_lambda", prior = lambda, size = 1),
            Parameter(name = "sm_weibull_ph_gamma", prior = gamma, size = 1),
            Parameter(name = "beta_os_cov", prior = beta, size = "p_os_cov_design")
        )
    )
}

Custom Longitudinal Models

Similar to the survival model the longitudinal models are implemented as S4 classes that inherit from the LongitudinalModel class. The two main components of the LongitudinalModel object are the stan and parameters slots which specify the underlying Stan code and prior distributions for the models parameters respectively (please see the “Prior Specification” section below).

Unlike the survival distributions, the longitudinal models are a lot more flexible and have less constraints on how they are implemented. That is there aren’t any specific variables or functions that you need to define.

That being said there are a several optional features of jmpost that do require the use of specific interfaces if you want to enable them for your model.

1) `loo` integration

If you want to use the loo package to calculate the leave-one-out cross-validation then you need to populate the Ypred_log_lik vector. This vector should contain the log-likelihood contribution for each individual tumour observation. This vector is automatically 0-initialised, thus all your code needs to do is populate it.

transformed parameters {
    Ypred_log_lik = vect_normal_log_dens(
        tumour_value,
        expected_tumour_value,
        rep_vector(lm_rs_sigma, n_tumour_obs)
    );
}

Where: - tumour_value, n_tumour_obs are predefined data objects (see the “Longitudinal Data Objects” section below) - expected_tumour_value is the expected value of the tumour assessment for each observation - lm_rs_sigma is the standard deviation of the tumour assessment - vect_normal_log_dens is a vectorised version of the normal log-likelihood function provided by jmpost (this is opposed to Stan’s inbuilt normal_lpdf function which returns a single value of the sum all of the log-likelihoods)

2) Individual Subject Generated Quantity Integration

In order to calculate the individual subject generated quantities (via GridFixed() / GridGrouped() / etc) you need to define a Stan function with the signature:

vector lm_predict_value(vector time, matrix long_gq_parameters)

Where:

time is a vector of timepoints for which to calculate the generated quantity
long_gq_parameters is a matrix with one row per subject and one column per parameter

Likewise, the long_gq_parameters object also needs to be defined for the model in the generated quantities block as a matrix with 1 row per subject and 1 column per parameter. This structure is to allow for models with subject specific parameters, in particular random effects models. If your model has the same parameters for all subjects, for example a fixed effects model, then the value should be repeated for each subject. Subject’s values should be in the same order as their factor levels in the DataJoint object. The following is an example from the LongitudinalRandomSlope model:

generated quantities {
    matrix[n_subjects, 2] long_gq_parameters;
    long_gq_parameters[, 1] = lm_rs_ind_intercept;
    long_gq_parameters[, 2] = lm_rs_ind_rnd_slope;
}

Where:

lm_rs_ind_intercept and lm_rs_ind_rnd_slope are the individual subject’s random intercept and random slope parameters respectively.

Note that the long_gq_parameters matrix should be structured as your lm_predict_value() function would expect it to be for the long_gq_parameters argument.

Please see “Custom Generated Quantities” section below for implementation details for inserting custom generated quantity code.

3) Population Generated Quantity Integration

A common use case is to calculate the quantities based on the “population” level parameters which is supported in jmpost via the GridPopulation() function. What this means in practice though is often model and parameterisation specific. For example some models would take the median of the distribution whilst others might take the mean or set the random effects offset to 0. As such if you wish for your model to be compatible with the GridPopulation() then you need to declare and populate the long_gq_pop_parameters object with the following signature:

matrix[gq_n_quant, 2] long_gq_pop_parameters;

Note that the number of rows is gq_n_quant. This number will be set to the unique number of combinations of the arm and study factors in the DataJoint object. To support populating this object two additional variables are provided for you namely gq_long_pop_study_index and gq_long_pop_arm_index which are vectors that contain the corresponding index of the study and arm variables for each row in the long_gq_pop_parameters matrix. The following is an example from the LongitudinalRandomSlope model:

generated quantities {
    matrix[gq_n_quant, 2] long_gq_pop_parameters;
    long_gq_pop_parameters[, 1] = to_vector(lm_rs_intercept[gq_long_pop_study_index]);
    long_gq_pop_parameters[, 2] = to_vector(lm_rs_slope_mu[gq_long_pop_arm_index]);
}

Where:

lm_rs_intercept and lm_rs_slope_mu are the model specific group level intercept and slope parameters respectively.

Note that the long_gq_pop_parameters matrix should be structured as your lm_predict_value() function would expect it to be for the long_gq_parameters argument.

Please see “Custom Generated Quantities” section below for implementation details for inserting custom generated quantity code.

Prior Specification

When writing your own custom longitudinal or survival model it is important to understand how the prior definitions are specified. By default jmpost will insert the Stan statements for any prior distributions based on the parameters slot of the model object. The importance of this is that it means you should not define the prior distributions in the Stan code itself. Note that this does not apply to hierarchical parameters who must have their distributions specified in the Stan code. For example in the LongitudinalRandomSlope model their is a different random slope for each treatment arm which is specified in the Stan code as:

model {
    lm_rs_ind_rnd_slope ~ normal(
        lm_rs_slope_mu[subject_arm_index],
        lm_rs_slope_sigma
    );
}

There is however no prior specified in the Stan code for lm_rs_slope_mu or lm_rs_slope_sigma as these are handled by the parameters slot of the model object as mentioned above. The main reason for using this approach is that jmpost implements the priors in such a way that users can change them without having to re-compile the Stan model.

Custom Generated Quantities

In order to avoid unnecessary processing, code that is solely used for generation of post sampling quantities is excluded from the Stan program when initially sampling from the joint model. Instead this code is only included when generating quantities via the LongitidunalQuantities or SurvivalQuantities constructors.

To facilitate this, when using LongitidunalQuantities or SurvivalQuantities, a dedicated model method enableGQ() is called on the user provided longitudinal and survival models. This model specific method is responsible for returning a StanModule object that contains all the relevant code required to generate the quantities for that given model. The following is a rough implementation of this method for the Random-Slope model which implements both feature (2) and (3) outlined in the above “Custom Longitudinal Model” section:

enableGQ.LongitudinalRandomSlope <- function() {
    StanModule("
        functions {
            vector lm_predict_value(vector time, matrix long_gq_parameters) {
                int nrow = rows(time);
                return (
                    long_gq_parameters[, 1] + long_gq_parameters[, 2] .* time
                );
            }
        }

        generated quantities {
            matrix[n_subjects, 2] long_gq_parameters;
            long_gq_parameters[, 1] = lm_rs_ind_intercept;
            long_gq_parameters[, 2] = lm_rs_ind_rnd_slope;

            matrix[gq_n_quant, 2] long_gq_pop_parameters;
            long_gq_pop_parameters[, 1] = to_vector(lm_rs_intercept[gq_long_pop_study_index]);
            long_gq_pop_parameters[, 2] = to_vector(lm_rs_slope_mu[gq_long_pop_arm_index]);
        }
    ")
}

Note that whilst it is possible to provide an enableGQ() method for the survival model it is not required. This is because the underlying framework for creating survival quantities is distribution agnostic and does not require any model specific code.

Custom Link Functions

Users can define custom link functions in several ways based upon the level of customisation required. In order to explain this process it is first important to understand how the link functions are implemented under the hood.

The link functions add their contribution to the likelihood function via the log-hazard function; that is the general model is formulated as:

\[ log(h_i(t, \phi_i)) = log(h_0(t)) + X_i \beta + \alpha_1 f(t, \phi_i) + \alpha_2 g(t, \phi_i) + \dots \]

Where: - \(X\) is the design matrix of covariates - \(\beta\) is the vector of coefficients for the covariates - \(f(t)\) and \(g(t)\) are the link functions - \(\alpha_1\) and \(\alpha_2\) are the coefficients for the link functions - \(h_0(t)\) is the baseline hazard function - \(\phi_i\) is a vector of parameters from the longitudinal model

Each longitudinal model is responsible for defining their own implementations of the \(\phi\) vector. The interface for doing this is by providing an enableLink method that updates the StanModule object of the model to define a Stan matrix with the name link_function_inputs that contains 1 row per subject and 1 column per \(\phi\) parameter.

For reference the following is roughly the implementation for the LongitudinalGSF model:

enableLink.LongitudinalGSF <- function(object, ...) {
  stan <- StanModule("
transformed parameters {
    matrix[n_subjects, 4] link_function_inputs;
    link_function_inputs[,1] = lm_gsf_psi_bsld;
    link_function_inputs[,2] = lm_gsf_psi_ks;
    link_function_inputs[,3] = lm_gsf_psi_kg;
    link_function_inputs[,4] = lm_gsf_psi_phi;
}")
    object@stan <- merge(
        object@stan,
        stan
    )
    object
}

That is to say the \(\phi\) parameters for the LongitudinalGSF model are the 4 primary parameters of the longitudinal model. If you wish to augment this with additional parameters then you can subclass the LongitudinalGSF model and override the enableLink method with the required additional parameters e.g.

GSFextended <- setClass(
    Class = "GSFextended",
    contains = "LongitudinalGSF"
)

enableLink.GSFextended <- function(object, ...) {
  stan <- StanModule("<stan-code-here>")
    object@stan <- merge(
        object@stan,
        stan
    )
    object
}

Next, the individual link functions are implemented as Stan functions with the following signature:

matrix <key>_contrib(
  matrix time,
  matrix link_function_inputs
)

Where: - <key> is the name of the link parameter as specified in the LinkComponent object - time is a matrix of 1 row per subject and 1 column per time point to be evaluated at for that subject.

The LinkComponent object is responsible for then integrating these functions into the final Stan model. For reference the following is roughly the implementation of the dSLD link component for the LongitudinalRandomSlope model:

LinkComponent(
  key = "link_dsld",
  stan = StanModule("
functions {
    matrix link_dsld_contrib(
        matrix time,
        matrix link_function_inputs
    ) {
        int nrows = rows(time);
        int ncols = cols(time);
        vector[nrows] lm_rs_ind_rnd_slope = link_function_inputs[,2];
        matrix[nrows, ncols] rnd_slope_mat = rep_matrix(lm_rs_ind_rnd_slope, ncols);
        return rnd_slope_mat;
    }
}"),
  key = "link_dsld",
  prior = prior
)

You can then pass these LinkComponent objects to the link argument of the JointModel constructor to add them to the model e.g.

JointModel(
  longitudinal = LongitudinalRandomSlope(),
  survival = SurvivalExponential(),
  link = LinkComponent(...)
)

If you wish to add multiple link functions then you must wrap them in a Link() object e.g.

JointModel(
  longitudinal = LongitudinalRandomSlope(),
  survival = SurvivalExponential(),
  link = Link(
    LinkComponent(...),
    LinkComponent(...)
  )
)

For most users the above should be sufficient for adding your own custom link functions for a specific analysis. The following explains how to create your own generic link functions which can be useful if you are wanting to share your code with other users or if you are building a link function that is applicable to multiple longitudinal models and you need polymorphism; this is only recommended for advanced users.

In order to avoid the user specify the exact model specific link component we provide several generic functions that will return the correct link component given their chosen link family. For example the linkDSLD() function will return a different implementation of the derivative of the SLD link depending on which longitudinal model the user has provided. These are implemented as standard S3 methods e.g.

linkDSLD.LongitudinalRandomSlope <- function(prior = prior_normal(0, 2), model, ...) {
    LinkComponent(
        key = "link_dsld",
        stan = StanModule("<stan-code-here>"),
        prior = prior
    )
}

A key design quirk to be aware of is that these methods dispatch off of the model argument not the prior argument. The reason for this is explained further below.

In order to simplify the end user API, the generic link functions have a default model argument of PromiseLongitudinalModel(). This is a special object that is used to dispatch the link<type>.PromiseLongitudinalModel() method which in turn creates a PromiseLinkComponent object. The purpose of these objects is to defer the creation of the LinkComponent object until within the JointModel constructor which will then call the link<Type> method again but with the correct model object. As an example the following is the implementation of the linkDSLD generic.

linkDSLD <- function(prior, model = PromiseLongitudinalModel(), ...) {
    UseMethod("linkDSLD", model)
}

linkDSLD.PromiseLongitudinalModel <- function(prior = prior_normal(0, 2), model, ...) {
    PromiseLinkComponent(fun = linkDSLD, prior = prior, key = "link_dsld")
}

linkDSLD.default <- function(prior, model, ...) {
    stop(sprintf("Method `linkDSLD` is not available for `%s`", class(model)[[1]]))
}

For reference the JointModel constructor will then attempt to resolve the promise by executing the provided function against the user provided longitudinal model object e.g.

object@fun(
    prior = object@prior,
    model = <longitudinal-model>,
)

Custom Simulation Functions

To assist with testing and debugging the joint models fitted via jmpost the SimJointData constructor function is provided to generate joint data from known parameters.

Custom Survival Simulation Functions

Before describing how to implement custom survival functions it is important to understand how the survival simulation framework works more generally in jmpost. To simulate event times the simulation function takes advantage of the following details:

\[ \begin{align} S(t) &\sim U(0, 1) \\ S(t) &= exp\left(-\int_0^t h(u)\ du\right) \end{align} \]

That is, it first samples a survival probability \(p\) from a uniform distribution and then calculates the required event time \(t\) to produce that survival probability. The current implementation approximates the integral by sequentially summing up the hazard after a given step size and declaring an event once the sampled probability value has been exceeded. This gives rise to two key parameters that need to be defined by the user:

time_max: The maximum time to simulate up to (events occurring after this time are censored)
time_step: How much of a gap to leave between time points to calculate the hazard at

Note that there is currently an outstanding development item to convert this to use numerical integration to remove the need for these parameters (see issue #329).

Custom survival distribution simulations are implemented as classes that inherit from SimSurvival providing key parameter values and in particular provide a log-hazard function that will be used as described above in combination with the covariate and link contributions. The following is rough example of how the Weibull distribution has been implemented:

SimSurvivalWeibullPH <- function(
    lambda,
    gamma,
    time_max = 2000,
    time_step = 1,
    lambda_censor = 1 / 3000,
    beta_cont = 0.2,
    beta_cat = c("A" = 0, "B" = -0.4, "C" = 0.2)
) {
    SimSurvival(
        time_max = time_max,
        time_step = time_step,
        lambda_censor = lambda_censor,
        beta_cont = beta_cont,
        beta_cat = beta_cat,
        loghazard = function(time) {
            log(lambda) + log(gamma) + (gamma - 1) * log(time)
        },
        name = "SimSurvivalWeibullPH"
    )
}

That is, the function is a essentially a constructor function for a SimSurvival object. This object then has the following slots defined:

time_max: The maximum time to simulate up to (as explained mentioned above)
time_step: How much of a gap to leave between time points to calculate the hazard at (as explained mentioned above)
beta_cont: The \(\beta\) coefficient for the continuous covariate (sampled from a standard normal distribution for each subject)
beta_cat: The \(\beta\) coefficients for the categorical covariates (evenly sampled from names(beta_cat) for each subject)
loghazard: The log-hazard function of the baseline survival distribution
name: The name of the simulation function; only used for printing purposes

Custom Longitudinal Simulation Functions

Custom longitudinal simulation functions are slightly more involved. Essentially the user needs to define a new class which inherits from SimLongitudinal and then implement the sampleSubjects and sampleObservations methods for the new class. The object itself should contain all the required parameters for the model as well as a times slot which is a vector of timepoints for observations to be generated at.

The sampleSubjects method is responsible for sampling the subject specific parameters e.g. individual parameters for a random effects model. The sampleObservations method is responsible for calculating the tumour size at each provided time point. The following is a rough example of how the SimLongitudinalGSF class is implemented:

# Declare the new class
.SimLongitudinalGSF <- setClass(
    "SimLongitudinalGSF",
    contains = "SimLongitudinal",
    slots = c(
        sigma = "numeric",
        mu_s = "numeric",
        mu_g = "numeric",
        mu_b = "numeric",
        mu_phi = "numeric",
        omega_b = "numeric",
        omega_s = "numeric",
        omega_g = "numeric",
        omega_phi = "numeric",
        link_dsld = "numeric",
        link_ttg = "numeric",
        link_identity = "numeric"
    )
)

# Define constructor function with sensible default values
SimLongitudinalGSF <- function(
    times = c(-100, -50, 0, 50, 100, 150, 250, 350, 450, 550) / 365,
    sigma = 0.01,
    mu_s = log(c(0.6, 0.4)),
    mu_g = log(c(0.25, 0.35)),
    mu_b = log(60),
    mu_phi = qlogis(c(0.4, 0.6)),
    omega_b = 0.2,
    omega_s = 0.2,
    omega_g = 0.2,
    omega_phi = 0.2,
    link_dsld = 0,
    link_ttg = 0,
    link_identity = 0
) {
    .SimLongitudinalGSF(
        times = times,
        sigma = sigma,
        mu_s = mu_s,
        mu_g = mu_g,
        mu_b = mu_b,
        mu_phi = mu_phi,
        omega_b = omega_b,
        omega_s = omega_s,
        omega_g = omega_g,
        omega_phi = omega_phi,
        link_dsld = link_dsld,
        link_ttg = link_ttg,
        link_identity = link_identity
    )
}

sampleSubjects.SimLongitudinalGSF <- function(object, subjects_df) {
    res <- subjects_df |>
        dplyr::mutate(study_idx = as.numeric(.data$study)) |>
        dplyr::mutate(arm_idx = as.numeric(.data$arm)) |>
        dplyr::mutate(psi_b = stats::rlnorm(dplyr::n(), object@mu_b[.data$study_idx], object@omega_b)) |>
        dplyr::mutate(psi_s = stats::rlnorm(dplyr::n(), object@mu_s[.data$arm_idx], object@omega_s)) |>
        dplyr::mutate(psi_g = stats::rlnorm(dplyr::n(), object@mu_g[.data$arm_idx], object@omega_g)) |>
        dplyr::mutate(psi_phi_logit = stats::rnorm(
            dplyr::n(),
            object@mu_phi[.data$arm_idx],
            object@omega_phi
        )) |>
        dplyr::mutate(psi_phi = stats::plogis(.data$psi_phi_logit))
    res[, c("subject", "arm", "study", "psi_b", "psi_s", "psi_g", "psi_phi")]
}

sampleObservations.SimLongitudinalGSF <- function(object, times_df) {
    times_df |>
        dplyr::mutate(mu_sld = gsf_sld(.data$time, .data$psi_b, .data$psi_s, .data$psi_g, .data$psi_phi)) |>
        dplyr::mutate(dsld = gsf_dsld(.data$time, .data$psi_b, .data$psi_s, .data$psi_g, .data$psi_phi)) |>
        dplyr::mutate(ttg = gsf_ttg(.data$time, .data$psi_b, .data$psi_s, .data$psi_g, .data$psi_phi)) |>
        dplyr::mutate(sld = stats::rnorm(dplyr::n(), .data$mu_sld, .data$mu_sld * object@sigma)) |>
        dplyr::mutate(
            log_haz_link =
                (object@link_dsld * .data$dsld) +
                (object@link_ttg * .data$ttg) +
                (object@link_identity * .data$mu_sld)
        )
}

The subjects_df argument to the sampleSubjects method is a data.frame with the following columns:

subject: The subject identifier
arm: The treatment arm that the subject belongs to
study: The study that the subject belongs to

Of note is that this dataset contains one row per subject. The return value must be a data.frame with the same number of rows as the input dataset as well as the subject, arm and study columns. The remaining columns are the subject specific parameters and can have any arbitrary name.

The times_df argument to the sampleObservations method is the same data.frame that was generated in the sampleSubjects method but duplicated once per required timepoint with an additional time column that contains said timepoint. The return value must be a data.frame with the same number of rows as the input dataset as well as the original columns subject, arm, study and time. In addition to the original columns the function must also define the following new columns:

sld: The tumour size at the given timepoint
log_haz_link: The contribution to the hazard function at that timepoint (set this to 0 if not defining a link function)

Formatting Stan Files

Under the hood this library works by merging multiple Stan programs together into a single program. It does this by parsing the program to extract out each block independently. Unfortunately, the formal Stan parser (stanc) provided by the Stan team only works with complete programs whereas most of the programs within jmpost are incomplete fragments. This package has therefore implemented its own simple parser; as a result, in order to not have to traverse the full abstract syntax tree (AST), a few addition constraints are made on how Stan programs can be formatted.

These additional constraints are:

The opening to each block must start on a newline and can’t have any non-whitespace character proceeding it.
The opening to each block cannot have any non-whitespace characters after the opening { character.
The closing } character after each block cannot have any non-whitespace characters after it

Valid:

data {
  int n; array[n] real x;
}       
parameters{      
  real mu; 
  real sigma;}

    model {    
x ~ normal(mu, sigma);
    }

Invalid:

// non-whitespace after opening `{`
data { int n; array[n] real x; } 

parameter { real mu;      // non-whitespace after opening `{`
  real sigma;
} model {                 // non-whitespace before block name
  x ~ normal(mu, sigma);
} // some comment         // non-whitespace after closing `}`

Stan Data Objects

When writing your own Stan code to extend jmpost it is important to note that many different data objects have already been defined in the data block of the base Stan template. This section outlines the different data objects that are made available to user. Note that some objects are only made available if the corresponding model is used; for example death times are only available if the user specifies a SurvivalModel object to JointModel().

Global Data Objects

Number of unique subjects

int<lower=1> n_subjects;

Number of unique studies

int<lower=1> n_studies;

Number of unique treatment arms

int<lower=1> n_arms;

Study index for each subject

array[n_subjects] int<lower=1,upper=n_studies> subject_study_index;

Note that this is sorted based upon the subject’s factor level within the R data.frame. For example lets say subject "A" has a factor level of 15 and that their corresponding study value has a factor level of 2 then subject_study_index[15] will be 2.

Treatment arm index for each subject

  array[n_subjects] int<lower=1,upper=n_arms> subject_arm_index;

A mirror of subject_study_index but for the treatment arm.

Survival Data Objects

Number of events

int<lower=1> n_subject_event;

Event/Censor Times

vector[n_subjects] event_times;

Ordered by subject factor level

Event Index

array[n_subject_event] int subject_event_index;

Is the index into event_times to identify which times are an event. The rest are censored.

Number of covariates for the survival model

int<lower=0> p_os_cov_design;

Note that this does not include an intercept term, which would conflict with the baseline distribution parameters.

Covariate design matrix for the survival model

matrix[n_subjects, p_os_cov_design] os_cov_design;

Note that this does not include an intercept term, which would conflict with the baseline distribution parameters.

Time >0 flag

array[rows(event_times)] int time_positive_flag;

Number of times >0

int n_times_positive;

Positive time index

array[n_times_positive] int time_positive_index

Gaussian Quadrature Integration Parameters

int<lower=1> n_nodes;
vector[n_nodes] nodes;
vector<lower=0, upper=1>[n_nodes] weights;

These are the nodes and weights for the Gaussian quadrature integration.

Longitudinal Data Objects

Total number of tumour assessments

int<lower=1> n_tumour_all;

Number of tumour assessments above LLoQ (Lower Limit of Quantification)

int<lower=1> n_tumour_obs;

Number of tumour assessments below LLoQ (Lower Limit of Quantification)

int<lower=0> n_tumour_cens;

Tumour assessments values

vector[n_tumour_all] tumour_value;

Tumour assessments time points

vector[n_tumour_all] tumour_time;

LLoQ threshold

real tumour_value_lloq;

Individual tumour assessment index

array[n_tumour_all] int subject_tumour_index;

That is if tumour assessment 1 belongs to the subject with factor level 3 then subject_tumour_index[1] will be 3.

Tumour assessment index for observations above LLoQ (Lower Limit of Quantification)

array[n_tumour_obs] int subject_tumour_index_obs;

For example if only tumour assessments 3 and 5 were above the LLoQ then subject_tumour_index_obs will be [3, 5].

Tumour assessment index for observations below LLoQ (Lower Limit of Quantification)

array[n_tumour_cens] int subject_tumour_index_cens;

For example if only tumour assessments 1 and 2 were below the LLoQ then subject_tumour_index_obs will be [1, 2].

Sparse matrix components for subject indexes of the tumour assessments

array [3] int<lower=0> n_mat_inds_all_y;
vector[n_mat_inds_all_y[1]] w_mat_inds_all_y;
array[n_mat_inds_all_y[2]] int v_mat_inds_all_y;
array[n_mat_inds_all_y[3]] int u_mat_inds_all_y;

This is the sparse matrix representation of the binary matrix that has one row per subject and one column per tumour assessment. That is, for row 3 of this matrix all columns that have an entry of 1 indicate that the corresponding entry in tumour_value belongs to the subject with factor level 3. This matrix is primarily used to calculate the sum of log-likelihood for all tumour assessments per subject in an efficient way.
See the Stan CSR documentation for more information on the sparse matrix representation.

Sparse matrix components for subject indexes of the tumour assessments above the LLoQ

array [3] int<lower=1> n_mat_inds_obs_y;
vector[n_mat_inds_obs_y[1]] w_mat_inds_obs_y;
array[n_mat_inds_obs_y[2]] int v_mat_inds_obs_y;
array[n_mat_inds_obs_y[3]] int u_mat_inds_obs_y;

Same as above but only for the tumour assessments above the LLoQ.

Sparse matrix components for subject indexes of the tumour assessments below the LLoQ

array [3] int<lower=0> n_mat_inds_cens_y;
vector[n_mat_inds_cens_y[1]] w_mat_inds_cens_y;
array[n_mat_inds_cens_y[2]] int v_mat_inds_cens_y;
array[n_mat_inds_cens_y[3]] int u_mat_inds_cens_y;

Same as above but only for the tumour assessments below the LLoQ.

Global Quantity Data Objects

Number of quantities to be generated

int <lower=1> gq_n_quant;

Assessment time for that the corresponding quantity should be generated at

vector[gq_n_quant] gq_times;

subject index for each quantity

array[gq_n_quant] int<lower=1, upper=n_subjects> gq_subject_index;

GridFixed() / GridGrouped() / GridObserved() / GridManual() only

Survival Quantity Data Objects

Number of link parameters

int<lower=0> gq_n_par;

GridPrediction() only

Matrix of link function inputs

matrix[gq_n_quant, gq_n_par] gq_link_function_inputs;

GridPrediction() only

Design matrix for the survival quantity model

matrix[gq_n_quant, p_os_cov_design] gq_os_cov_design;

GridPrediction() only

Longitudinal Quantity Data Objects

Arm index for each quantity

array [gq_n_quant] int <lower=1, upper=n_arms> gq_long_pop_arm_index;

GridPopulation() only

Study index for each quantity

array [gq_n_quant] int <lower=1, upper=n_studies> gq_long_pop_study_index;

GridPopulation() only

IMPORTANT

Custom Survival Distributions

Custom Longitudinal Models

1) loo integration

2) Individual Subject Generated Quantity Integration

3) Population Generated Quantity Integration

Prior Specification

Custom Generated Quantities

Custom Link Functions

Custom Simulation Functions

Custom Survival Simulation Functions

Custom Longitudinal Simulation Functions

Formatting Stan Files

Stan Data Objects

Global Data Objects

Survival Data Objects

Longitudinal Data Objects

Global Quantity Data Objects

Survival Quantity Data Objects

Longitudinal Quantity Data Objects

1) `loo` integration