
Generates covariates where categorical variables define mixture components and continuous variables are drawn from component-specific multivariate normal distributions. Each combination of categorical levels has its own probability and its own distribution for the continuous covariates.
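The two-stage scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the variable names (`levels_grid`, `comb_prob`, `params`, `sim`) are hypothetical, and the multivariate normal draw uses a Cholesky factor rather than whatever sampler the function uses internally.

```r
# Hypothetical sketch of the sampling scheme; not the package's own code.
set.seed(1)
n <- 200

# One binary categorical variable defines two mixture components.
levels_grid <- expand.grid(x1 = c(0, 1))
comb_prob <- c(0.4, 0.6)

# Component-specific parameters for two continuous covariates.
params <- list(
  list(mean = c(0, 0), sigma = diag(2)),
  list(mean = c(2, 2), sigma = diag(2))
)

# Step 1: draw a component (row of levels_grid) for every unit.
comp <- sample(nrow(levels_grid), size = n, replace = TRUE, prob = comb_prob)

# Step 2: draw the continuous covariates from that component's normal
# distribution. chol() returns R with t(R) %*% R == sigma, so z %*% R
# has covariance sigma when z holds independent standard normals.
cont <- matrix(NA_real_, nrow = n, ncol = 2)
for (k in seq_along(params)) {
  idx <- which(comp == k)
  if (length(idx) == 0) next
  z <- matrix(rnorm(length(idx) * 2), ncol = 2)
  cont[idx, ] <- sweep(z %*% chol(params[[k]]$sigma), 2, params[[k]]$mean, "+")
}

# Assemble a data frame; the column layout here is an assumption.
sim <- data.frame(x1 = levels_grid$x1[comp], x2 = cont[, 1], x3 = cont[, 2])
```

Units with `x1 == 1` are centered near `(2, 2)` and the rest near `(0, 0)`, which is the dependence between categorical and continuous covariates that the mixture is meant to produce.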

Usage

simulate_X_mixture(
  n,
  p_cat,
  p_cont,
  cat_level_list,
  cat_comb_prob,
  cont_para_list
)

Arguments

n

Positive integer. Number of units to simulate.

p_cat

Non-negative integer. Number of categorical covariates.

p_cont

Non-negative integer. Number of continuous covariates. At least one of `p_cat` or `p_cont` must be positive.

cat_level_list

List of length `p_cat`. Each element is a vector of possible levels for that categorical variable. The total number of combinations is `prod(lengths(cat_level_list))`.

cat_comb_prob

Numeric vector of probabilities, one per combination of categorical levels, in the row order produced by `expand.grid()` (the first variable varies fastest). Must sum to 1.
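To see which probability belongs to which combination, it can help to build the grid explicitly. The sketch below uses only base R; it assumes the function enumerates combinations as `expand.grid(cat_level_list)` does, and the probability values are made up for illustration.

```r
# Two categorical variables: one with 2 levels, one with 3 levels.
cat_level_list <- list(c(0, 1), c("a", "b", "c"))

# Enumerate combinations; the first variable varies fastest.
combos <- expand.grid(cat_level_list, stringsAsFactors = FALSE)
n_comb <- prod(lengths(cat_level_list))

# Illustrative probabilities, one per row of combos.
cat_comb_prob <- c(0.10, 0.20, 0.15, 0.15, 0.25, 0.15)

# Basic consistency checks before passing the vector on.
stopifnot(
  length(cat_comb_prob) == n_comb,
  isTRUE(all.equal(sum(cat_comb_prob), 1))
)

# Side-by-side view of each combination and its probability.
prob_table <- cbind(combos, prob = cat_comb_prob)
print(prob_table)
```

Comparing with `all.equal()` rather than `==` avoids spurious failures from floating-point rounding when the probabilities are computed rather than typed in.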

cont_para_list

List of parameter lists for the continuous covariates. When `p_cat > 0`, must have one element per combination of categorical levels; each element is a list with `mean` (length `p_cont`) and `sigma` (`p_cont x p_cont` matrix). When `p_cat == 0`, a single-element list.
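With several categorical variables, writing out one parameter list per combination by hand becomes error-prone. One possible approach is to generate the list from the combination grid, as sketched below. The mean rule used here (shifting by the sum of the levels) is purely illustrative, and the sketch assumes the elements must follow the same `expand.grid()` order as the probabilities.

```r
# Two binary categorical variables give four combinations.
cat_level_list <- list(c(0, 1), c(0, 1))
combos <- expand.grid(cat_level_list)
p_cont <- 2

# Build one list(mean, sigma) per row of combos.
cont_para_list <- lapply(seq_len(nrow(combos)), function(k) {
  shift <- sum(unlist(combos[k, ]))  # illustrative rule, not a package default
  list(mean = rep(shift, p_cont), sigma = diag(p_cont))
})

# Check the shapes described in the argument documentation.
shapes_ok <- vapply(cont_para_list, function(p) {
  length(p$mean) == p_cont && all(dim(p$sigma) == c(p_cont, p_cont))
}, logical(1))
stopifnot(length(cont_para_list) == nrow(combos), all(shapes_ok))
```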

Value

A data frame with `n` rows and `p_cat + p_cont` columns named `x1`, ..., `xp`, where `p = p_cat + p_cont`.

Examples

# Continuous only
X <- simulate_X_mixture(
  n = 100, p_cat = 0, p_cont = 2,
  cat_level_list = list(),
  cat_comb_prob = c(),
  cont_para_list = list(list(mean = c(0, 0), sigma = diag(2)))
)

# Mixed categorical and continuous
X <- simulate_X_mixture(
  n = 100, p_cat = 1, p_cont = 2,
  cat_level_list = list(c(0, 1)),
  cat_comb_prob = c(0.4, 0.6),
  cont_para_list = list(
    list(mean = c(0, 0), sigma = diag(2)),
    list(mean = c(2, 2), sigma = diag(2))
  )
)