Run MiDAS statistical analysis — runMiDAS • midasHLA

runMiDAS perform association analysis on MiDAS data using statistical model of choice. Function is intended for use with prepareMiDAS. See examples section.

runMiDAS(
  object,
  experiment,
  inheritance_model = NULL,
  conditional = FALSE,
  omnibus = FALSE,
  omnibus_groups_filter = NULL,
  lower_frequency_cutoff = NULL,
  upper_frequency_cutoff = NULL,
  correction = "bonferroni",
  n_correction = NULL,
  exponentiate = FALSE,
  th = 0.05,
  th_adj = TRUE,
  keep = FALSE,
  rss_th = 1e-07
)

Arguments

object: An existing fit from a model function such as lm, glm and many others.
experiment: String indicating the experiment associated with object's MiDAS data to use. Valid values includes: "hla_alleles", "hla_aa", "hla_g_groups", "hla_supertypes", "hla_NK_ligands", "kir_genes", "kir_haplotypes", "hla_kir_interactions", "hla_divergence", "hla_het", "hla_custom", "kir_custom". See prepareMiDAS for more information.
inheritance_model: String specifying inheritance model to use. Available choices are "dominant", "recessive", "additive".
conditional: Logical flag indicating if conditional analysis should be performed.
omnibus: Logical flag indicating if omnibus test should be used.
omnibus_groups_filter: Character vector specifying omnibus groups to use.
lower_frequency_cutoff: Number giving lower frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions.
upper_frequency_cutoff: Number giving upper frequency threshold. Numbers greater than 1 are interpreted as the number of feature occurrences, numbers between 0 and 1 as fractions.
correction: String specifying multiple testing correction method. See details for further information.
n_correction: Integer specifying number of comparisons to consider during multiple testing correction calculations. For Bonferroni correction it is possible to specify a number lower than the number of comparisons being made. This is useful in cases when knowledge about the biology or redundance of alleles reduces the need for correction. For other methods it must be at least equal to the number of comparisons being made; only set this (to non-default) when you know what you are doing!
exponentiate: Logical flag indicating whether or not to exponentiate the coefficient estimates. Internally this is passed to tidy. This is typical for logistic and multinomial regressions, but a bad idea if there is no log or logit link. Defaults to FALSE.
th: Number specifying threshold for a variable to be considered significant.
th_adj: Logical flag indicating if adjusted p-value should be used as threshold criteria, otherwise unadjusted p-value is used.
keep: Logical flag indicating if the output should be a list of results resulting from each selection step. Default is to return only the final result.
rss_th: Number specifying residual sum of squares threshold at which function should stop adding additional variables. As the residual sum of squares approaches 0 the perfect fit is obtained making further attempts at variable selection nonsense. This behavior can be controlled using rss_th.

Value

Analysis results, depending on the parameters:

conditional=FALSE, omnibus=FALSE: Tibble with first column "term" holding names of tested variables (eg. alleles). Further columns depends on the used model and are determined by associated tidy function. Generally they will include "estimate", "std.error", "statistic", "p.value", "conf.low", "conf.high", "p.adjusted".
conditional=TRUE, omnibus=FALSE: Tibble or a list of tibbles, see keep argument. The first column "term" hold names of tested variables. Further columns depends on the used model and are determined by associated tidy function. Generally they will include "estimate", "std.error", "statistic", "p.value", "conf.low", "conf.high", "p.adjusted".
conditional=FALSE, omnibus=TRUE: Tibble with first column holding names of tested omnibus groups (eg. amino acid positions) and second names of variables in the group (eg. residues). Further columns are: "df" giving difference in degrees of freedom between base and extended model, "statistic" giving Chisq statistic, "p.value" and "p.adjusted".
conditional=TRUE, omnibus=TRUE: Tibble or a list of tibbles, see keep argument. The first column hold names of tested omnibus groups (eg. amino acid positions), second column hold names of variables in the group (eg. residues). Further columns are: "df" giving difference in degrees of freedom between base and extended model, "statistic" giving Chisq statistic, "p.value" and "p.adjusted".

Details

By default statistical analysis is performed iteratively on each variable in selected experiment. This is done by substituting placeholder in the object's formula with each variable in the experiment.

Setting conditional argument to TRUE will cause the statistical analysis to be performed in a stepwise conditional testing manner, adding the previous top-associated variable as a covariate to object's formula. The analysis stops when there is no more significant variables, based on self-defined threshold (th argument). Either adjusted or unadjusted p-values can be used as the selection criteria, which is controlled using th_adj argument.

Setting omnibus argument to TRUE will cause the statistical analysis to be performed iteratively on groups of variables (like residues at particular amino acid position) using likelihood ratio test.

Argument inheritance_model specifies the inheritance model that should be applyed to experiment's data. Following choices are available:

"dominant" carrier status is sufficient for expression of the phenotype (non-carrier: 0, heterozygous & homozygous carrier: 1).
"recessive" two copies are required for expression of the phenotype (non-carrier & heterozygous carrier: 0, homozygous carrier: 1).
"additive" allele dosage matters, homozygous carriers show stronger phenotype expression or higher risk than heterozygous carriers (non-carrier = 0, heterozygous carrier = 1, homozygous carrier = 2).
"overdominant" heterozygous carriers are at higher risk compared to non-carriers or homozygous carriers (non-carrier & homozygous carrier = 0, heterozygous carrier = 1).

correction specifies p-value adjustment method to use, common choice is Benjamini & Hochberg (1995) ("BH"). Internally this is passed to p.adjust.

Examples

# create MiDAS object
midas <- prepareMiDAS(hla_calls = MiDAS_tut_HLA,
                      colData = MiDAS_tut_pheno,
                      experiment = c("hla_alleles", "hla_aa")
)

# construct statistical model
object <- lm(disease ~ term, data = midas)

# run analysis
runMiDAS(object, experiment = "hla_alleles", inheritance_model = "dominant")
#> Warning: 'experiments' dropped; see 'drops()'
#> # A tibble: 447 × 14
#>    allele     p.value p.adjusted estimate std.error conf.low conf.high statistic
#>    <chr>        <dbl>      <dbl>    <dbl>     <dbl>    <dbl>     <dbl>     <dbl>
#>  1 DQB1*06:02 1.73e-6   0.000773   0.189     0.0393   0.112     0.266       4.81
#>  2 DRB1*15:01 5.29e-6   0.00236    0.177     0.0387   0.101     0.253       4.58
#>  3 B*57:01    1.34e-5   0.00601    0.254     0.0581   0.140     0.368       4.37
#>  4 C*07:02    1.60e-4   0.0714     0.148     0.0390   0.0713    0.224       3.79
#>  5 B*18:01    4.93e-4   0.220     -0.168     0.0481  -0.262    -0.0737     -3.50
#>  6 DRA*01:02  2.63e-3   1          0.0960    0.0319   0.0335    0.159       3.02
#>  7 DQB1*02:02 4.34e-3   1         -0.116     0.0406  -0.196    -0.0364     -2.86
#>  8 B*51:01    4.49e-3   1          0.134     0.0472   0.0418    0.227       2.85
#>  9 DRA*01:01  1.02e-2   1         -0.118     0.0457  -0.207    -0.0279     -2.57
#> 10 DRB1*16:15 1.10e-2   1         -0.404     0.159   -0.715    -0.0929     -2.55
#> # ℹ 437 more rows
#> # ℹ 6 more variables: Ntotal <dbl>, Ntotal.percent <formttbl>,
#> #   `N(disease=0)` <dbl>, `N(disease=0).percent` <formttbl>,
#> #   `N(disease=1)` <dbl>, `N(disease=1).percent` <formttbl>

# omnibus test
# omnibus_groups_filter argument can be used to restrict omnibus test only
# to selected variables groups, here we restrict the analysis to HLA-A
# positions 29 and 43.
runMiDAS(
  object,
  experiment = "hla_aa",
  inheritance_model = "dominant",
  omnibus = TRUE,
  omnibus_groups_filter = c("A_29", "A_43")
)
#> Warning: 'experiments' dropped; see 'drops()'
#> Warning: While evaluating the model problems occured:
#> 
#> • Warnings:
#> • Coefficients for variables: A_29_D could not be estimated, those variables will not be included in likelyhood ratio test result.; occured 1 times, including variables: A_29_D
#> # A tibble: 2 × 6
#>   aa_pos residues    df statistic p.value p.adjusted
#>   <chr>  <chr>    <dbl>     <dbl>   <dbl>      <dbl>
#> 1 A_43   Q, R         2      8.36  0.0153     0.0306
#> 2 A_29   A            1      1.00  0.317      0.634