Skip to contents

Predictiveness curve, PPV, NPV and 1-NPV risk estimates

Usage

riskProfile(
  outcome,
  score,
  methods = "asis",
  prev.adj = NULL,
  show.prev = TRUE,
  show.nonparam.pv = TRUE,
  show.best.pv = TRUE,
  include = c("PC", "PPV", "1-NPV"),
  plot.raw = FALSE,
  rev.order = FALSE
)

Arguments

outcome

Vector of binary outcome for each observation.

score

Numeric vector of continuous predicted risk score.

methods

Character vector of method names (case-insensitive) for plotting curves or a named list where elements are method function and its arguments. Default is set to "asis".

Full options are: c("asis", "binned", "pava", "mspline", "gam", "cgam").

To specify arguments per method, use lists. For example:

list(
  pava = list(method = "pava", ties = "primary"),
  mspline = list(method = "mspline", fitonPerc = TRUE),
  gam = list(method = "gam", bs = "tp", logscores = FALSE),
  bin = list(method = "binned", bins = 10),
  risk = list(method = "asis")
)

See section "Estimation" for more details.

prev.adj

NULL (default) or scalar numeric between 0 and 1 for prevalence adjustment.

show.prev

Logical, show prevalence value in the graph. Defaults to TRUE.

show.nonparam.pv

Logical, show non-parametric calculation of PVs. Defaults to TRUE.

show.best.pv

Logical, show best possible PVs. Defaults to TRUE.

include

Character vector (case-insensitive, partial matching) specifying what quantities to include in the plot.

Default is: c("PC", "PPV", "1-NPV").

Full options are: c("NPV", "PC", "PPV", "1-NPV").

plot.raw

Logical to show percentiles or raw values. Defaults to FALSE (i.e. percentiles).

rev.order

Logical, reverse ordering of scores. Defaults to FALSE.

Value

A list containing the plot and data, plus errorbar data if they were requested (through "binned" estimation method with a parameter errorbar.sem).

Estimation

The methods argument specifies the estimation method. You can provide either a vector of strings, any of

c("asis", "binned", "pava", "mspline", "gam", "cgam")

("asis" is not available for calibrationProfile), or a named list of lists. In the latter case, the inner list must have an element "method", which specifies the estimation function (one of those above), and optionally other elements, which are passed to the estimation function. For example:

list(
  gam = list(method = "gam", k = 3),
  c_gam = list(method = "cgam", numknots = 3)
)

To see what arguments are available for each estimation method, see the documentation of that function. The naming convention is getXest, where X stands for the estimation method, for example getGAMest().

"gam", "cgam", and "mspline" always fit on percentiles by default. To change this, use fitonPerc = FALSE, for example

list(gam = list(method = "gam", fitonPerc = FALSE))

"gam" and "cgam" methods are wrappers of mgcv::gam() and cgam::cgam(), respectively. The default values of function arguments (like k, the number of knots in mgcv::s()) mirror the package defaults.

Examples

# Read in example data
auroc <- read.csv(system.file("extdata", "sample.csv", package = "stats4phc"))
rscore <- auroc$predicted_calibrated
truth <- as.numeric(auroc$actual)

# Default plot includes 1-NPV, PPV, and a predictiveness curve (PC) based on risk-cutoff
p1 <- riskProfile(outcome = truth, score = rscore)
p1$plot

p1$data
#> # A tibble: 3,340 × 7
#>    method   score percentile outcome estimate pv    pvValue
#>    <chr>    <dbl>      <dbl>   <dbl>    <dbl> <chr>   <dbl>
#>  1 asis   NA         0            NA   0.0640 PC         NA
#>  2 asis    0.0640    0.00300       0   0.0640 PC         NA
#>  3 asis    0.0654    0.00601       0   0.0654 PC         NA
#>  4 asis    0.0659    0.00901       1   0.0659 PC         NA
#>  5 asis    0.0702    0.0120        0   0.0702 PC         NA
#>  6 asis    0.0709    0.0150        0   0.0709 PC         NA
#>  7 asis    0.0721    0.0180        1   0.0721 PC         NA
#>  8 asis    0.0753    0.0210        0   0.0753 PC         NA
#>  9 asis    0.0764    0.0240        0   0.0764 PC         NA
#> 10 asis    0.0770    0.0270        0   0.0770 PC         NA
#> # ℹ 3,330 more rows

# Show also NPV
p2 <- riskProfile(
  outcome = truth,
  score = rscore,
  include = c("PC", "NPV", "PPV", "1-NPV")
  # or use partial matching: include = c("PC", "N", "PPV", "1")
)
p2$plot

p2$data
#> # A tibble: 3,674 × 7
#>    method   score percentile outcome estimate pv    pvValue
#>    <chr>    <dbl>      <dbl>   <dbl>    <dbl> <chr>   <dbl>
#>  1 asis   NA         0            NA   0.0640 PC         NA
#>  2 asis    0.0640    0.00300       0   0.0640 PC         NA
#>  3 asis    0.0654    0.00601       0   0.0654 PC         NA
#>  4 asis    0.0659    0.00901       1   0.0659 PC         NA
#>  5 asis    0.0702    0.0120        0   0.0702 PC         NA
#>  6 asis    0.0709    0.0150        0   0.0709 PC         NA
#>  7 asis    0.0721    0.0180        1   0.0721 PC         NA
#>  8 asis    0.0753    0.0210        0   0.0753 PC         NA
#>  9 asis    0.0764    0.0240        0   0.0764 PC         NA
#> 10 asis    0.0770    0.0270        0   0.0770 PC         NA
#> # ℹ 3,664 more rows

# All estimates of prediction curve
p3 <- riskProfile(
  outcome = truth,
  score = rscore,
  methods = c("mspline", "gam", "cgam", "binned", "pava", "asis"),
  include = c("PC", "PPV", "1-NPV")
)
p3$plot


# Specifying method arguments (note each list has a "method" element)
p4 <- riskProfile(
  outcome = truth,
  score = rscore,
  methods = list(
    "gam" = list(method = "gam", bs = "tp", logscores = FALSE, fitonPerc = TRUE),
    "risk" = list(method = "asis"), # no available arguments for this method
    "bin" = list(method = "binned", quantiles = 10, errorbar.sem = 1.2)
  )
)
p4$plot


# Compare multiple GAMs in terms of Predictiveness Curves
p5 <- riskProfile(
  outcome = truth,
  score = rscore,
  methods = list(
    "gam_3" = list(method = "gam", k = 3),
    "gam_4" = list(method = "gam", k = 4),
    "gam_7" = list(method = "gam", k = 7)
  ),
  include = "PC"
)
p5$plot


# Using logistic regression as user-defined estimation function, fitting on percentiles
# Function needs to take exactly these two arguments
my_est <- function(outcome, score) {
  # Calculate percentiles
  perc <- ecdf(score)(score)
  # Fit
  m <- glm(outcome ~ perc, family = "binomial")
  # Generate predictions
  preds <- predict(m, type = "response")
  # Return a data.frame with exactly these columns
  return(
    data.frame(
      score = score,
      percentile = perc,
      outcome = outcome,
      estimate = preds
    )
  )
}
p6 <- riskProfile(
  outcome = truth,
  score = rscore,
  methods = list(my_lr = my_est)
)
p6$plot


# Using cgam as user-defined estimation function
# Note that you can also use the predefined cgam using methods = "cgam"
# Attach needed library
# Watch out for masking of mgcv::s and cgam::s if both are attached
library(cgam, quietly = TRUE) 
#> 
#> Attaching package: ‘cgam’
#> The following objects are masked from ‘package:coneproj’:
#> 
#>     conc, conv, decr, decr.conc, decr.conv, incr, incr.conc, incr.conv
# Function needs to take exactly these two arguments
my_est <- function(outcome, score) {
  # Fit on raw predictions with space = "E"
  m <- cgam(
    outcome ~ s.incr(score, numknots = 5, space = "E"),
    family = "binomial"
  )
  # Generate predictions and convert to vector
  preds <- predict(m, type = "response")$fit
  # Return a data.frame with exactly these columns
  out <- data.frame(
    score = score,
    percentile = ecdf(score)(score),
    outcome = outcome,
    estimate = preds
  )
  return(out)
}

p7 <- riskProfile(
  outcome = truth,
  score = rscore,
  methods = list(my_cgam = my_est)
)
p7$plot


# Prevalence adjustment to 0.1
p8 <- riskProfile(outcome = truth, score = rscore, prev.adj = 0.1)
p8$plot