Risk profile plot
riskProfile.Rd
Predictiveness curve, PPV, NPV and 1-NPV risk estimates
Usage
riskProfile(
outcome,
score,
methods = "asis",
prev.adj = NULL,
show.prev = TRUE,
show.nonparam.pv = TRUE,
show.best.pv = TRUE,
include = c("PC", "PPV", "1-NPV"),
plot.raw = FALSE,
rev.order = FALSE
)
Arguments
- outcome
Vector of binary outcome for each observation.
- score
Numeric vector of continuous predicted risk score.
- methods
Character vector of method names (case-insensitive) for plotting curves or a named list where elements are method function and its arguments. Default is set to
"asis"
.Full options are:
c("asis", "binned", "pava", "mspline", "gam", "cgam")
.To specify arguments per method, use lists. For example:
list( pava = list(method = "pava", ties = "primary"), mspline = list(method = "mspline", fitonPerc = TRUE), gam = list(method = "gam", bs = "tp", logscores = FALSE), bin = list(method = "binned", bins = 10), risk = list(method = "asis") )
See section "Estimation" for more details.
- prev.adj
NULL
(default) or scalar numeric between 0 and 1 for prevalence adjustment.- show.prev
Logical, show prevalence value in the graph. Defaults to
TRUE
.- show.nonparam.pv
Logical, show non-parametric calculation of PVs. Defaults to
TRUE
.- show.best.pv
Logical, show best possible PVs. Defaults to
TRUE
.- include
Character vector (case-insensitive, partial matching) specifying what quantities to include in the plot.
Default is:
c("PC", "PPV", "1-NPV")
.Full options are:
c("NPV", "PC", "PPV", "1-NPV")
.- plot.raw
Logical to show percentiles or raw values. Defaults to
FALSE
(i.e. percentiles).- rev.order
Logical, reverse ordering of scores. Defaults to
FALSE
.
Value
A list containing the plot and data, plus errorbar
data if they were requested
(through "binned"
estimation method with a parameter errorbar.sem
).
Estimation
The methods
argument specifies the estimation method.
You can provide either a vector of strings, any of
("asis"
is not available for calibrationProfile
),
or a named list of lists.
In the latter case, the inner list must have an element "method",
which specifies the estimation function (one of those above),
and optionally other elements, which are passed to the estimation function.
For example:
To see what arguments are available for each estimation method,
see the documentation of that function.
The naming convention is getXest
,
where X
stands for the estimation method, for example getGAMest()
.
"gam", "cgam", and "mspline" always fit on percentiles by default.
To change this, use fitonPerc = FALSE
, for example
"gam" and "cgam" methods are wrappers of mgcv::gam()
and cgam::cgam()
, respectively.
The default values of function arguments (like k
, the number of knots in mgcv::s()
)
mirror the package defaults.
Examples
# Read in example data
auroc <- read.csv(system.file("extdata", "sample.csv", package = "stats4phc"))
rscore <- auroc$predicted_calibrated
truth <- as.numeric(auroc$actual)
# Default plot includes 1-NPV, PPV, and a predictiveness curve (PC) based on risk-cutoff
p1 <- riskProfile(outcome = truth, score = rscore)
p1$plot
p1$data
#> # A tibble: 3,340 × 7
#> method score percentile outcome estimate pv pvValue
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 asis NA 0 NA 0.0640 PC NA
#> 2 asis 0.0640 0.00300 0 0.0640 PC NA
#> 3 asis 0.0654 0.00601 0 0.0654 PC NA
#> 4 asis 0.0659 0.00901 1 0.0659 PC NA
#> 5 asis 0.0702 0.0120 0 0.0702 PC NA
#> 6 asis 0.0709 0.0150 0 0.0709 PC NA
#> 7 asis 0.0721 0.0180 1 0.0721 PC NA
#> 8 asis 0.0753 0.0210 0 0.0753 PC NA
#> 9 asis 0.0764 0.0240 0 0.0764 PC NA
#> 10 asis 0.0770 0.0270 0 0.0770 PC NA
#> # ℹ 3,330 more rows
# Show also NPV
p2 <- riskProfile(
outcome = truth,
score = rscore,
include = c("PC", "NPV", "PPV", "1-NPV")
# or use partial matching: include = c("PC", "N", "PPV", "1")
)
p2$plot
p2$data
#> # A tibble: 3,674 × 7
#> method score percentile outcome estimate pv pvValue
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 asis NA 0 NA 0.0640 PC NA
#> 2 asis 0.0640 0.00300 0 0.0640 PC NA
#> 3 asis 0.0654 0.00601 0 0.0654 PC NA
#> 4 asis 0.0659 0.00901 1 0.0659 PC NA
#> 5 asis 0.0702 0.0120 0 0.0702 PC NA
#> 6 asis 0.0709 0.0150 0 0.0709 PC NA
#> 7 asis 0.0721 0.0180 1 0.0721 PC NA
#> 8 asis 0.0753 0.0210 0 0.0753 PC NA
#> 9 asis 0.0764 0.0240 0 0.0764 PC NA
#> 10 asis 0.0770 0.0270 0 0.0770 PC NA
#> # ℹ 3,664 more rows
# All estimates of prediction curve
p3 <- riskProfile(
outcome = truth,
score = rscore,
methods = c("mspline", "gam", "cgam", "binned", "pava", "asis"),
include = c("PC", "PPV", "1-NPV")
)
p3$plot
# Specifying method arguments (note each list has a "method" element)
p4 <- riskProfile(
outcome = truth,
score = rscore,
methods = list(
"gam" = list(method = "gam", bs = "tp", logscores = FALSE, fitonPerc = TRUE),
"risk" = list(method = "asis"), # no available arguments for this method
"bin" = list(method = "binned", quantiles = 10, errorbar.sem = 1.2)
)
)
p4$plot
# Compare multiple GAMs in terms of Predictiveness Curves
p5 <- riskProfile(
outcome = truth,
score = rscore,
methods = list(
"gam_3" = list(method = "gam", k = 3),
"gam_4" = list(method = "gam", k = 4),
"gam_7" = list(method = "gam", k = 7)
),
include = "PC"
)
p5$plot
# Using logistic regression as user-defined estimation function, fitting on percentiles
# Function needs to take exactly these two arguments
my_est <- function(outcome, score) {
# Calculate percentiles
perc <- ecdf(score)(score)
# Fit
m <- glm(outcome ~ perc, family = "binomial")
# Generate predictions
preds <- predict(m, type = "response")
# Return a data.frame with exactly these columns
return(
data.frame(
score = score,
percentile = perc,
outcome = outcome,
estimate = preds
)
)
}
p6 <- riskProfile(
outcome = truth,
score = rscore,
methods = list(my_lr = my_est)
)
p6$plot
# Using cgam as user-defined estimation function
# Note that you can also use the predefined cgam using methods = "cgam"
# Attach needed library
# Watch out for masking of mgcv::s and cgam::s if both are attached
library(cgam, quietly = TRUE)
#>
#> Attaching package: ‘cgam’
#> The following objects are masked from ‘package:coneproj’:
#>
#> conc, conv, decr, decr.conc, decr.conv, incr, incr.conc, incr.conv
# Function needs to take exactly these two arguments
my_est <- function(outcome, score) {
# Fit on raw predictions with space = "E"
m <- cgam(
outcome ~ s.incr(score, numknots = 5, space = "E"),
family = "binomial"
)
# Generate predictions and convert to vector
preds <- predict(m, type = "response")$fit
# Return a data.frame with exactly these columns
out <- data.frame(
score = score,
percentile = ecdf(score)(score),
outcome = outcome,
estimate = preds
)
return(out)
}
p7 <- riskProfile(
outcome = truth,
score = rscore,
methods = list(my_cgam = my_est)
)
p7$plot
# Prevalence adjustment to 0.1
p8 <- riskProfile(outcome = truth, score = rscore, prev.adj = 0.1)
p8$plot