Various metrics for measuring model performance.
Usage
score_log_loss(y, y_hat, na_rm = FALSE)
score_quadratic(y, y_hat, na_rm = FALSE)
check_score_fun(score_fun)
metrics_unc(score_fun, y, na_rm = FALSE)
metrics_R2(score_fun, y, y_hat, na_rm = FALSE)
metrics_fit_calib(y, y_hat, rev_fct = FALSE)
metrics_DI(score_fun, y, y_hat_calib, na_rm = FALSE)
metrics_MI(score_fun, y, y_hat, y_hat_calib, na_rm = FALSE)
metrics_r2(y, y_hat, y_hat_calib, na_rm = FALSE)
Arguments
- y
Vector of observations.
- y_hat
Vector of predictions.
- na_rm
Logical, defaults to FALSE. Should NAs be removed?
- score_fun
A scoring function: score_quadratic, score_log_loss, or a user-defined scoring rule. See below for more details.
- rev_fct
Logical, defaults to FALSE. Switch the factor levels of the data before performing calibration. Only relevant for a binary response.
- y_hat_calib
Vector of calibrated predictions. See below for more details.
Value
metrics_fit_calib returns an mgcv::gam() model fit; all other functions return a number.
Functions
- score_log_loss(): Binary log loss score.
- score_quadratic(): Quadratic score.
- check_score_fun(): Utility function for checking the properties of a user-defined score_fun.
- metrics_unc(): Uncertainty.
- metrics_R2(): R^2 metric.
- metrics_fit_calib(): Fit calibration curve using mgcv::gam(). Note that NAs are always dropped.
- metrics_DI(): Discrimination index.
- metrics_MI(): Miscalibration index.
- metrics_r2(): r^2 metric based on the slope of lm.
Scoring function
One can use the predefined scores score_quadratic or score_log_loss.
If those do not fit the needs, a user-defined scoring function can be used instead.
This function needs to take exactly 3 arguments: y (truth values), y_hat (estimated values), and na_rm (should NAs be removed?), where:
- both y and y_hat are numeric (not factors!)
- na_rm is a scalar logical
It needs to return a single number.
There is a utility function check_score_fun to check whether a user-defined function is programmed correctly.
It checks the input and the output, but not whether the returned value actually makes sense.
A minimal sketch of such a scoring rule is shown below.
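For illustration, a minimal sketch of a user-defined mean absolute error score that satisfies this contract (score_absolute is a hypothetical name, not part of the package):
score_absolute <- function(y, y_hat, na_rm = FALSE) {
  # y and y_hat: numeric vectors; na_rm: scalar logical; returns one number
  mean(abs(y - y_hat), na.rm = na_rm)
}
check_score_fun(score_absolute) # passes if the contract is met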
Calibration
To obtain calibrated predictions, fit a calibration model and predict based on that model.
Users can use their own calibration model or make use of metrics_fit_calib,
which fits an mgcv::gam() model with the smoother mgcv::s(., k = -1) (automatic knot selection).
If the input y is a factor, a binomial family is used, otherwise a gaussian one.
NAs are always dropped.
Continuous response example:
calibration_model <- metrics_fit_calib(
y = truth,
y_hat = prediction
)
calib_pred <- predict(calibration_model)
Binary response example:
calibration_model <- metrics_fit_calib(
y = factor(truth, levels = c("0", "1")),
y_hat = prediction
)
calib_pred <- predict(calibration_model, type = "response")
In the binary case, make sure that:
- y is a factor with the correct level setting. Usually "0" is the reference (first) level and "1" is the event (second) level. This may clash with the yardstick convention, where the first level is by default the "event" level; see the sketch after this list.
- y_hat are probabilities (not log odds).
- the returned calibrated predictions calib_pred are also probabilities, obtained by setting type = "response".
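If the factor levels come in the yardstick order (event first), the rev_fct argument can switch them before calibration. A minimal sketch, reusing truth and prediction from the example above and assuming truth is coded with the event level first:
calibration_model <- metrics_fit_calib(
  y = factor(truth, levels = c("1", "0")), # yardstick-style: event first
  y_hat = prediction,
  rev_fct = TRUE # switch the levels so "0" becomes the reference level
)
calib_pred <- predict(calibration_model, type = "response")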
Examples
# Scores
score_quadratic(y = c(1.34, 2.8), y_hat = c(1.34, 2.8)) # must be 0
#> [1] 0
score_quadratic(y = 0.5, y_hat = 0) # must be 0.5^2 = 0.25
#> [1] 0.25
score_log_loss(y = c(0, 1), y_hat = c(0.01, 0.9)) # must be close to 0
#> [1] 0.05770543
score_log_loss(y = 0, y_hat = 0) # undefined
#> [1] NaN
check_score_fun(score_quadratic) # passes without errors
# Metrics based on `lm` model
mod <- lm(hp ~ ., data = mtcars)
truth <- mtcars$hp
pred <- predict(mod)
# calibration fit and calibrated predictions
calib_mod <- metrics_fit_calib(y = truth, y_hat = pred)
calib_pred <- predict(calib_mod)
metrics_unc(score_fun = "score_quadratic", y = truth)
#> [1] 4553.965
metrics_R2(score_fun = "score_quadratic", y = truth, y_hat = pred)
#> [1] 0.9027993
metrics_DI(score_fun = "score_quadratic", y = truth, y_hat_calib = calib_pred)
#> [1] 0.9222323
metrics_MI(score_fun = "score_quadratic", y = truth, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.01943302
# Note that R^2 = DI - MI
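# 0.9222323 - 0.01943302 = 0.9027993, the metrics_R2 value above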
metrics_r2(y = truth, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.9027993
# Metrics based on `glm` model (logistic regression)
# Note the correct setting of levels
mod <- glm(factor(vs, levels = c("0", "1")) ~ hp + mpg, data = mtcars, family = "binomial")
truth_fct <- factor(mtcars$vs, levels = c("0", "1"))
truth_num <- mtcars$vs
pred <- predict(mod, type = "response") # type = "response" returns probabilities
# calibration fit and calibrated predictions
calib_mod <- metrics_fit_calib(y = truth_fct, y_hat = pred)
calib_pred <- predict(calib_mod, type = "response") # type = "response" returns probabilities
metrics_unc(score_fun = "score_quadratic", y = truth_num)
#> [1] 0.2460938
metrics_R2(score_fun = "score_quadratic", y = truth_num, y_hat = pred)
#> [1] 0.6498564
metrics_DI(score_fun = "score_quadratic", y = truth_num, y_hat_calib = calib_pred)
#> [1] 0.7358166
metrics_MI(score_fun = "score_quadratic", y = truth_num, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.08596014
# Note that R^2 = DI - MI
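# 0.7358166 - 0.08596014 = 0.6498565, matching metrics_R2 above up to rounding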
metrics_r2(y = truth_num, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.6499537