Various metrics for measuring model performance.

Usage

score_log_loss(y, y_hat, na_rm = FALSE)

score_quadratic(y, y_hat, na_rm = FALSE)

check_score_fun(score_fun)

metrics_unc(score_fun, y, na_rm = FALSE)

metrics_R2(score_fun, y, y_hat, na_rm = FALSE)

metrics_fit_calib(y, y_hat, rev_fct = FALSE)

metrics_DI(score_fun, y, y_hat_calib, na_rm = FALSE)

metrics_MI(score_fun, y, y_hat, y_hat_calib, na_rm = FALSE)

metrics_r2(y, y_hat, y_hat_calib, na_rm = FALSE)

Arguments

y

Vector of observations.

y_hat

Vector of predictions.

na_rm

Logical, defaults to FALSE. Should NAs be removed?

score_fun

A scoring function: score_quadratic, score_log_loss, or a user-defined scoring rule. See below for more details.

rev_fct

Logical, defaults to FALSE. Switch the factor levels of the data before performing calibration. Only relevant for a binary response.

y_hat_calib

Vector of calibrated predictions. See below for more details.

Value

metrics_fit_calib returns an mgcv::gam() model fit; all other functions return a single number.

Functions

  • score_log_loss(): Binary log loss score

  • score_quadratic(): Quadratic score

  • check_score_fun(): Utility function for checking the properties of a user-defined score_fun.

  • metrics_unc(): Uncertainty

  • metrics_R2(): R^2 metric

  • metrics_fit_calib(): Fit calibration curve using mgcv::gam(). Note that NAs are always dropped.

  • metrics_DI(): Discrimination index

  • metrics_MI(): Miscalibration index

  • metrics_r2(): r^2 metric based on the slope of a linear model (lm)

Scoring function

One can use predefined scores like score_quadratic or score_log_loss. If those do not fit your needs, a user-defined scoring function can be used instead. This function must take exactly 3 arguments: y (truth values), y_hat (estimated values), and na_rm (should NAs be removed?):

  • both y and y_hat are numeric (not factors!)

  • na_rm is a scalar logical

It must return a single number. The utility function check_score_fun checks whether a user-defined function is programmed correctly: it validates the inputs and the output, but not whether the returned value itself makes sense.
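As an illustration, here is a minimal sketch of a user-defined scoring rule; the name score_absolute is hypothetical (not part of the package), but it follows the required signature and should pass check_score_fun:

# Hypothetical user-defined score: mean absolute error.
# Takes exactly the three required arguments and returns a number.
score_absolute <- function(y, y_hat, na_rm = FALSE) {
  mean(abs(y - y_hat), na.rm = na_rm)
}

check_score_fun(score_absolute) # should pass without errors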

Calibration

To obtain calibrated predictions, fit a calibration model and predict from it. Users can supply their own calibration model (a sketch follows below) or make use of metrics_fit_calib, which fits an mgcv::gam() model with the smoother mgcv::s(., k = -1) (automatic knot selection). If the input y is a factor, a binomial family is used; otherwise a Gaussian family. NAs are always dropped.
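For the bring-your-own-model route, a minimal sketch for a continuous response could look as follows (loess() from the stats package is used purely as an illustration; truth and prediction are placeholder vectors):

# Hypothetical alternative: calibrate with a loess smoother instead of mgcv::gam()
own_calib_model <- loess(truth ~ prediction)
own_calib_pred <- predict(own_calib_model)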

Continuous response example using metrics_fit_calib:

calibration_model <- metrics_fit_calib(
  y = truth,
  y_hat = prediction
)
calib_pred <- predict(calibration_model)

Binary response example:

calibration_model <- metrics_fit_calib(
  y = factor(truth, levels = c("0", "1")),
  y_hat = prediction
)
calib_pred <- predict(calibration_model, type = "response")

In the binary case, make sure that:

  • y is a factor with the correct level ordering. Usually "0" is the reference (first) level and "1" is the event (second) level. This may clash with the yardstick convention, where by default the first level is the "event" level; see the sketch after this list.

  • y_hat contains probabilities (not log-odds).

  • the returned calibrated predictions calib_pred are also probabilities, which is ensured by setting type = "response".
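As a minimal sketch of handling a yardstick-style factor (event level first), one can either reverse the levels manually or rely on rev_fct; truth_chr and prediction are placeholder objects, and rev_fct is assumed to switch the two factor levels as documented:

y_fct <- factor(truth_chr, levels = c("1", "0")) # yardstick-style: event level first

# Option 1: reverse the levels manually before calibrating
calibration_model <- metrics_fit_calib(
  y = factor(y_fct, levels = rev(levels(y_fct))),
  y_hat = prediction
)

# Option 2: keep the factor as-is and let metrics_fit_calib switch the levels
calibration_model <- metrics_fit_calib(y = y_fct, y_hat = prediction, rev_fct = TRUE)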

Examples

# Scores
score_quadratic(y = c(1.34, 2.8), y_hat = c(1.34, 2.8)) # must be 0
#> [1] 0
score_quadratic(y = 0.5, y_hat = 0) # must be 0.5^2 = 0.25
#> [1] 0.25

score_log_loss(y = c(0, 1), y_hat = c(0.01, 0.9)) # must be close to 0
#> [1] 0.05770543
score_log_loss(y = 0, y_hat = 0) # undefined
#> [1] NaN

check_score_fun(score_quadratic) # passes without errors

# Metrics based on `lm` model
mod <- lm(hp ~ ., data = mtcars)
truth <- mtcars$hp
pred <- predict(mod)

# calibration fit and calibrated predictions
calib_mod <- metrics_fit_calib(y = truth, y_hat = pred)
calib_pred <- predict(calib_mod)

metrics_unc(score_fun = "score_quadratic", y = truth)
#> [1] 4553.965
metrics_R2(score_fun = "score_quadratic", y = truth, y_hat = pred)
#> [1] 0.9027993
metrics_DI(score_fun = "score_quadratic", y = truth, y_hat_calib = calib_pred)
#> [1] 0.9222323
metrics_MI(score_fun = "score_quadratic", y = truth, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.01943302
# Note that R^2 = DI - MI: 0.9222323 - 0.0194330 ≈ 0.9027993
metrics_r2(y = truth, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.9027993

# Metrics based on `glm` model (logistic regression)
# Note the correct setting of levels
mod <- glm(factor(vs, levels = c("0", "1")) ~ hp + mpg, data = mtcars, family = "binomial")
truth_fct <- factor(mtcars$vs, levels = c("0", "1"))
truth_num <- mtcars$vs
pred <- predict(mod, type = "response") # type = "response" returns probabilities

# calibration fit and calibrated predictions
calib_mod <- metrics_fit_calib(y = truth_fct, y_hat = pred)
calib_pred <- predict(calib_mod, type = "response") # type = "response" returns probabilities

metrics_unc(score_fun = "score_quadratic", y = truth_num)
#> [1] 0.2460938
metrics_R2(score_fun = "score_quadratic", y = truth_num, y_hat = pred)
#> [1] 0.6498564
metrics_DI(score_fun = "score_quadratic", y = truth_num, y_hat_calib = calib_pred)
#> [1] 0.7358166
metrics_MI(score_fun = "score_quadratic", y = truth_num, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.08596014
# Note that R^2 = DI - MI: 0.7358166 - 0.0859601 ≈ 0.6498564
metrics_r2(y = truth_num, y_hat = pred, y_hat_calib = calib_pred)
#> [1] 0.6499537